LlamaIndex has published legal-kb, a public reference application on GitHub. It is described as a knowledge base for legal documents, powered by LlamaIndex Index v2 (the LlamaParse Platform). The project demonstrates a pattern the team calls a Retrieval Harness for agentic retrieval.
The approach differs from single-shot retrieval. Instead of one embedding search per query, an agent is given filesystem-style tools. It can then crawl a large, evolving knowledge base to solve a task. The tools mirror operations engineers already know: semantic and keyword search, regex grep, file search, and read.
What is legal-kb?
legal-kb is a working TanStack Start web app, not a library. You sign in, create a project, upload files, and chat with an agent. Each project is mirrored as a managed LlamaCloud Index v2. Uploaded files are parsed and indexed automatically in the background. The chat agent then queries that index live during each turn.
The Retrieval Harness, in plain terms
The harness provides a persistent data pipeline over your documents. It connects to a data source, indexes it, and keeps it updated. On top of that pipeline, it exposes a set of tools to the agent.
Those tools are deliberately close to filesystem operations. An agent can list files, read a file, grep inside a file, or run hybrid search. Because the tools are generic, you can plug the harness into your own agents.
The four agent tools
The agent in src/lib/agent.ts is given four tools. Each maps to an Index v2 retrieval API. The table below lists them as implemented.
| Tool | Backing API | Key parameters | What it does |
|---|---|---|---|
| retrieve | beta.retrieval.retrieve | query, top_k, score_threshold, rerank_top_n, file_name, file_version | Runs hybrid semantic search; optional reranking; returns chunks plus citations |
| findFiles | beta.retrieval.find | file_name, file_name_contains | Searches files by exact name or substring; paginates automatically |
| readFile | beta.retrieval.read | file_id, offset, max_length | Reads raw file content, with offset and length windows |
| grepFile | beta.retrieval.grep | file_id, pattern, context_chars, limit | Matches a pattern in one file; returns character positions |
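
To make the readFile and grepFile parameters concrete, here is a minimal in-memory sketch of the windowing and match semantics the table describes. It does not call LlamaCloud. The names readWindow and grepText, the GrepMatch shape, and the default values are assumptions for illustration; the real beta.retrieval.read and beta.retrieval.grep responses may differ.

```typescript
// Hypothetical in-memory stand-ins for the read and grep tools.
// They only illustrate offset/max_length windows and character-position
// matches; the real Index v2 response shapes may differ.

interface GrepMatch {
  start: number // character offset where the match begins
  end: number // character offset just past the match
  context: string // matched text plus surrounding characters
}

// Return at most maxLength characters starting at offset.
function readWindow(content: string, offset = 0, maxLength = 2000): string {
  return content.slice(offset, offset + maxLength)
}

// Find up to `limit` regex matches, each with `contextChars` of context
// on both sides.
function grepText(
  content: string,
  pattern: string,
  contextChars = 20,
  limit = 10,
): GrepMatch[] {
  const re = new RegExp(pattern, 'g')
  const matches: GrepMatch[] = []
  let m: RegExpExecArray | null
  while (matches.length < limit && (m = re.exec(content)) !== null) {
    // Step past zero-length matches so the loop always advances.
    if (m[0].length === 0) {
      re.lastIndex++
      continue
    }
    const start = m.index
    const end = start + m[0].length
    matches.push({
      start,
      end,
      context: content.slice(Math.max(0, start - contextChars), end + contextChars),
    })
  }
  return matches
}

const clause =
  'Either party may terminate this Agreement upon thirty (30) days prior written notice.'
const hits = grepText(clause, 'thirty \\(30\\) days', 10)
```

The character positions in each match can be fed back into readWindow as an offset, which is the kind of grep-then-read narrowing the tools are designed for.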
The system prompt enforces an order. The agent must call findFiles first to establish the document inventory. It then narrows with retrieve, and confirms exact wording with readFile or grepFile before citing.
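
The ordering rule can also be expressed as a small check over a recorded sequence of tool calls. The article only says the system prompt enforces the order; the checkToolOrder function below is a hypothetical guard, not something taken from the repository.

```typescript
type ToolName = 'findFiles' | 'retrieve' | 'readFile' | 'grepFile'

// Check a sequence of tool calls against the order the prompt asks for:
// findFiles first, then a retrieve, then at least one readFile or grepFile
// after that retrieve before the agent cites anything.
function checkToolOrder(calls: ToolName[]): { ok: boolean; reason?: string } {
  if (calls.length === 0 || calls[0] !== 'findFiles') {
    return { ok: false, reason: 'findFiles must be the first call' }
  }
  const firstRetrieve = calls.indexOf('retrieve')
  if (firstRetrieve === -1) {
    return { ok: false, reason: 'no retrieve call to narrow the search' }
  }
  const confirmed = calls
    .slice(firstRetrieve + 1)
    .some((c) => c === 'readFile' || c === 'grepFile')
  if (!confirmed) {
    return { ok: false, reason: 'wording not confirmed with readFile or grepFile' }
  }
  return { ok: true }
}

const good = checkToolOrder(['findFiles', 'retrieve', 'grepFile'])
const bad = checkToolOrder(['retrieve', 'findFiles', 'readFile'])
```

A check like this could run in tests or logging to spot turns where the model skipped the inventory step, without changing the prompt itself.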
How it works under the hood
Uploads follow a clear pipeline in src/lib/files.ts. Bytes are pushed to the project’s LlamaCloud source directory. File and ProjectFile rows are written to PostgreSQL via Prisma. An index sync is triggered but not awaited; the UI polls the sync status until the index is ready.
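
The trigger-without-await plus polling pattern can be sketched as follows. This is a minimal illustration with injected dependencies, not the code in src/lib/files.ts: UploadDeps, handleUpload, pollUntilReady, the SyncStatus values, and the interval and attempt defaults are all assumptions.

```typescript
type SyncStatus = 'pending' | 'syncing' | 'ready' | 'error'

// Hypothetical dependencies standing in for LlamaCloud and Prisma calls.
interface UploadDeps {
  pushBytes: (name: string, bytes: Uint8Array) => Promise<void>
  writeRows: (name: string) => Promise<void>
  triggerSync: () => Promise<void>
}

async function handleUpload(deps: UploadDeps, name: string, bytes: Uint8Array): Promise<void> {
  await deps.pushBytes(name, bytes)
  await deps.writeRows(name)
  // Fire and forget: start the sync but do not block the response on it.
  void deps.triggerSync().catch((err) => console.error('sync failed', err))
}

// Poll a status function until it reports a terminal state or attempts run out.
async function pollUntilReady(
  getStatus: () => Promise<SyncStatus>,
  opts: { intervalMs?: number; maxAttempts?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<SyncStatus> {
  const intervalMs = opts.intervalMs ?? 1000
  const maxAttempts = opts.maxAttempts ?? 30
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)))
  let status: SyncStatus = 'pending'
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    status = await getStatus()
    if (status === 'ready' || status === 'error') return status
    if (attempt < maxAttempts) await sleep(intervalMs)
  }
  return status // still not terminal after maxAttempts
}
```

Keeping the sync out of the request path means an upload returns quickly even when parsing and indexing take a while; the cost is that the client has to handle the not-yet-ready state explicitly.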
Versioning is scoped to the (project, filename) pair. Re-uploading nda.pdf to the same project produces v1, v2, v3 side by side. The retrieval layer filters on the version metadata field. This gives version control over the knowledge base itself.
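
A minimal sketch of version numbering scoped to the (project, filename) pair is shown here. The FileRow shape and the nextVersion helper are hypothetical; how the repository actually computes versions in Prisma is not described in the article.

```typescript
// Hypothetical row shape; the real Prisma models may differ.
interface FileRow {
  projectId: string
  fileName: string
  version: number
}

// Next version for a (project, filename) pair: 1 for a new name,
// otherwise one more than the highest existing version.
function nextVersion(rows: FileRow[], projectId: string, fileName: string): number {
  const versions = rows
    .filter((r) => r.projectId === projectId && r.fileName === fileName)
    .map((r) => r.version)
  return versions.length === 0 ? 1 : Math.max(...versions) + 1
}

const rows: FileRow[] = [
  { projectId: 'p1', fileName: 'nda.pdf', version: 1 },
  { projectId: 'p1', fileName: 'nda.pdf', version: 2 },
  { projectId: 'p2', fileName: 'nda.pdf', version: 1 },
]

const v = nextVersion(rows, 'p1', 'nda.pdf') // third upload of nda.pdf in p1
```

Because the scope is the pair rather than the filename alone, the same nda.pdf in another project keeps its own independent version sequence.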
The agent uses the ToolLoopAgent from Vercel AI SDK 6. You pick OpenAI or Anthropic per turn and bring your own keys. Reasoning is streamed: Claude models use extended thinking; OpenAI reasoning models use a medium reasoning effort.
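
Per-turn provider selection might translate into reasoning settings along these lines. This is a sketch under assumptions: the reasoningOptions helper is hypothetical, the option keys (anthropic.thinking with budgetTokens, openai.reasoningEffort) are assumed to match the AI SDK provider options and should be checked against the provider documentation, and the 8000-token budget is an arbitrary placeholder.

```typescript
type Provider = 'openai' | 'anthropic'

// Hypothetical helper: pick reasoning-related provider options for one turn.
// Key names are assumed to follow the AI SDK provider options.
function reasoningOptions(provider: Provider, thinkingBudget = 8000) {
  if (provider === 'anthropic') {
    // Extended thinking for Claude models, with a token budget.
    return {
      anthropic: { thinking: { type: 'enabled' as const, budgetTokens: thinkingBudget } },
    }
  }
  // Medium reasoning effort for OpenAI reasoning models.
  return { openai: { reasoningEffort: 'medium' as const } }
}

const claudeTurn = reasoningOptions('anthropic')
const openaiTurn = reasoningOptions('openai')
```

The returned object is meant for the providerOptions field that AI SDK calls accept, so the rest of the agent setup can stay the same regardless of which provider a turn uses.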
Here is a condensed but faithful view of the retrieve tool and the agent.
```typescript
import { LlamaCloud } from '@llamaindex/llama-cloud'
import { tool, ToolLoopAgent } from 'ai'
import { z } from 'zod'
import { makeCitationId } from './citations'

// One tool closure per index. Wraps Index v2 retrieval APIs.
function createLlamaParseTools(apiKey: string, projectId: string, indexId: string) {
  const client = new LlamaCloud({ apiKey })

  const retrieve = tool({
    description: 'Run a semantic retrieval query against an index.',
    inputSchema: z.object({
      query: z.string(),
      top_k: z.number().nullable(),
      score_threshold: z.number().nullable(),
      rerank_top_n: z.number().nullable(), // set to enable reranking
      file_name: z.string().nullable(), // metadata filter
      file_version: z.number().nullable(),
    }),
    execute: async ({ query, top_k, score_threshold, rerank_top_n, file_name }) => {
      const custom_filters = file_name
        ? { file_name: { operator: 'eq' as const, value: file_name } }
        : undefined

      const response = await client.beta.retrieval.retrieve({
        index_id: indexId,
        project_id: projectId,
        query,
        top_k,
        score_threshold,
        rerank: rerank_top_n != null ? { enabled: true, top_n: rerank_top_n } : undefined,
        custom_filters,
      })

      // Return a model-readable list plus citations that drive the UI chips.
      const citations = response.results.map((r) => ({
        id: makeCitationId(), // e.g. "c7f2qa"
        fileName: r.metadata?.file_name,
        score: r.rerank_score ?? r.score ?? null,
        preview: r.content.slice(0, 500),
      }))
      const formatted = response.results
        .map((r, i) => `### Result #${i + 1}\n\n${r.content.slice(0, 600)}`)
        .join('\n\n---\n\n')

      return { formatted, citations }
    },
  })

  // findFiles / readFile / grepFile follow the same shape, backed by
  // client.beta.retrieval.find / .read / .grep
  return { retrieve /* , findFiles, readFile, grepFile */ }
}

export function buildAgent(model, apiKey: string, projectId: string, indexId: string) {
  return new ToolLoopAgent({
    model,
    tools: createLlamaParseTools(apiKey, projectId, indexId),
    instructions:
      'Always call findFiles first, ground every answer in the documents, ' +
      'and cite ids inline as `cite:<id>`.',
  })
}
```
Answers carry visual citations. Each retrieved chunk gets a short id, such as cite:c7f2qa. The agent references that id inline, and the UI renders a clickable citation chip. Clicking it opens the source page screenshot with bounding-box rectangles over the cited text.
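
On the rendering side, inline ids have to be pulled back out of the answer text. The sketch below splits an answer into text and citation segments; the splitCitations function, the Segment type, and the assumption that ids are lowercase alphanumeric (as in c7f2qa) are illustrative and not taken from the repository.

```typescript
interface Citation {
  id: string
  fileName?: string
  preview: string
}

type Segment =
  | { kind: 'text'; text: string }
  | { kind: 'cite'; id: string; citation: Citation | undefined }

// Split an answer into text and citation segments so a UI can render chips.
// Ids with no matching citation are kept, with citation left undefined.
function splitCitations(answer: string, citations: Citation[]): Segment[] {
  const byId = new Map(citations.map((c) => [c.id, c] as const))
  const segments: Segment[] = []
  const re = /cite:([a-z0-9]+)/g
  let last = 0
  let m: RegExpExecArray | null
  while ((m = re.exec(answer)) !== null) {
    if (m.index > last) segments.push({ kind: 'text', text: answer.slice(last, m.index) })
    segments.push({ kind: 'cite', id: m[1], citation: byId.get(m[1]) })
    last = m.index + m[0].length
  }
  if (last < answer.length) segments.push({ kind: 'text', text: answer.slice(last) })
  return segments
}

const segments = splitCitations(
  'Notice is thirty days cite:c7f2qa. See also cite:zzz.',
  [{ id: 'c7f2qa', fileName: 'MSA.pdf', preview: 'thirty (30) days prior written notice' }],
)
```

Keeping unmatched ids as segments, rather than dropping them, lets the UI flag a citation the model invented instead of silently hiding it.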
Naive RAG vs the agentic Retrieval Harness
The harness is a different execution model from single-shot RAG. The comparison below focuses on behavior.
| Dimension | Naive / single-shot RAG | Agentic Retrieval Harness (Index v2) |
|---|---|---|
| Retrieval flow | One vector search per query | Multi-step tool loop: find → retrieve → read/grep |
| Search modes | Vector similarity only | Hybrid semantic search, keyword, and regex grep |
| Context | Fixed top-k chunks | Agent reads full files or windows on demand |
| Freshness | Static index | Persistent pipeline with sync and versioning |
| Precision control | Mostly hidden | top_k, score_threshold, rerank_top_n exposed |
| Citations | Chunk ids | Visual citations with page screenshots and bboxes |
| Best fit | Short question answering | Long-horizon document tasks |
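
The three precision controls in the table can be illustrated with an in-memory filter. This is a sketch only: the narrow function is hypothetical, and the order in which it applies the controls (threshold, then top_k, then rerank) is an assumption, not a description of how Index v2 evaluates them server-side.

```typescript
interface Scored {
  content: string
  score: number // first-pass retrieval score
  rerankScore?: number // present only when a reranker ran
}

// Apply score_threshold, top_k, and rerank_top_n to a result list.
function narrow(
  results: Scored[],
  opts: { topK?: number; scoreThreshold?: number; rerankTopN?: number },
): Scored[] {
  let out = [...results].sort((a, b) => b.score - a.score)
  const threshold = opts.scoreThreshold
  if (threshold != null) out = out.filter((r) => r.score >= threshold)
  if (opts.topK != null) out = out.slice(0, opts.topK)
  if (opts.rerankTopN != null) {
    // Reorder by rerank score where available, then keep the top N.
    out = [...out]
      .sort((a, b) => (b.rerankScore ?? b.score) - (a.rerankScore ?? a.score))
      .slice(0, opts.rerankTopN)
  }
  return out
}

const sample: Scored[] = [
  { content: 'a', score: 0.9, rerankScore: 0.2 },
  { content: 'b', score: 0.8, rerankScore: 0.95 },
  { content: 'c', score: 0.5, rerankScore: 0.6 },
  { content: 'd', score: 0.3, rerankScore: 0.99 },
]
const kept = narrow(sample, { scoreThreshold: 0.4, topK: 3, rerankTopN: 2 })
```

In this sample, result d has the highest rerank score but never reaches the reranker because it falls below the threshold, which shows why exposing all three knobs to the agent matters.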
Use cases, with examples
The design targets domains where agents navigate large document sets. Legal and fintech are the stated examples.
- Consider a contract question: ‘What notice is needed to terminate the MSA?’ The agent lists files, runs retrieve, then greps the exact clause. It answers with a citation to the specific page.
- Consider due diligence across a data room: An agent can findFiles by name, then readFile each candidate. It cross-checks clauses without a human opening every PDF.
- Consider a versioned policy base: Because retrieve accepts a file_version filter, an agent can query a specific version. This supports change tracking over time.
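
For the versioned case, a query pinned to one file version needs a filter on both name and version. The sketch below extends the filter shape used by the retrieve tool; the buildFilters helper is hypothetical, and the metadata key "version" is an assumption based on the article's reference to a version metadata field.

```typescript
type Filter = { operator: 'eq'; value: string | number }

// Hypothetical helper: build metadata filters from optional tool arguments.
function buildFilters(args: {
  file_name?: string | null
  file_version?: number | null
}): Record<string, Filter> | undefined {
  const filters: Record<string, Filter> = {}
  if (args.file_name != null) {
    filters['file_name'] = { operator: 'eq', value: args.file_name }
  }
  // Assumption: the version is stored under a metadata key named "version".
  if (args.file_version != null) {
    filters['version'] = { operator: 'eq', value: args.file_version }
  }
  // Return undefined when nothing was requested, so no filter is sent.
  return Object.keys(filters).length > 0 ? filters : undefined
}

const pinned = buildFilters({ file_name: 'nda.pdf', file_version: 2 })
```

Comparing answers from two such pinned queries, one per version, is one way an agent could report what changed between uploads.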
Reference implementation
[Interactive demo, Marktechpost: Agentic Retrieval Harness, modeled on run-llama/legal-kb (Index v2 / LlamaParse Platform). A sample project with three indexed files (Mutual_NDA.pdf v2, MSA_Acme_Vendor.pdf v1, Employment_Agreement.pdf v1) walks through findFiles, retrieve, and grepFile before returning an answer with citation chips. Source: https://github.com/run-llama/legal-kb]
Key Takeaways
- legal-kb is a public reference app showing agentic retrieval on LlamaIndex Index v2.
- The agent gets four filesystem-style tools: retrieve (hybrid search), findFiles, readFile, and grepFile.
- A persistent pipeline handles parsing, indexing, sync, and per-file version control.
- Answers include visual citations: page screenshots with bounding boxes over the cited text.
- The stack is TanStack Start, AI SDK 6, Prisma, and WorkOS, with per-user encrypted keys.
The post LlamaIndex ‘legal-kb’: Agentic Retrieval over Index v2 with retrieve, find, read, and grep Tools appeared first on MarkTechPost.
