LlamaIndex RAG Debugging

Debug LlamaIndex RAG pipelines with callbacks, query engine logging, and retrieval analysis.

LlamaIndex provides powerful abstractions for RAG, but debugging requires understanding how those abstractions interact. Here's how to get full visibility into your LlamaIndex pipeline.

🔍

Diagnose Your RAG Failure Automatically

Paste your RAG trace or describe the problem. Get instant failure mode classification and copy-paste code fixes.

Try RAG Failure Debugger — Free

3 free analyses/month · Pro unlimited at $9/mo

Enable Full Tracing

CallbackManager setup

Import CallbackManager and LlamaDebugHandler, then assign a CallbackManager containing the handler to Settings.callback_manager. Every node retrieval, LLM call, and token count is then tracked and available via handler.get_event_pairs().
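
A minimal setup sketch, assuming the llama-index >= 0.10 import paths (the imports live inside the helper so the snippet stands alone):

```python
def enable_debug_tracing():
    """Attach a LlamaDebugHandler globally so every retrieval, LLM call,
    and token count is recorded (llama-index >= 0.10 import paths assumed)."""
    from llama_index.core import Settings
    from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

    handler = LlamaDebugHandler(print_trace_on_end=True)
    Settings.callback_manager = CallbackManager([handler])
    return handler

# Usage after running a query through any engine built with these Settings:
#   handler = enable_debug_tracing()
#   query_engine.query("...")
#   for start, end in handler.get_event_pairs():  # all recorded events
#       print(start.event_type, start.time, end.time)
```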

Arize Phoenix integration

For production observability, integrate Arize Phoenix: set_global_handler('arize_phoenix'). You get a full UI showing retrieval scores, LLM traces, and latency breakdown per query.

Query Engine Debugging

Inspect retrieved nodes

Call query_engine.retriever.retrieve(query) directly to see raw retrieval before synthesis. Print node.score and node.text[:200] for each node to spot bad chunks.
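
A small diagnostic helper along these lines can be reused across indexes. The FakeRetriever and FakeNode below are stand-ins so the sketch runs on its own; in practice you would pass query_engine.retriever, whose nodes carry the same .score and .text attributes:

```python
from dataclasses import dataclass

def dump_retrieval(retriever, query, preview=200):
    """Print score and a text preview for each retrieved node, then return them."""
    nodes = retriever.retrieve(query)
    for i, node in enumerate(nodes):
        print(f"[{i}] score={node.score:.3f} text={node.text[:preview]!r}")
    return nodes

# Stand-in objects for illustration; real LlamaIndex retrievers
# return NodeWithScore objects with the same attributes.
@dataclass
class FakeNode:
    score: float
    text: str

class FakeRetriever:
    def retrieve(self, query):
        return [FakeNode(0.82, "Relevant chunk about the topic."),
                FakeNode(0.31, "Boilerplate footer text.")]

nodes = dump_retrieval(FakeRetriever(), "what changed in v2?")
```

A low top score across the board usually points at embedding or chunking problems rather than synthesis.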

Response synthesizer isolation

Build a response synthesizer separately and call synthesize(query, nodes=your_nodes) with hand-picked nodes. This isolates whether the problem is in retrieval or synthesis.
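
A sketch of that isolation step, assuming llama-index >= 0.10 (get_response_synthesizer builds the default synthesizer):

```python
def synthesize_only(query, nodes):
    """Run synthesis alone on hand-picked nodes, skipping retrieval entirely
    (llama-index >= 0.10 import path assumed)."""
    from llama_index.core import get_response_synthesizer

    synthesizer = get_response_synthesizer()  # default "compact" response mode
    return synthesizer.synthesize(query, nodes=nodes)

# Usage: pass NodeWithScore objects you trust, e.g. saved from an earlier
# retriever.retrieve() call, and compare the answer against the full pipeline.
```

If the answer is good with hand-picked nodes, the bug is in retrieval; if it is still bad, look at the synthesizer prompt or the LLM.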

Node postprocessors

Add SimilarityPostprocessor(similarity_cutoff=0.7) to filter low-score nodes before synthesis. Add KeywordNodePostprocessor to require query keywords in retrieved nodes.
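
The cutoff boils down to a score threshold. This stand-alone sketch mirrors what SimilarityPostprocessor does in essence, with the real wiring shown in comments:

```python
def filter_by_similarity(nodes, cutoff=0.7):
    """Drop nodes scored below the cutoff (the essence of SimilarityPostprocessor)."""
    return [n for n in nodes if n.score is not None and n.score >= cutoff]

# Real wiring in LlamaIndex:
#   from llama_index.core.postprocessor import SimilarityPostprocessor
#   query_engine = index.as_query_engine(
#       node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
#   )

from types import SimpleNamespace

nodes = [SimpleNamespace(score=0.91),
         SimpleNamespace(score=0.42),
         SimpleNamespace(score=None)]
kept = filter_by_similarity(nodes)  # only the 0.91 node survives
```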

Common LlamaIndex Issues

Index staleness

If you add new documents without rebuilding the index, they won't be retrievable. Use index.refresh_ref_docs(documents) for incremental updates. Log document count before and after.
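
A sketch of that logging wrapper, assuming an index exposing ref_doc_info (a mapping of ingested document IDs to metadata) and refresh_ref_docs (which returns one bool per document indicating whether it was inserted or updated):

```python
def refresh_and_log(index, documents):
    """Incrementally upsert documents and log the before/after ref-doc count."""
    before = len(index.ref_doc_info)
    changed = index.refresh_ref_docs(documents)  # list of bools, one per document
    after = len(index.ref_doc_info)
    print(f"ref docs: {before} -> {after}; "
          f"{sum(changed)}/{len(documents)} inserted or updated")
    return changed
```

If the count does not move after a refresh, check that your documents carry stable doc IDs; refresh matching is keyed on them.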

Chunk size vs embedding model mismatch

OpenAI's text-embedding-3-small has an 8191-token input limit. If your chunks are larger, they're silently truncated. Set chunk_size=512 as a safe default.
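
A quick pre-flight check along these lines can catch oversized chunks before they reach the embedding model. The 4-characters-per-token figure is a rough heuristic for English text; use tiktoken for exact counts:

```python
def oversized_chunks(chunks, max_tokens=8191, chars_per_token=4):
    """Return indices of chunks likely to exceed the embedding token limit.
    Heuristic: ~4 characters per token; swap in tiktoken for exact counts."""
    return [i for i, chunk in enumerate(chunks)
            if len(chunk) / chars_per_token > max_tokens]

suspects = oversized_chunks(["x" * 50_000, "a normal-sized chunk"])
# the 50,000-character chunk (~12,500 estimated tokens) is flagged
```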

Sub-question query engine failures

The sub-question engine generates sub-queries using the LLM. If the LLM generates malformed sub-queries, the whole pipeline fails silently. Log sub-questions before routing them.
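
With a LlamaDebugHandler attached, the generated sub-questions can be pulled out of the event stream. A sketch assuming llama-index >= 0.10, where the SUB_QUESTION event payload carries each generated question and its answer:

```python
def log_sub_questions(debug_handler):
    """Print each sub-question a SubQuestionQueryEngine generated; call after a query
    (llama-index >= 0.10 import paths assumed)."""
    from llama_index.core.callbacks import CBEventType, EventPayload

    pairs = debug_handler.get_event_pairs(CBEventType.SUB_QUESTION)
    for i, (start, end) in enumerate(pairs):
        qa_pair = end.payload[EventPayload.SUB_QUESTION]
        print(f"sub-question {i}: {qa_pair.sub_q.sub_question.strip()}")
```

Malformed or off-topic sub-questions here mean the decomposition prompt, not retrieval, is what needs fixing.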

Automate Your RAG Diagnosis

Manually working through this checklist for every RAG failure is time-consuming. The RAG Failure Debugger automates the classification step — paste your trace or describe the problem, and get an instant failure mode diagnosis with copy-paste code fixes.


Recommended Hosting for AI/ML Projects

  • DigitalOcean — $200 free credit. GPU droplets for LLM inference, managed vector DBs coming soon.
  • Hostinger — From $2.99/mo. Fast VPS for RAG API servers.
  • Namecheap — Budget hosting + free domain for your AI projects.