LlamaIndex RAG Debugging

Debug LlamaIndex RAG pipelines with callbacks, query engine logging, and retrieval analysis.

LlamaIndex provides powerful abstractions for RAG, but debugging requires understanding how those abstractions interact. Here's how to get full visibility into your LlamaIndex pipeline.


Diagnose Your RAG Failure Automatically

Paste your RAG trace or describe the problem. Get instant failure mode classification and copy-paste code fixes.

Try RAG Failure Debugger - Free

3 free analyses/month · Pro unlimited at $9

Enable Full Tracing

CallbackManager setup

Import CallbackManager and LlamaDebugHandler. Add the handler to Settings.callback_manager. Every node retrieval, LLM call, and token count is now tracked in handler.get_event_pairs().

Arize Phoenix integration

For production observability, integrate Arize Phoenix: set_global_handler('arize_phoenix'). You get a full UI showing retrieval scores, LLM traces, and latency breakdown per query.
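A sketch of the wiring, assuming the `arize-phoenix` package and the matching LlamaIndex Phoenix callback integration are installed (exact package names vary across LlamaIndex versions):

```python
import phoenix as px
from llama_index.core import set_global_handler

px.launch_app()                      # start the local Phoenix UI
set_global_handler("arize_phoenix")  # route all LlamaIndex traces to it
```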

Query Engine Debugging

Inspect retrieved nodes

Call query_engine.retriever.retrieve(query) directly to see raw retrieval before synthesis. Print node.score and node.text[:200] for each node to spot bad chunks.
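A small helper sketch (the name `inspect_retrieval` is illustrative); it works with any index that exposes `as_retriever`:

```python
def inspect_retrieval(index, query: str, top_k: int = 5) -> None:
    """Print score and a text preview for each retrieved node.

    `index` is any LlamaIndex index, e.g. a VectorStoreIndex.
    """
    retriever = index.as_retriever(similarity_top_k=top_k)
    for i, node in enumerate(retriever.retrieve(query)):
        # Some retrievers return no score; guard the formatting.
        score = node.score if node.score is not None else float("nan")
        print(f"[{i}] score={score:.3f}  text={node.text[:200]!r}")
```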

Response synthesizer isolation

Build a response synthesizer separately (via get_response_synthesizer) and call synthesize(query, nodes=your_nodes) with hand-picked nodes. This isolates whether the problem lies in synthesis or in retrieval.

Node postprocessors

Add SimilarityPostprocessor(similarity_cutoff=0.7) to filter low-score nodes before synthesis. Add KeywordNodePostprocessor to require query keywords in retrieved nodes.

Common LlamaIndex Issues

Index staleness

If you add new documents without rebuilding the index, they won't be retrievable. Use index.refresh_ref_docs(documents) for incremental updates. Log document count before and after.
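A sketch of the refresh-with-logging pattern (`refresh_index` is an illustrative name; note that refresh only detects updates when your documents carry stable doc_ids):

```python
def refresh_index(index, documents) -> int:
    """Incrementally upsert documents and log the ref-doc count around it."""
    print(f"ref docs before: {len(index.ref_doc_info)}")
    refreshed = index.refresh_ref_docs(documents)  # list[bool], one per document
    print(f"ref docs after:  {len(index.ref_doc_info)}")
    return sum(refreshed)  # how many documents were inserted or updated
```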

Chunk size vs embedding model mismatch

OpenAI's text-embedding-3-small has an 8191-token input limit. Chunks longer than that are silently truncated, so the embedding no longer represents the full chunk. Set chunk_size=512 as a safe default.

Sub-question query engine failures

The sub-question engine generates sub-queries using the LLM. If the LLM generates malformed sub-queries, the whole pipeline fails silently. Log sub-questions before routing them.

Automate Your RAG Diagnosis

Manually working through this checklist for every RAG failure is time-consuming. The RAG Failure Debugger automates the classification step: paste your trace or describe the problem, and get an instant failure mode diagnosis with copy-paste code fixes.



Frequently Asked Questions

What is LlamaIndex RAG debugging?
LlamaIndex RAG debugging involves diagnosing issues in retrieval-augmented generation pipelines built with LlamaIndex. This includes troubleshooting document indexing, embedding generation, vector search, and response synthesis.

Why is my LlamaIndex returning empty results?
Check your index configuration, verify documents were properly loaded, ensure embedding dimensions match, and confirm similarity_top_k is set appropriately (>0).

How do I debug LlamaIndex queries?
Enable debug logging with logging.basicConfig(level=logging.DEBUG), use response.source_nodes to inspect retrieved documents, and trace query transformations step by step.

What are common LlamaIndex problems?
Common issues: incorrect node parsing, embedding dimension mismatches, wrong similarity metrics, insufficient context window, and query transformation errors.

How can I optimize LlamaIndex performance?
Use appropriate node parsers, tune chunk sizes, implement caching, use efficient vector stores, and consider response synthesis methods (refine, compact, tree).

Does LlamaIndex support custom retrievers?
Yes. LlamaIndex allows custom retriever implementation by extending the BaseRetriever class. You can combine multiple retrievers with RetrieverQueryEngine for hybrid approaches.

How do I handle multi-document queries?
Use ListIndex for small document sets, VectorStoreIndex for larger collections, or KeywordTableIndex for keyword-based retrieval. Combine with appropriate query engines.

What's the best vector store for LlamaIndex?
For small projects: the simple in-memory VectorStore. For production: Pinecone (managed), Weaviate (self-hosted), or pgvector (PostgreSQL). Choose based on scale and operational requirements.