LlamaIndex provides powerful abstractions for RAG, but debugging requires understanding how those abstractions interact. Here's how to get full visibility into your LlamaIndex pipeline.
Enable Full Tracing
CallbackManager setup
Import CallbackManager and LlamaDebugHandler. Add the handler to Settings.callback_manager. Every node retrieval, LLM call, and token count is now tracked in handler.get_event_pairs().
Arize Phoenix integration
For production observability, integrate Arize Phoenix: set_global_handler('arize_phoenix'). You get a full UI showing retrieval scores, LLM traces, and latency breakdown per query.
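A setup fragment for the Phoenix integration; it assumes the `arize-phoenix` and `llama-index-callbacks-arize-phoenix` packages are installed:

```python
import phoenix as px
import llama_index.core

# Launch the local Phoenix UI, then route all LlamaIndex traces to it.
px.launch_app()  # UI is served locally (http://localhost:6006 by default)
llama_index.core.set_global_handler("arize_phoenix")

# Every subsequent query is traced: retrieval scores, LLM spans,
# and per-query latency show up in the Phoenix UI.
```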
Query Engine Debugging
Inspect retrieved nodes
Call query_engine.retriever.retrieve(query) directly to see raw retrieval before synthesis. Print node.score and node.text[:200] for each node to spot bad chunks.
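A sketch of that inspection loop, assuming `index` is an existing `VectorStoreIndex`:

```python
# Retrieve directly, bypassing synthesis entirely.
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("How do refunds work?")

for node in nodes:
    # Low scores or irrelevant text here means the problem is
    # retrieval, not synthesis.
    print(f"{node.score:.3f}  {node.text[:200]!r}")
```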
Response synthesizer isolation
Build a response synthesizer separately and call synthesize(query, nodes=your_nodes) with hand-picked nodes that you know contain the answer. This isolates synthesis problems from retrieval problems.
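A sketch of that isolation test; it needs a configured LLM, and the example node text and question are made up for illustration:

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.schema import NodeWithScore, TextNode

# Hand-pick a node that definitely contains the answer. If synthesis
# still fails here, the bug is in synthesis/prompting, not retrieval.
nodes = [
    NodeWithScore(
        node=TextNode(text="Refunds are issued within 14 days of purchase."),
        score=1.0,
    ),
]

synthesizer = get_response_synthesizer(response_mode="compact")
response = synthesizer.synthesize("What is the refund window?", nodes=nodes)
print(response)
```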
Node postprocessors
Add SimilarityPostprocessor(similarity_cutoff=0.7) to filter low-score nodes before synthesis. Add KeywordNodePostprocessor to require query keywords in retrieved nodes.
Common LlamaIndex Issues
Index staleness
If you add new documents without rebuilding the index, they won't be retrievable. Use index.refresh_ref_docs(documents) for incremental updates. Log document count before and after.
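A sketch of the incremental-update flow; it assumes documents live in a local `./data` directory and an embedding model is configured:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# filename_as_id=True gives every document a stable id, which
# refresh_ref_docs needs in order to detect new or changed docs.
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()
index = VectorStoreIndex.from_documents(documents)

# Later, after files change on disk, reload and refresh:
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()
refreshed = index.refresh_ref_docs(documents)  # list of bools, one per doc
print(f"{sum(refreshed)}/{len(documents)} documents re-indexed")
```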
Chunk size vs embedding model mismatch
OpenAI's text-embedding-3-small has an 8,191-token input limit. Chunks longer than that are silently truncated before embedding, so the tail of every oversized chunk is never searchable. Set chunk_size=512 as a safe default.
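A configuration fragment for that default, using the global Settings object:

```python
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# 512-token chunks with a small overlap stay far below the
# 8,191-token embedding input limit, so nothing is truncated.
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
```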
Sub-question query engine failures
The sub-question engine uses the LLM to generate sub-queries. If the LLM produces a malformed sub-query, the pipeline can fail with a cryptic parse error or return an empty answer with no indication of which sub-question broke. Log the generated sub-questions before they are routed.
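A sketch of that logging via the debug handler's SUB_QUESTION events; the payload layout (a SubQuestionAnswerPair keyed by EventPayload.SUB_QUESTION) is an assumption that may vary across LlamaIndex versions:

```python
from llama_index.core import Settings
from llama_index.core.callbacks import (
    CallbackManager,
    CBEventType,
    EventPayload,
    LlamaDebugHandler,
)

llama_debug = LlamaDebugHandler()
Settings.callback_manager = CallbackManager([llama_debug])

# ... build the SubQuestionQueryEngine, run a query, then:
for _start, end in llama_debug.get_event_pairs(CBEventType.SUB_QUESTION):
    # Assumed payload: a SubQuestionAnswerPair holding the generated
    # sub-question and the tool it was routed to.
    qa_pair = end.payload[EventPayload.SUB_QUESTION]
    print(qa_pair.sub_q.sub_question, "->", qa_pair.sub_q.tool_name)
```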
Automate Your RAG Diagnosis
Manually working through this checklist for every RAG failure is time-consuming. The RAG Failure Debugger automates the classification step: paste your trace or describe the problem, and get an instant failure mode diagnosis with copy-paste code fixes.