LangChain makes building RAG applications easy. Debugging them when they break? Not so much.
When your LangChain RAG chain returns irrelevant answers, you need to know: Is the retriever configured correctly? Is context being assembled properly? Is the LLM ignoring the retrieved documents?
This guide covers common LangChain RAG issues with working code fixes.
🔧 Visual RAG Debugging
Use rag-debugger.pages.dev to visualize LangChain RAG outputs. Paste retrieved documents and LLM responses to identify failure points. Free: 10 sessions/month.
Common LangChain RAG Issues
Issue 1: Retriever Returns Empty Results
Problem: retriever.invoke(query) returns []
Symptoms: Chain completes but answer is "I don't have information about..." despite relevant docs in vectorstore.
Debug Steps:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# 1. Check vectorstore has documents
print(f"Vectorstore size: {len(vectorstore.index_to_docstore_id)}")

# 2. Test retriever directly
query = "your test query"
results = retriever.invoke(query)
print(f"Results count: {len(results)}")

# 3. Check similarity scores (note: many retrievers do not attach a score
# to metadata; call vectorstore.similarity_search_with_score(query) to be sure)
for doc in results:
    print(f"Score: {doc.metadata.get('score', 'N/A')} - {doc.page_content[:50]}")
# 4. Test with different search types
faiss_retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5, "k": 5}
)
results = faiss_retriever.invoke(query)
print(f"With threshold: {len(results)} results")
Fixes:
# Fix 1: Lower similarity threshold
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.3, "k": 10}  # Lower threshold, more results
)

# Fix 2: Use MMR for diversity
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}
)

# Fix 3: Add hybrid search with BM25
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(documents)
bm25.k = 5
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25, vector_retriever],
    weights=[0.4, 0.6]
)
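EnsembleRetriever merges the two ranked lists using weighted Reciprocal Rank Fusion. Here is a toy sketch of that idea so the `weights` parameter is less mysterious (an illustration, not LangChain's actual implementation; `weighted_rrf` and the constant `c=60` are my own names):

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of doc ids: each list contributes
    weight / (c + rank) per document; higher fused score wins."""
    scores = {}
    for ranked, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranked = ["doc_a", "doc_b", "doc_c"]    # lexical ranking
vector_ranked = ["doc_b", "doc_c", "doc_d"]  # semantic ranking
print(weighted_rrf([bm25_ranked, vector_ranked], weights=[0.4, 0.6]))
# doc_b comes out on top: it ranks highly in both lists
```

A document that appears near the top of both rankings beats one that tops a single list, which is why hybrid retrieval helps when queries mix exact keywords with paraphrases.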
Issue 2: Context Window Overflow
Problem: Retrieved docs exceed LLM context limit
Symptoms: Error "Requested tokens exceed context window" or model ignores middle content.
Debug:
import tiktoken

def check_context_tokens(docs, llm_model="gpt-3.5-turbo"):
    # tiktoken only knows OpenAI tokenizers; for other models (e.g. Claude)
    # fall back to cl100k_base as an approximation.
    try:
        encoding = tiktoken.encoding_for_model(llm_model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    total_tokens = sum(
        len(encoding.encode(doc.page_content))
        for doc in docs
    )
    context_limits = {
        "gpt-3.5-turbo": 16385,
        "gpt-4-turbo": 128000,
        "claude-3-haiku": 200000
    }
    limit = context_limits.get(llm_model, 8000)
    print(f"Context: {total_tokens}/{limit} tokens ({total_tokens/limit:.1%})")
    return total_tokens

# In your chain
docs = retriever.invoke(query)
check_context_tokens(docs, "gpt-3.5-turbo")
Fixes:
# Fix 1: Add context compression
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0", top_n=5)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20})
)

# Now retrieves 20, compresses to top 5
compressed_docs = compression_retriever.invoke(query)
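If a reranker isn't available, a plain greedy cutoff also keeps the context under budget: keep documents in retrieval order until the estimated token cost is spent. A minimal sketch (`fit_to_budget` and the 4-chars-per-token estimate are illustrative, not LangChain APIs; use tiktoken for exact counts):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def fit_to_budget(doc_texts, budget_tokens):
    """Keep docs in retrieval order until the token budget is spent."""
    kept, used = [], 0
    for text in doc_texts:
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            break
        kept.append(text)
        used += cost
    return kept

docs = ["a" * 400, "b" * 400, "c" * 400]  # ~100 estimated tokens each
print(len(fit_to_budget(docs, 250)))  # 2 -- the third doc would blow the budget
```

The downside versus reranking: the tail documents are dropped blindly, so this only works when the retriever already orders by relevance.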
# Fix 2: Use MapReduce chain for large contexts
from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import ChatPromptTemplate

# Map: extract what's relevant from each document individually
map_prompt = ChatPromptTemplate.from_template(
    "Extract relevant information from this text about {query}:\n{text}"
)

# Reduce: synthesize the per-document extracts into one answer
reduce_prompt = ChatPromptTemplate.from_template(
    "Synthesize these summaries into a coherent answer:\n{summaries}"
)

combine_documents_chain = StuffDocumentsChain(
    llm_chain=LLMChain(llm=llm, prompt=reduce_prompt),
    document_variable_name="summaries"
)
reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain
)
map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=LLMChain(llm=llm, prompt=map_prompt),
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="text"
)
Issue 3: LLM Ignores Retrieved Context
Problem: Model answers from training data, not retrieved docs
Symptoms: Answer includes information not in retrieved documents. Model cites external sources.
Debug:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Check your prompt template (for a legacy RetrievalQA chain, the prompt
# lives on the inner LLM chain)
print("Current prompt template:")
print(retrieval_qa.combine_documents_chain.llm_chain.prompt.template)

# Add explicit grounding instructions
grounded_prompt = ChatPromptTemplate.from_template("""You are a helpful assistant that answers questions ONLY based on the provided context.

Rules:
1. If the answer is not in the context, say "I don't have enough information in the provided documents."
2. Cite sources using [Document X] notation.
3. Quote exact passages when making factual claims.
4. Never use outside knowledge.

Context:
{context}

Question: {query}

Answer:""")

chain = grounded_prompt | llm | StrOutputParser()
Fixes:
# Fix 1: Stronger grounding prompt
from langchain_core.prompts import PromptTemplate

RAG_PROMPT = PromptTemplate(
    template="""<|system|>
You are an AI assistant that answers questions based on the provided context.
- Answer ONLY using information from the context below
- If the context doesn't contain the answer, say so clearly
- Quote directly from the context when possible
- Cite which document each piece of information comes from

Context:
{context}
<|user|>
{query}
<|assistant|>
""",
    input_variables=["context", "query"]
)
# Fix 2: Lower temperature for factual queries
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0,  # Deterministic for factual QA
    max_tokens=1000
)
# Fix 3: Add post-hoc verification
def verify_answer(answer: str, context: str) -> dict:
    """Check if answer claims are supported by context"""
    # Simple similarity-based verification
    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    model = SentenceTransformer('all-MiniLM-L6-v2')

    # Split into sentences
    answer_sentences = [s.strip() for s in answer.split('.') if len(s.strip()) > 10]
    context_sentences = [s.strip() for s in context.replace('\n', ' ').split('.') if len(s.strip()) > 10]

    # Check each answer sentence against its best-matching context sentence
    context_embs = model.encode(context_sentences)
    verification = []
    for ans_sent in answer_sentences:
        ans_emb = model.encode([ans_sent])
        max_sim = float(cosine_similarity(ans_emb, context_embs).max())
        verification.append({
            "sentence": ans_sent,
            "supported": max_sim > 0.7,
            "similarity": max_sim
        })
    return {
        "verified": all(v["supported"] for v in verification),
        "details": verification
    }
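When a sentence-transformer dependency is too heavy for the request path, a cheap lexical-overlap check can flag the most blatant unsupported claims first (a rough heuristic, not a substitute for the semantic check; the `lexical_support` helper and its stopword list are illustrative):

```python
def lexical_support(answer_sentence: str, context: str, threshold: float = 0.6) -> bool:
    """True if most content words of the sentence also appear in the context."""
    stopwords = {"the", "a", "an", "is", "are", "was", "were",
                 "of", "to", "in", "and", "or", "it", "this", "that"}
    words = {w.lower().strip(".,;:!?") for w in answer_sentence.split()} - stopwords
    words.discard("")
    if not words:
        return True  # Nothing substantive to verify
    ctx = context.lower()
    hits = sum(1 for w in words if w in ctx)
    return hits / len(words) >= threshold

print(lexical_support("Paris is the capital of France",
                      "France's capital city is Paris."))  # True
print(lexical_support("The moon is made of cheese",
                      "France's capital city is Paris."))  # False
```

This runs in microseconds, so it can gate every response and escalate only suspicious sentences to the embedding-based verifier.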
Issue 4: Retriever Configuration Mistakes
Problem: Wrong search_type or search_kwargs
Common mistakes:
- Using search_type="similarity" when you need a score threshold
- Setting k too low (misses context) or too high (overwhelms the LLM)
- Not using MMR for deduplication
Configuration Guide:
# Configuration 1: Basic similarity search (good for small, focused datasets)
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)

# Configuration 2: Score threshold (filters low-quality matches)
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5, "k": 10}
)

# Configuration 3: MMR (diverse results, avoids duplicates)
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}
)
# lambda_mult: 1 = pure similarity (minimum diversity), 0 = maximum diversity
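To build intuition for lambda_mult, here is a toy MMR selection over precomputed similarities (an illustration of the trade-off, not LangChain's internal code; `mmr_select` and the similarity values are made up):

```python
def mmr_select(query_sim, doc_sims, k, lambda_mult):
    """Greedy MMR: balance relevance to the query against
    redundancy with already-selected documents."""
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

query_sim = [0.90, 0.85, 0.50]           # doc 1 is a near-duplicate of doc 0
doc_sims = [[1.00, 0.95, 0.10],
            [0.95, 1.00, 0.10],
            [0.10, 0.10, 1.00]]
print(mmr_select(query_sim, doc_sims, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance
print(mmr_select(query_sim, doc_sims, k=2, lambda_mult=0.5))  # [0, 2]: duplicate penalized
```

With lambda_mult=1.0 the near-duplicate still wins on raw relevance; at 0.5 its redundancy with the already-selected document pushes the more diverse doc 2 ahead.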
# Configuration 4: Multi-vector retriever (for long documents)
from langchain.retrievers.multi_vector import MultiVectorRetriever

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,   # Holds the child chunks/summaries
    byte_store=summary_store,  # Holds the full parent documents
    id_key="doc_id"            # Metadata key linking chunks to parents
)
Issue 5: Chain Validation Errors
Problem: chain.invoke() raises validation errors
Symptoms: a KeyError or pydantic ValidationError when required inputs are missing, or OutputParserException when the LLM output can't be parsed.
Debug:
from langchain_core.exceptions import OutputParserException

# Enable verbose mode
chain.verbose = True

# Check input variables
print(f"Chain expects: {chain.input_keys}")
print(f"Chain outputs: {chain.output_keys}")

# Test with explicit input dict
try:
    result = chain.invoke({
        "query": "your question",
        "input_documents": [...]  # If required
    })
except OutputParserException as e:
    print(f"Parser error: {e}")
    print(f"Raw output: {e.llm_output}")
Fixes:
# Fix 1: Use RunnableLambda for custom validation
from langchain_core.runnables import RunnableLambda

def validate_input(inputs: dict) -> dict:
    if not inputs.get("query"):
        raise ValueError("Query is required")
    if len(inputs["query"]) < 5:
        raise ValueError("Query too short")
    return inputs

validated_chain = (
    RunnableLambda(validate_input)
    | retrieval_chain
)
# Fix 2: Add retry logic for transient errors
from tenacity import retry, stop_after_attempt, retry_if_exception_type

@retry(
    stop=stop_after_attempt(3),
    retry=retry_if_exception_type(OutputParserException)
)
def invoke_with_retry(chain, inputs):
    return chain.invoke(inputs)

result = invoke_with_retry(chain, {"query": "your question"})
Debugging Tools & Patterns
Pattern 1: LangChain Debugging Callback
from typing import Any, Dict, List

from langchain_core.callbacks import BaseCallbackHandler

class DebugCallbackHandler(BaseCallbackHandler):
    def __init__(self):
        self.retrieved_docs = []
        self.llm_calls = []
        self.errors = []

    def on_retriever_end(
        self,
        documents: List[Any],
        *,
        run_id: str,
        parent_run_id: str,
        **kwargs: Any,
    ) -> Any:
        self.retrieved_docs = documents
        print(f"\n📄 Retrieved {len(documents)} documents:")
        for i, doc in enumerate(documents[:5]):
            print(f"  {i+1}. Score={doc.metadata.get('score', 'N/A')}")
            print(f"     {doc.page_content[:100]}...")

    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        **kwargs: Any,
    ) -> Any:
        self.llm_calls.append({"prompts": prompts})
        print(f"\n🤖 LLM called with {len(prompts[0])} chars")

    def on_chain_error(
        self,
        error: BaseException,
        *,
        run_id: str,
        **kwargs: Any,
    ) -> Any:
        self.errors.append(error)
        print(f"\n❌ Chain error: {error}")

# Usage
debug_callback = DebugCallbackHandler()
chain = retrieval_qa.with_config(callbacks=[debug_callback])
result = chain.invoke({"query": "your question"})
Pattern 2: Trace Without LangSmith
import json
from datetime import datetime

class SimpleTracer:
    def __init__(self, log_file: str = "rag_traces.jsonl"):
        self.log_file = log_file

    def trace(self, query: str, docs: list, response: str, metadata: dict = None):
        trace = {
            "timestamp": datetime.now().isoformat(),
            "query": query,
            "retrieved_docs": [
                {"content": d.page_content, "score": d.metadata.get("score")}
                for d in docs
            ],
            "response": response,
            "metadata": metadata or {}
        }
        with open(self.log_file, "a") as f:
            f.write(json.dumps(trace) + "\n")
        return trace

# Usage in your chain
tracer = SimpleTracer()
docs = retriever.invoke(query)
response = llm.invoke(format_prompt(docs, query))
# Pass .content so the trace stays JSON-serializable (chat models return messages)
tracer.trace(query, docs, response.content, {"latency_ms": latency})

# Review traces
# cat rag_traces.jsonl | jq '.'
🚀 Automated RAG Debugging
RAG Debugger provides visual debugging for LangChain RAG pipelines:
- Auto-detect retrieval issues
- Visualize document relevance scores
- Hallucination detection
- Export traces for team review
Try 10 free debug sessions → rag-debugger.pages.dev
Quick Reference Checklist
Before deploying your LangChain RAG chain:
- ✅ Retriever returns 3-10 relevant docs for test queries
- ✅ Similarity scores > 0.5 for top results
- ✅ Total context tokens < 80% of LLM limit
- ✅ Prompt includes grounding instructions
- ✅ Temperature = 0 for factual queries
- ✅ Citation requirement in prompt
- ✅ Error handling for empty retrievals
- ✅ Tracing/logging enabled
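The measurable items on this checklist can be wired into a pre-deployment smoke test. A minimal sketch (the `preflight` helper is illustrative; its thresholds mirror the checklist above):

```python
def preflight(num_docs, top_score, context_tokens, token_limit, temperature):
    """Return a list of checklist failures; an empty list means ready to ship."""
    problems = []
    if not 3 <= num_docs <= 10:
        problems.append(f"retriever returned {num_docs} docs (want 3-10)")
    if top_score <= 0.5:
        problems.append(f"top similarity score {top_score} <= 0.5")
    if context_tokens > 0.8 * token_limit:
        problems.append("context exceeds 80% of the LLM window")
    if temperature != 0:
        problems.append("temperature != 0 for factual queries")
    return problems

print(preflight(num_docs=5, top_score=0.72,
                context_tokens=4000, token_limit=16385, temperature=0))
# [] -- all checks pass
print(preflight(num_docs=1, top_score=0.3,
                context_tokens=15000, token_limit=16385, temperature=0.7))
# Four failures, one per violated checklist item
```

Run it in CI against a handful of representative queries so a retriever regression fails the build instead of surfacing in production.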
Conclusion
Debugging LangChain RAG applications requires systematic checking of each component:
- Retriever: Verify configuration and results
- Context assembly: Check token count and formatting
- LLM: Ensure grounding and citation
- Validation: Add error handling and retries
For faster debugging, try RAG Debugger, a visual tool that automates trace analysis and failure detection. Start with 10 free sessions at rag-debugger.pages.dev.