LangChain RAG Debugging: Complete Guide

Fix retriever issues, context assembly problems, and chain validation errors

Published: March 13, 2026 • 11 min read

LangChain makes building RAG applications easy. Debugging them when they break? Not so much.

When your LangChain RAG chain returns irrelevant answers, you need to know: Is the retriever configured correctly? Is context being assembled properly? Is the LLM ignoring the retrieved documents?

This guide covers common LangChain RAG issues with working code fixes.

šŸ”§ Visual RAG Debugging

Use rag-debugger.pages.dev to visualize LangChain RAG outputs. Paste retrieved documents and LLM responses to identify failure points. Free: 10 sessions/month.

Common LangChain RAG Issues

Issue 1: Retriever Returns Empty Results

Problem: retriever.invoke(query) returns []

Symptoms: Chain completes but answer is "I don't have information about..." despite relevant docs in vectorstore.

Debug Steps:

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# 1. Check vectorstore has documents
print(f"Vectorstore size: {len(vectorstore.index_to_docstore_id)}")

# 2. Test retriever directly
query = "your test query"
results = retriever.invoke(query)
print(f"Results count: {len(results)}")

# 3. Check similarity scores
for doc in results:
    print(f"Score: {doc.metadata.get('score', 'N/A')} - {doc.page_content[:50]}")

# 4. Test with different search types
faiss_retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5, "k": 5}
)
results = faiss_retriever.invoke(query)
print(f"With threshold: {len(results)} results")

Fixes:

# Fix 1: Lower similarity threshold
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.3, "k": 10}  # Lower threshold, more results
)

# Fix 2: Use MMR for diversity
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}
)

# Fix 3: Add hybrid search with BM25
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(documents)
bm25.k = 5
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25, vector_retriever],
    weights=[0.4, 0.6]
)
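Under the hood, EnsembleRetriever merges its sub-retrievers' ranked lists with weighted Reciprocal Rank Fusion. A minimal pure-Python sketch of that scoring (the constant c=60 is the conventional RRF smoothing term; the doc IDs here are placeholders):

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists: score(d) = sum_i weight_i / (c + rank_i(d)).

    rankings: one ranked list of doc IDs per retriever (best first).
    weights:  one weight per retriever.
    """
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

With weights [0.4, 0.6], a document ranked first by the heavier retriever edges out one ranked first by the lighter retriever, which is exactly why the ensemble weighting above matters.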

Issue 2: Context Window Overflow

Problem: Retrieved docs exceed LLM context limit

Symptoms: Error "Requested tokens exceed context window" or model ignores middle content.

Debug:

import tiktoken

def check_context_tokens(docs, llm_model="gpt-3.5-turbo"):
    try:
        encoding = tiktoken.encoding_for_model(llm_model)
    except KeyError:
        # Non-OpenAI models (e.g. claude-3-haiku) aren't in tiktoken's registry;
        # fall back to cl100k_base as an approximation
        encoding = tiktoken.get_encoding("cl100k_base")

    total_tokens = sum(
        len(encoding.encode(doc.page_content))
        for doc in docs
    )

    context_limits = {
        "gpt-3.5-turbo": 16385,
        "gpt-4-turbo": 128000,
        "claude-3-haiku": 200000
    }

    limit = context_limits.get(llm_model, 8000)
    print(f"Context: {total_tokens}/{limit} tokens ({total_tokens/limit:.1%})")

    return total_tokens

# In your chain
docs = retriever.invoke(query)
check_context_tokens(docs, "gpt-3.5-turbo")

Fixes:

# Fix 1: Add context compression
from langchain.retrievers.document_compressors import CohereRerank
from langchain.retrievers import ContextualCompressionRetriever

compressor = CohereRerank(model="rerank-english-v3.0", top_n=5)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20})
)

# Now retrieves 20, compresses to top 5
compressed_docs = compression_retriever.invoke(query)

# Fix 2: Use MapReduce chain for large contexts
from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import ChatPromptTemplate

# Map: extract relevant information from each document independently
map_prompt = ChatPromptTemplate.from_template(
    "Extract relevant information from this text about {query}:\n{text}"
)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce: synthesize the per-document extracts into one answer
reduce_prompt = ChatPromptTemplate.from_template(
    "Synthesize these summaries into a coherent answer:\n{summaries}"
)
reduce_chain = ReduceDocumentsChain(
    combine_documents_chain=StuffDocumentsChain(
        llm_chain=LLMChain(llm=llm, prompt=reduce_prompt),
        document_variable_name="summaries"
    )
)

map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_chain,
    document_variable_name="text",
    input_key="input_documents",
    output_key="output"
)
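If MapReduce is overkill, a third option is to simply trim the retrieved list to a token budget before stuffing. A sketch, using a whitespace word count as a rough stand-in for a real tokenizer (swap in tiktoken for accuracy); the Doc class here stands in for LangChain's Document:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    page_content: str

def trim_to_budget(docs, max_tokens, count=lambda text: len(text.split())):
    """Keep the highest-ranked docs (input assumed sorted by relevance)
    until the token budget is exhausted."""
    kept, used = [], 0
    for doc in docs:
        n = count(doc.page_content)
        if used + n > max_tokens:
            break  # stop at the first doc that would overflow the budget
        kept.append(doc)
        used += n
    return kept
```

Because retrievers return docs in relevance order, truncating from the tail drops the least relevant material first.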

Issue 3: LLM Ignores Retrieved Context

Problem: Model answers from training data, not retrieved docs

Symptoms: Answer includes information not in retrieved documents. Model cites external sources.

Debug:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Check your prompt template (on a legacy RetrievalQA chain the prompt
# lives on the inner LLMChain)
print("Current prompt template:")
print(retrieval_qa.combine_documents_chain.llm_chain.prompt.template)

# Add explicit grounding instructions
grounded_prompt = ChatPromptTemplate.from_template("""You are a helpful assistant that answers questions ONLY based on the provided context.

Rules:
1. If the answer is not in the context, say "I don't have enough information in the provided documents."
2. Cite sources using [Document X] notation.
3. Quote exact passages when making factual claims.
4. Never use outside knowledge.

Context:
{context}

Question: {query}

Answer:""")

chain = grounded_prompt | llm | StrOutputParser()

Fixes:

# Fix 1: Stronger grounding prompt
from langchain_core.prompts import PromptTemplate

RAG_PROMPT = PromptTemplate(
    template="""<|system|>
You are an AI assistant that answers questions based on the provided context.
- Answer ONLY using information from the context below
- If the context doesn't contain the answer, say so clearly
- Quote directly from the context when possible
- Cite which document each piece of information comes from

Context:
{context}

<|user|>
{query}

<|assistant|>
""",
    input_variables=["context", "query"]
)

# Fix 2: Lower temperature for factual queries
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0,  # Deterministic for factual QA
    max_tokens=1000
)

# Fix 3: Add post-hoc verification
def verify_answer(answer: str, context: str) -> dict:
    """Check if answer claims are supported by context"""

    # Simple similarity-based verification
    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    model = SentenceTransformer('all-MiniLM-L6-v2')

    # Split into sentences
    answer_sentences = [s.strip() for s in answer.split('.') if len(s.strip()) > 10]
    context_sentences = [s.strip() for s in context.replace('\n', ' ').split('.') if len(s.strip()) > 10]

    # Check each answer sentence against all context sentences
    # (encode each set once instead of re-encoding per pair)
    ans_embs = model.encode(answer_sentences)
    ctx_embs = model.encode(context_sentences)
    sims = cosine_similarity(ans_embs, ctx_embs)

    verification = []
    for ans_sent, row in zip(answer_sentences, sims):
        max_sim = float(row.max())
        verification.append({
            "sentence": ans_sent,
            "supported": max_sim > 0.7,
            "similarity": max_sim
        })

    return {
        "verified": all(v["supported"] for v in verification),
        "details": verification
    }
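If loading an embedding model in the verification path is too heavy, a cruder lexical fallback can still flag obviously unsupported sentences (a sketch only; plain word overlap misses paraphrases that the embedding check above would catch):

```python
def lexical_support(answer_sentence: str, context: str, threshold: float = 0.5) -> bool:
    """Rough check: fraction of content words in the answer sentence
    that literally appear in the context."""
    # Ignore short function words (<= 3 chars) and trailing punctuation
    ans_words = {w.lower().strip('.,') for w in answer_sentence.split() if len(w) > 3}
    ctx_words = {w.lower().strip('.,') for w in context.split()}
    if not ans_words:
        return True  # nothing substantive to verify
    return len(ans_words & ctx_words) / len(ans_words) >= threshold
```

This is deliberately conservative: it only fires on sentences whose key terms never appear in the retrieved context at all.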

Issue 4: Retriever Configuration Mistakes

Problem: Wrong search_type or search_kwargs

Common mistakes: setting score_threshold too high (returns nothing), leaving k at the default when the corpus needs broader recall, and misreading the direction of lambda_mult in MMR.

Configuration Guide:

# Configuration 1: Basic similarity search (good for small, focused datasets)
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)

# Configuration 2: Score threshold (filters low-quality matches)
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5, "k": 10}
)

# Configuration 3: MMR (diverse results, avoids duplicates)
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}
)
# lambda_mult: 1 = minimum diversity (pure similarity), 0 = maximum diversity
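To build intuition for what lambda_mult does, here is a pure-Python sketch of greedy MMR selection over toy embeddings (not LangChain's internal implementation):

```python
import numpy as np

def mmr_select(query_emb, doc_embs, k=3, lambda_mult=0.5):
    """Greedy MMR: trade query relevance against redundancy with picks so far."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates = list(range(len(doc_embs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cos(query_emb, doc_embs[i])
            # Penalty: similarity to the closest already-selected doc
            redundancy = max((cos(doc_embs[i], doc_embs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a duplicate document in the pool, a high lambda_mult keeps the duplicate (pure similarity), while a low lambda_mult skips it in favor of a diverse result.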

# Configuration 4: Multi-vector retriever (for long documents)
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryByteStore

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,         # indexes the child chunks/summaries
    byte_store=InMemoryByteStore(),  # stores the full parent documents
    id_key="doc_id"                  # metadata key linking chunks to parents
)

Issue 5: Chain Validation Errors

Problem: chain.invoke() raises validation errors

Symptoms: KeyError or ValidationError for missing input variables, or OutputParserException

Debug:

from langchain_core.exceptions import OutputParserException

# Enable verbose/debug output (set_debug also traces LCEL runnables)
from langchain.globals import set_debug
set_debug(True)

# Check input variables
print(f"Chain expects: {chain.input_keys}")
print(f"Chain outputs: {chain.output_keys}")

# Test with explicit input dict
try:
    result = chain.invoke({
        "query": "your question",
        "input_documents": [...]  # If required
    })
except OutputParserException as e:
    print(f"Parser error: {e}")
    print(f"Raw output: {e.llm_output}")

Fixes:

# Fix 1: Use RunnableLambda for custom validation
from langchain_core.runnables import RunnableLambda

def validate_input(inputs: dict) -> dict:
    if not inputs.get("query"):
        raise ValueError("Query is required")
    if len(inputs["query"]) < 5:
        raise ValueError("Query too short")
    return inputs

validated_chain = (
    RunnableLambda(validate_input)
    | retrieval_chain
)

# Fix 2: Add retry logic for transient errors
from tenacity import retry, stop_after_attempt, retry_if_exception_type

@retry(
    stop=stop_after_attempt(3),
    retry=retry_if_exception_type(OutputParserException)
)
def invoke_with_retry(chain, inputs):
    return chain.invoke(inputs)

result = invoke_with_retry(chain, {"query": "your question"})

Debugging Tools & Patterns

Pattern 1: LangChain Debugging Callback

from langchain_core.callbacks import BaseCallbackHandler
from typing import Any, Dict, List

class DebugCallbackHandler(BaseCallbackHandler):
    def __init__(self):
        self.retrieved_docs = []
        self.llm_calls = []
        self.errors = []

    def on_retriever_end(
        self,
        documents: List[Any],
        *,
        run_id: str,
        parent_run_id: str,
        **kwargs: Any,
    ) -> Any:
        self.retrieved_docs = documents
        print(f"\nšŸ“š Retrieved {len(documents)} documents:")
        for i, doc in enumerate(documents[:5]):
            print(f"  {i+1}. Score={doc.metadata.get('score', 'N/A')}")
            print(f"     {doc.page_content[:100]}...")

    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        **kwargs: Any,
    ) -> Any:
        self.llm_calls.append({"prompts": prompts})
        print(f"\nšŸ¤– LLM called with {len(prompts[0])} chars")

    def on_chain_error(
        self,
        error: BaseException,
        *,
        run_id: str,
        **kwargs: Any,
    ) -> Any:
        self.errors.append(error)
        print(f"\nāŒ Chain error: {error}")

# Usage
debug_callback = DebugCallbackHandler()
chain = retrieval_qa.with_config(
    callbacks=[debug_callback],
    verbose=True
)
result = chain.invoke({"query": "your question"})

Pattern 2: Trace Without LangSmith

import json
from datetime import datetime

class SimpleTracer:
    def __init__(self, log_file: str = "rag_traces.jsonl"):
        self.log_file = log_file

    def trace(self, query: str, docs: list, response: str, metadata: dict = None):
        trace = {
            "timestamp": datetime.now().isoformat(),
            "query": query,
            "retrieved_docs": [
                {"content": d.page_content, "score": d.metadata.get("score")}
                for d in docs
            ],
            "response": response,
            "metadata": metadata or {}
        }

        with open(self.log_file, "a") as f:
            f.write(json.dumps(trace) + "\n")

        return trace

# Usage in your chain
tracer = SimpleTracer()

docs = retriever.invoke(query)
response = llm.invoke(format_prompt(docs, query))

# Chat models return an AIMessage; store the string content in the trace
tracer.trace(query, docs, response.content, {"latency_ms": latency})

# Review traces
# cat rag_traces.jsonl | jq '.'
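Once traces accumulate, a small scanner can surface likely failures without jq gymnastics. This sketch reads the JSONL format written by SimpleTracer above; the refusal heuristic is just a substring match and will need tuning for your prompts:

```python
import json

def find_failures(log_file="rag_traces.jsonl"):
    """Flag traces with empty retrieval or refusal-sounding answers."""
    failures = []
    with open(log_file) as f:
        for line in f:
            trace = json.loads(line)
            if not trace["retrieved_docs"]:
                failures.append(("empty_retrieval", trace["query"]))
            elif "don't have" in trace["response"].lower():
                failures.append(("possible_refusal", trace["query"]))
    return failures
```

Run it periodically and feed the flagged queries back into your retriever tests.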

šŸš€ Automated RAG Debugging

RAG Debugger provides visual debugging for LangChain RAG pipelines:

Try 10 free debug sessions → rag-debugger.pages.dev

Quick Reference Checklist

Before deploying your LangChain RAG chain:

  1. Retriever returns non-empty, relevant results for representative queries
  2. Similarity threshold and k are tuned against your corpus, not left at defaults
  3. Retrieved context fits the model's token limit, with headroom for prompt and answer
  4. Prompt explicitly instructs the model to answer only from the provided context
  5. Input validation and retry logic wrap chain.invoke()
  6. Traces are logged so failures can be diagnosed after the fact

Conclusion

Debugging LangChain RAG applications requires systematic checking of each component:

  1. Retriever: Verify configuration and results
  2. Context assembly: Check token count and formatting
  3. LLM: Ensure grounding and citation
  4. Validation: Add error handling and retries

For faster debugging, try RAG Debugger — a visual tool that automates trace analysis and failure detection. Start with 10 free sessions at rag-debugger.pages.dev.