Hallucination Detection
MindReef automatically analyzes LLM outputs to detect potential hallucinations, ungrounded claims, and factual inconsistencies. Catch issues before they reach your users.
How It Works
When you trace LLM calls, MindReef analyzes the relationship between inputs (context, system prompts, user queries) and outputs (model responses). Our detectors flag responses that:
- Make claims not supported by the provided context
- Contain internal contradictions
- Assert facts that weren't in the input
- Show low confidence or hedging language
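For example, a minimal setup looks like the sketch below. The MindReef client and the trace decorator are the same ones used throughout this page; call_llm is a placeholder for your own model call. Once the client is initialized and a call is traced, the detectors described next run on each response:

from mindreef import MindReef, trace

mr = MindReef()  # default client; thresholds and enablement are configurable (see "Configuring Thresholds" below)

@trace
async def answer(question: str) -> str:
    # call_llm is a placeholder for your own LLM call
    return await call_llm(question)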
Detection Methods
Grounding Checker
Verifies that claims in the output are grounded in the provided context. Extracts factual statements from the response and checks each against the input context.
Best for: RAG applications, document Q&A, customer support agents
Returns: A grounding score from 0 (ungrounded) to 1 (fully grounded), plus a list of flagged claims
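The sketch below illustrates the shape of a grounding result. The field names are our assumption for illustration rather than the official schema, but the documented pieces (a 0-1 score plus the flagged claims) are what you should expect to see:

# Illustrative example only - field names are assumptions, not MindReef's schema
grounding_result = {
    "score": 0.62,  # 0 = ungrounded, 1 = fully grounded
    "flagged_claims": [
        {
            "claim": "The warranty covers accidental damage.",
            "supported": False,  # no supporting passage found in the input context
        }
    ],
}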
Consistency Checker
Analyzes the response for internal consistency. Detects contradictions within a single response where the model says conflicting things.
Best for: Long-form content, multi-step reasoning, complex explanations
Returns: A consistency score plus identified contradictions
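As with grounding, the result sketch below is illustrative; the field names are assumptions, but it shows the documented score plus the contradictions the checker surfaces:

# Illustrative example only - field names are assumptions, not MindReef's schema
consistency_result = {
    "score": 0.55,  # lower scores mean more internal contradiction
    "contradictions": [
        {
            "statement_a": "Refunds are accepted within 30 days.",
            "statement_b": "The refund window closes 14 days after purchase.",
        }
    ],
}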
Viewing Detection Results
Detection results appear automatically in the dashboard for each traced LLM span. You'll see:
- Overall Score: Combined hallucination risk assessment
- Flagged Status: Whether the response exceeded your threshold
- Specific Issues: List of potentially problematic claims
- Context Comparison: Side-by-side view of claims vs. source context
Configuring Thresholds
Set custom thresholds to control when responses are flagged:
from mindreef import MindReef

mr = MindReef(
    hallucination_config={
        "grounding_threshold": 0.7,    # Flag if grounding score < 0.7
        "consistency_threshold": 0.8,  # Flag if consistency score < 0.8
        "enabled": True,               # Enable detection
    }
)
Providing Context
For best results, explicitly provide the context your agent is working with:
from mindreef import trace, set_context

@trace
async def rag_agent(query: str):
    # Retrieve context
    docs = await search_documents(query)

    # Tell MindReef what context the LLM has
    set_context(docs)

    # Generate response - detection runs against this context
    response = await generate_response(query, docs)
    return response
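If your agent has no retrieval step, you can still give the detectors something to check against, such as the policy text baked into your system prompt. The sketch below assumes set_context accepts a list of strings, mirroring how the retrieved docs are passed above; generate_response remains a placeholder for your own model call:

from mindreef import trace, set_context

POLICY_TEXT = "Acme support policy: refunds within 30 days; accidental damage is not covered."

@trace
async def support_agent(query: str) -> str:
    # Assumption: set_context takes a list of strings, as in the RAG example above
    set_context([POLICY_TEXT])
    return await generate_response(query, [POLICY_TEXT])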
Setting Up Alerts
Get notified when hallucinations are detected:
# In dashboard: Settings → Alerts → New Alert
# Alert when responses are flagged by hallucination detection
{
    "name": "Hallucination Alert",
    "condition": "hallucination.flagged == true",
    "channel": "slack",
    "threshold_count": 5,
    "window_minutes": 60
}
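With this configuration, a Slack notification fires once 5 flagged responses accumulate within a 60-minute window. The sketch below is our reading of threshold_count and window_minutes as a sliding-window count, not MindReef's actual alerting code:

from collections import deque
from datetime import datetime, timedelta

# Assumed semantics: fire when >= THRESHOLD_COUNT flagged responses fall within WINDOW
WINDOW = timedelta(minutes=60)
THRESHOLD_COUNT = 5
_flagged_times: deque = deque()

def should_fire(now: datetime) -> bool:
    """Record a flagged response and report whether the alert should fire."""
    _flagged_times.append(now)
    while _flagged_times and now - _flagged_times[0] > WINDOW:
        _flagged_times.popleft()
    return len(_flagged_times) >= THRESHOLD_COUNT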
Note: Hallucination detection adds roughly 100-200ms of analysis time per response. The analysis runs asynchronously, so it doesn't block your agent's response to users.
Interpreting Scores
- 0.9-1.0: High confidence, well-grounded response
- 0.7-0.9: Generally good, minor concerns possible
- 0.5-0.7: Moderate risk, review recommended
- Below 0.5: High risk, likely contains ungrounded claims
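If you export scores and want to bucket them in code, a small helper that mirrors these bands might look like this (the band labels are ours, not a MindReef API):

def risk_band(score: float) -> str:
    """Map a detection score to the bands documented above."""
    if score >= 0.9:
        return "high confidence"
    if score >= 0.7:
        return "generally good"
    if score >= 0.5:
        return "moderate risk"
    return "high risk"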
Best Practices
- Always provide explicit context for RAG applications
- Start with conservative thresholds and adjust based on false positive rates
- Use detection results to improve your prompts over time
- Combine with human review for high-stakes decisions