Hallucination Detection
MindReef automatically analyzes LLM outputs to detect potential hallucinations, ungrounded claims, and factual inconsistencies. Catch issues before they reach your users.
How It Works
When you trace LLM calls, MindReef analyzes the relationship between inputs (context, system prompts, user queries) and outputs (model responses). Our detectors flag responses that:
- Make claims not supported by the provided context
- Contain internal contradictions
- Assert facts that weren't in the input
- Show low confidence or hedging language
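For example, a minimal setup looks like the sketch below. The MindReef client and the trace decorator are the same ones used throughout this page; call_llm is a placeholder for your own model call. Once the client is initialized and a call is traced, the detectors described next run on each response:

from mindreef import MindReef, trace

mr = MindReef()  # default client; thresholds and enablement are configurable (see "Configuring Thresholds" below)

@trace
async def answer(question: str) -> str:
    # call_llm is a placeholder for your own LLM call
    return await call_llm(question)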
Detection Methods
Grounding Checker
Verifies that claims in the output are grounded in the provided context. Extracts factual statements from the response and checks each against the input context.
Best for: RAG applications, document Q&A, customer support agents
Returns: A grounding score from 0 (ungrounded) to 1 (fully grounded), plus a list of flagged claims
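The sketch below illustrates the shape of a grounding result. The field names are our assumption for illustration rather than the official schema, but the documented pieces (a 0-1 score plus the flagged claims) are what you should expect to see:

# Illustrative example only - field names are assumptions, not MindReef's schema
grounding_result = {
    "score": 0.62,  # 0 = ungrounded, 1 = fully grounded
    "flagged_claims": [
        {
            "claim": "The warranty covers accidental damage.",
            "supported": False,  # no supporting passage found in the input context
        }
    ],
}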
Consistency Checker
Analyzes the response for internal consistency. Detects contradictions within a single response where the model says conflicting things.
Best for: Long-form content, multi-step reasoning, complex explanations
Returns: A consistency score plus identified contradictions
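As with grounding, the result sketch below is illustrative; the field names are assumptions, but it shows the documented score plus the contradictions the checker surfaces:

# Illustrative example only - field names are assumptions, not MindReef's schema
consistency_result = {
    "score": 0.55,  # lower scores mean more internal contradiction
    "contradictions": [
        {
            "statement_a": "Refunds are accepted within 30 days.",
            "statement_b": "The refund window closes 14 days after purchase.",
        }
    ],
}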
Viewing Detection Results
Detection results appear automatically in the dashboard for each traced LLM span. You'll see:
- Overall Score: Combined hallucination risk assessment
- Flagged Status: Whether the response exceeded your threshold
- Specific Issues: List of potentially problematic claims
- Context Comparison: Side-by-side view of claims vs. source context
Configuring Thresholds
Set custom thresholds to control when responses are flagged:
from mindreef import MindReef

mr = MindReef(
    hallucination_config={
        "grounding_threshold": 0.7,    # Flag if grounding score < 0.7
        "consistency_threshold": 0.8,  # Flag if consistency score < 0.8
        "enabled": True,               # Enable detection
    }
)
Providing Context
For best results, explicitly provide the context your agent is working with:
from mindreef import trace, set_context

@trace
async def rag_agent(query: str):
    # Retrieve context
    docs = await search_documents(query)

    # Tell MindReef what context the LLM has
    set_context(docs)

    # Generate response - detection runs against this context
    response = await generate_response(query, docs)
    return response
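If your agent has no retrieval step, you can still give the detectors something to check against, such as the policy text baked into your system prompt. The sketch below assumes set_context accepts a list of strings, mirroring how the retrieved docs are passed above; generate_response remains a placeholder for your own model call:

from mindreef import trace, set_context

POLICY_TEXT = "Acme support policy: refunds within 30 days; accidental damage is not covered."

@trace
async def support_agent(query: str) -> str:
    # Assumption: set_context takes a list of strings, as in the RAG example above
    set_context([POLICY_TEXT])
    return await generate_response(query, [POLICY_TEXT])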
Setting Up Alerts
Get notified when hallucinations are detected:
# In dashboard: Settings → Alerts → New Alert
# Alert when responses are flagged by hallucination detection
{
    "name": "Hallucination Alert",
    "condition": "hallucination.flagged == true",
    "channel": "slack",
    "threshold_count": 5,
    "window_minutes": 60
}
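With this configuration, a Slack notification fires once 5 flagged responses accumulate within a 60-minute window. The sketch below is our reading of threshold_count and window_minutes as a sliding-window count, not MindReef's actual alerting code:

from collections import deque
from datetime import datetime, timedelta

# Assumed semantics: fire when >= THRESHOLD_COUNT flagged responses fall within WINDOW
WINDOW = timedelta(minutes=60)
THRESHOLD_COUNT = 5
_flagged_times: deque = deque()

def should_fire(now: datetime) -> bool:
    """Record a flagged response and report whether the alert should fire."""
    _flagged_times.append(now)
    while _flagged_times and now - _flagged_times[0] > WINDOW:
        _flagged_times.popleft()
    return len(_flagged_times) >= THRESHOLD_COUNT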
Note: Hallucination detection adds roughly 100-200ms of analysis time per response. The analysis runs asynchronously, so it doesn't block your agent's response to users.
Interpreting Scores
- 0.9-1.0: High confidence, well-grounded response
- 0.7-0.9: Generally good, minor concerns possible
- 0.5-0.7: Moderate risk, review recommended
- Below 0.5: High risk, likely contains ungrounded claims
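If you export scores and want to bucket them in code, a small helper that mirrors these bands might look like this (the band labels are ours, not a MindReef API):

def risk_band(score: float) -> str:
    """Map a detection score to the bands documented above."""
    if score >= 0.9:
        return "high confidence"
    if score >= 0.7:
        return "generally good"
    if score >= 0.5:
        return "moderate risk"
    return "high risk"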
Best Practices
- Always provide explicit context for RAG applications
- Start with conservative thresholds and adjust based on false positive rates
- Use detection results to improve your prompts over time
- Combine with human review for high-stakes decisions