Hallucination Detection

MindReef automatically analyzes LLM outputs to detect potential hallucinations, ungrounded claims, and factual inconsistencies. Catch issues before they reach your users.

How It Works

When you trace LLM calls, MindReef analyzes the relationship between inputs (context, system prompts, user queries) and outputs (model responses). Our detectors flag responses that make claims unsupported by the provided context or that contradict themselves.

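For a concrete picture, here is a minimal sketch of a traced LLM call, reusing the @trace decorator documented later on this page. call_llm is a hypothetical stand-in for your own LLM client.

from mindreef import trace

@trace
async def answer(question: str) -> str:
    # MindReef records this function's inputs and outputs and runs its
    # hallucination detectors on the response asynchronously.
    response = await call_llm(question)  # hypothetical LLM client call
    return response
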
Detection Methods

Grounding Checker

Verifies that claims in the output are grounded in the provided context. Extracts factual statements from the response and checks each against the input context.

Best for: RAG applications, document Q&A, customer support agents

Returns: A grounding score from 0 (ungrounded) to 1 (fully grounded), plus a list of flagged claims
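
Conceptually, a grounding check splits the response into claims and scores each claim against the provided context. The sketch below illustrates the idea only; it is not MindReef's implementation, and claim_is_supported is a hypothetical placeholder for a claim-verification model.

def grounding_check(claims: list[str], context: str) -> tuple[float, list[str]]:
    # Flag every extracted claim that is not supported by the context,
    # then report the fraction of supported claims as the score.
    flagged = [c for c in claims if not claim_is_supported(c, context)]
    score = 1.0 if not claims else (len(claims) - len(flagged)) / len(claims)
    return score, flagged  # score: 0 = ungrounded, 1 = fully grounded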

Consistency Checker

Analyzes the response for internal consistency. Detects contradictions within a single response where the model says conflicting things.

Best for: Long-form content, multi-step reasoning, complex explanations

Returns: A consistency score plus identified contradictions
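
As a rough illustration of the idea (again, not MindReef's implementation), an internal-consistency check can compare pairs of statements from the response and look for contradictions. contradicts is a hypothetical placeholder for an entailment/NLI model.

from itertools import combinations

def consistency_check(statements: list[str]) -> tuple[float, list[tuple[str, str]]]:
    # Compare every pair of statements and collect the contradictory pairs;
    # the score is the fraction of pairs that do not contradict each other.
    contradictions = [
        (a, b) for a, b in combinations(statements, 2) if contradicts(a, b)
    ]
    pairs = max(len(statements) * (len(statements) - 1) // 2, 1)
    return 1.0 - len(contradictions) / pairs, contradictions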

Viewing Detection Results

Detection results appear automatically in the dashboard for each traced LLM span. You'll see the grounding and consistency scores, any flagged claims, and any contradictions identified for that span.

Configuring Thresholds

Set custom thresholds to control when responses are flagged:

from mindreef import MindReef

mr = MindReef(
    hallucination_config={
        "grounding_threshold": 0.7,     # Flag if grounding score < 0.7
        "consistency_threshold": 0.8,   # Flag if consistency score < 0.8
        "enabled": True,                 # Enable detection
    }
)

Providing Context

For best results, explicitly provide the context your agent is working with:

from mindreef import trace, set_context

@trace
async def rag_agent(query: str):
    # Retrieve context
    docs = await search_documents(query)

    # Tell MindReef what context the LLM has
    set_context(docs)

    # Generate response - detection runs against this context
    response = await generate_response(query, docs)

    return response

Setting Up Alerts

Get notified when hallucinations are detected:

# In dashboard: Settings → Alerts → New Alert

# Alert when a response is flagged by hallucination detection
{
    "name": "Hallucination Alert",
    "condition": "hallucination.flagged == true",
    "channel": "slack",
    "threshold_count": 5,
    "window_minutes": 60
}

Note: Hallucination analysis takes roughly 100-200ms per response, but it runs asynchronously and doesn't add latency to your agent's response to users.

Interpreting Scores

Best Practices