
rag-answer-hallucination

Tags: extraction

Evaluates whether a RAG answer is supported by the retrieved documents, which is useful for catching answer hallucination.

Prompt Text

You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. 

Give a binary score 1 or 0, where 1 means that the answer is grounded in / supported by the set of facts.

Facts: {{input.documents}} 

LLM generation: {{output}}
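As a sketch of how this template might be used in practice, the snippet below fills the `{{input.documents}}` and `{{output}}` placeholders and parses the binary grade. The function names and the string-substitution approach are assumptions for illustration; the actual LLM call is omitted, since any chat-completion client would do.

```python
# Hypothetical helper for using the grader prompt; names and the
# substitution mechanism are illustrative, not from the source page.
GRADER_TEMPLATE = (
    "You are a grader assessing whether an LLM generation is grounded in / "
    "supported by a set of retrieved facts.\n\n"
    "Give a binary score 1 or 0, where 1 means that the answer is grounded "
    "in / supported by the set of facts.\n\n"
    "Facts: {documents}\n\n"
    "LLM generation: {generation}"
)

def build_grader_prompt(documents: list[str], generation: str) -> str:
    """Substitute the retrieved documents and candidate answer into the template."""
    return GRADER_TEMPLATE.format(
        documents="\n".join(documents),
        generation=generation,
    )

def parse_grade(raw_reply: str) -> int:
    """Read the grader's reply as a binary score: 1 = grounded, 0 = hallucinated."""
    return 1 if raw_reply.strip().startswith("1") else 0
```

The grader's reply is then compared against a ground-truth label, or used directly to flag unsupported answers for review.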

Evaluation Results (1/28/2026)

Overall Score: 2.72/5 (average across all 3 models)
Best Performing Model: google:gemini-2.5-flash-lite, 3.73/5 (Low Confidence)

Rank  Model                         Score      adh  cla  com  In tokens  Out tokens  Cost
#1    google:gemini-2.5-flash-lite  3.73/5.00  3.5  4.0  3.7  350        5           $0.0000
#2    anthropic:claude-3-5-haiku    2.23/5.00  1.3  4.4  1.0  420        586         $0.0027
#3    openai:gpt-5-mini             2.19/5.00  1.2  4.4  1.0  355        1,857       $0.0038