
self-rag-hallucination-grader


The self-rag-hallucination-grader checks whether a language model's output is supported by a provided set of factual documents, returning a binary score of 'yes' or 'no' to indicate grounding. It is useful for verifying the reliability of AI-generated content in applications that require factual accuracy.

Prompt Text

You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts.

Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts.

Set of facts: {documents}

LLM generation: {generation}
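The prompt above is a template with two placeholders, {documents} and {generation}. A minimal sketch of how it might be wired up in Python follows; the helper names (`build_grader_prompt`, `parse_grade`) are illustrative, not part of the published prompt, and the actual LLM call is left to whatever chat-completion client you use.

```python
# Hypothetical helpers around the grader prompt template.
# Only the template text comes from the prompt above; everything
# else is an assumed integration sketch.

PROMPT_TEMPLATE = (
    "You are a grader assessing whether an LLM generation is grounded in / "
    "supported by a set of retrieved facts.\n\n"
    "Give a binary score 'yes' or 'no'. 'Yes' means that the answer is "
    "grounded in / supported by the set of facts.\n\n"
    "Set of facts: {documents}\n\n"
    "LLM generation: {generation}"
)

def build_grader_prompt(documents: str, generation: str) -> str:
    """Fill the template with the retrieved facts and the answer to grade."""
    return PROMPT_TEMPLATE.format(documents=documents, generation=generation)

def parse_grade(raw_reply: str) -> bool:
    """Map the grader model's reply to a boolean; True means grounded."""
    return raw_reply.strip().lower().startswith("yes")
```

In a RAG pipeline, `build_grader_prompt` would be called with the retrieved context and the candidate answer, the result sent to the grading model, and `parse_grade` applied to its reply to decide whether to accept or regenerate the answer.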

Evaluation Results

1/28/2026

Overall Score: 2.82/5 (average across all 3 models)

Best Performing Model: google:gemini-2.5-flash-lite — 3.67/5 (Low Confidence)

| Rank | Model | Score (/5.00) | adh | cla | com | Tokens In | Tokens Out | Cost |
|------|-------|---------------|-----|-----|-----|-----------|------------|------|
| 1 | google:gemini-2.5-flash-lite | 3.67 | 3.5 | 4.1 | 3.3 | 345 | 23 | $0.0000 |
| 2 | openai:gpt-5-mini | 2.42 | 1.6 | 4.4 | 1.3 | 355 | 1,684 | $0.0035 |
| 3 | anthropic:claude-3-5-haiku | 2.38 | 1.7 | 4.2 | 1.3 | 420 | 393 | $0.0019 |