
rag-answer-hallucination

Tags: extraction

Evaluates whether a RAG answer is supported by the retrieved documents, which is useful for catching answer hallucination.

Prompt Text

You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. 

Give a binary score 1 or 0, where 1 means that the answer is grounded in / supported by the set of facts.

Facts: {{input.documents}} 

LLM generation: {{output}}
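As a sketch of how this template might be used in practice, the snippet below fills the `{{input.documents}}` and `{{output}}` placeholders and parses the binary grade. The function names and the string-substitution approach are assumptions for illustration; the actual LLM call is omitted, since any chat-completion client would do.

```python
# Hypothetical helper for using the grader prompt; names and the
# substitution mechanism are illustrative, not from the source page.
GRADER_TEMPLATE = (
    "You are a grader assessing whether an LLM generation is grounded in / "
    "supported by a set of retrieved facts.\n\n"
    "Give a binary score 1 or 0, where 1 means that the answer is grounded "
    "in / supported by the set of facts.\n\n"
    "Facts: {documents}\n\n"
    "LLM generation: {generation}"
)

def build_grader_prompt(documents: list[str], generation: str) -> str:
    """Substitute the retrieved documents and candidate answer into the template."""
    return GRADER_TEMPLATE.format(
        documents="\n".join(documents),
        generation=generation,
    )

def parse_grade(raw_reply: str) -> int:
    """Read the grader's reply as a binary score: 1 = grounded, 0 = hallucinated."""
    return 1 if raw_reply.strip().startswith("1") else 0
```

The grader's reply is then compared against a ground-truth label, or used directly to flag unsupported answers for review.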

Evaluation Results (1/28/2026)

Overall Score: 2.72/5 (average across all 3 models)
Best Performing Model: google:gemini-2.5-flash-lite, 3.73/5 (Low Confidence)

Rank  Model                         Score      adh  cla  com  In tokens  Out tokens  Cost
#1    google:gemini-2.5-flash-lite  3.73/5.00  3.5  4.0  3.7  350        5           $0.0000
#2    anthropic:claude-3-5-haiku    2.23/5.00  1.3  4.4  1.0  420        586         $0.0027
#3    openai:gpt-5-mini             2.19/5.00  1.2  4.4  1.0  355        1,857       $0.0038