
self-rag-hallucination-grader


The self-rag-hallucination-grader checks whether a language model's output is supported by a provided set of factual documents, returning a binary score of 'yes' or 'no' to indicate grounding. It is useful for verifying the reliability of AI-generated content in applications that require factual accuracy.

Prompt Text

You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts.

Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts.

Set of facts: {documents}

LLM generation: {generation}
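The prompt above is a template with two placeholders, {documents} and {generation}. A minimal sketch of how it might be wired up in Python follows; the helper names (`build_grader_prompt`, `parse_grade`) are illustrative, not part of the published prompt, and the actual LLM call is left to whatever chat-completion client you use.

```python
# Hypothetical helpers around the grader prompt template.
# Only the template text comes from the prompt above; everything
# else is an assumed integration sketch.

PROMPT_TEMPLATE = (
    "You are a grader assessing whether an LLM generation is grounded in / "
    "supported by a set of retrieved facts.\n\n"
    "Give a binary score 'yes' or 'no'. 'Yes' means that the answer is "
    "grounded in / supported by the set of facts.\n\n"
    "Set of facts: {documents}\n\n"
    "LLM generation: {generation}"
)

def build_grader_prompt(documents: str, generation: str) -> str:
    """Fill the template with the retrieved facts and the answer to grade."""
    return PROMPT_TEMPLATE.format(documents=documents, generation=generation)

def parse_grade(raw_reply: str) -> bool:
    """Map the grader model's reply to a boolean; True means grounded."""
    return raw_reply.strip().lower().startswith("yes")
```

In a RAG pipeline, `build_grader_prompt` would be called with the retrieved context and the candidate answer, the result sent to the grading model, and `parse_grade` applied to its reply to decide whether to accept or regenerate the answer.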

Evaluation Results

1/28/2026

Overall Score: 2.82/5 (average across all 3 models)

Best Performing Model: google:gemini-2.5-flash-lite — 3.67/5 (Low Confidence)

| Rank | Model | Score (/5.00) | adh | cla | com | Tokens In | Tokens Out | Cost |
|------|-------|---------------|-----|-----|-----|-----------|------------|------|
| 1 | google:gemini-2.5-flash-lite | 3.67 | 3.5 | 4.1 | 3.3 | 345 | 23 | $0.0000 |
| 2 | openai:gpt-5-mini | 2.42 | 1.6 | 4.4 | 1.3 | 355 | 1,684 | $0.0035 |
| 3 | anthropic:claude-3-5-haiku | 2.38 | 1.7 | 4.2 | 1.3 | 420 | 393 | $0.0019 |