rag-answer-vs-reference
Evaluation for RAG answer accuracy vs a reference
Prompt Text
You are a teacher grading a quiz.
You will be given a QUESTION, the GROUND TRUTH (correct) ANSWER, and the STUDENT ANSWER.
Here are the grading criteria to follow:
(1) Grade the student answer based ONLY on its factual accuracy relative to the ground truth answer.
(2) Ensure that the student answer does not contain any conflicting statements.
(3) It is OK if the student answer contains more information than the ground truth answer, as long as it is factually accurate relative to the ground truth answer.
Score:
A score of 1 means that the student's answer meets all of the criteria. This is the highest (best) score.
A score of 0 means that the student's answer does not meet all of the criteria. This is the lowest possible score you can give.
Explain your reasoning in a step-by-step manner to ensure your reasoning and conclusion are correct.
Avoid simply stating the correct answer at the outset.
QUESTION: {{question}}
GROUND TRUTH ANSWER: {{correct_answer}}
STUDENT ANSWER: {{student_answer}}
Evaluation Results
1/28/2026
Overall Score: 2.16/5 (average across all 3 models)
Best Performing Model: google:gemini-2.5-flash-lite (2.36/5, low confidence)
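The headline figure is the mean of the three per-model overall scores listed in the per-model results, and the best-performing model is simply the one with the highest score. A quick Python check of that arithmetic, with the scores copied from this page:

```python
# Overall scores (out of 5.00) copied from the per-model results on this page.
model_scores = {
    "google:gemini-2.5-flash-lite": 2.36,
    "anthropic:claude-3-5-haiku": 2.14,
    "openai:gpt-5-mini": 1.98,
}

# Headline "Overall Score": arithmetic mean across the three models.
overall = sum(model_scores.values()) / len(model_scores)

# "Best Performing Model": the model with the highest overall score.
best = max(model_scores, key=model_scores.get)

print(f"Overall: {overall:.2f}/5, best: {best}")
# prints: Overall: 2.16/5, best: google:gemini-2.5-flash-lite
```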
Rank  Model                         Score (/5.00)  adh  cla  com  In     Out    Cost
#1    google:gemini-2.5-flash-lite  2.36           1.3  4.5  1.3  1,195  2,100  $0.0010
#2    anthropic:claude-3-5-haiku    2.14           1.2  4.2  1.0  1,310  1,182  $0.0058
#3    openai:gpt-5-mini             1.98           1.0  4.2  0.7  1,145  1,541  $0.0034
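The grading prompt above asks for step-by-step reasoning followed by a binary 0/1 verdict, but it does not fix an output format. The Python sketch below shows one way to run it against a test case: fill the three {{...}} placeholders in a single pass, then pull the final verdict out of the grader's free-form text. It assumes, hypothetically, that the grader is additionally told to end with a line such as "Score: 1"; the names render_prompt and parse_score, and that output convention, are illustrative and not part of the published prompt.

```python
import re
from typing import Optional

# Abbreviated copy of the grading prompt; paste the full text in practice.
# Only the three {{...}} placeholders matter for rendering.
GRADER_TEMPLATE = """You are a teacher grading a quiz.
(... grading criteria and scoring rules from the prompt ...)
QUESTION: {{question}}
GROUND TRUTH ANSWER: {{correct_answer}}
STUDENT ANSWER: {{student_answer}}"""

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Fill every {{name}} placeholder in one pass; raise KeyError if a value is missing.

    A single re.sub pass means braces inside a supplied answer are never re-expanded.
    """
    def fill(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder: {name}")
        return values[name]

    return PLACEHOLDER.sub(fill, template)


# ASSUMPTION: the grader ends its reasoning with a line such as "Score: 1".
# The published prompt does not specify this, so the pattern is hypothetical.
SCORE_LINE = re.compile(r"\bscore\s*[:=]\s*([01])\b", re.IGNORECASE)


def parse_score(grader_output: str) -> Optional[int]:
    """Return the last 0/1 verdict in the grader's output, or None if there is none."""
    matches = SCORE_LINE.findall(grader_output)
    return int(matches[-1]) if matches else None
```

For example, render_prompt(GRADER_TEMPLATE, {"question": "What is the capital of France?", "correct_answer": "Paris", "student_answer": "Paris, also France's largest city."}) yields a prompt ready to send to the grader. Taking the last match in parse_score, rather than the first, lets the grader revise an early guess during its step-by-step reasoning without that guess being recorded as the verdict.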
Test Case:
