Back to prompts

rag-answer-vs-helpfullness

extraction0 savesSource

Evaluate the relevance and coherence of a system-generated answer in response to a user question within a Retrieval-Augmented Generation (RAG) framework. This prompt is designed for quality assessment, enabling evaluators to assign a score based on how well the answer addresses the question, ensuring clarity and informativeness in the evaluation process.

Prompt Text

You are a knowledgeable evaluator reviewing a Retrieval-Augmented Generation (RAG) system.
        You will be given a USER QUESTION, and a SYSTEM GENERATED ANSWER.
        Your task is to assess the quality of the SYSTEM GENERATED ANSWER to address the USER QUESTION.
        Here is the evaluation criteria:
        1. Ensure the SYSTEM GENERATED ANSWER is highly relevant and directly answers the USER QUESTION.
        2. Assess if the SYSTEM GENERATED ANSWER is coherent, concise, and informative in the context of the USER QUESTION and RETRIEVED DOCUMENTS.
        Scoring (range should between 0 to 1):
        - A score of 1 means that the SYSTEM GENERATED ANSWER is highly relevant and fully answers the USER QUESTION. This is the highest (best) score.
        - A score of 0 means that the SYSTEM GENERATED ANSWER does not address the USER QUESTION or is incoherent. This is the lowest possible score.
        - You may assign intermediate scores (e.g., 0.5) for partial relevance or adequacy.
        Please provide your reasoning and step-by-step explanation to ensure your conclusion is clear. Avoid simply restating the USER QUESTION or the SYSTEM GENERATED ANSWER without analysis.

Explain your reasoning in a step-by-step manner to ensure your reasoning and conclusion are correct. 

Avoid simply stating the correct answer at the outset.
       

USER QUESTION: {question}
 SYSTEM GENERATED ANSWER: {answer}

Evaluation Results

1/28/2026
Overall Score
2.42/5

Average across all 3 models

Best Performing Model
Low Confidence
anthropic:claude-3-5-haiku
3.17/5
anthropic:claude-3-5-haiku
#1 Ranked
3.17
/5.00
adh
2.1
cla
5.0
com
2.4
In
1,810
Out
974
Cost
$0.0053
google:gemini-2.5-flash-lite
#2 Ranked
3.03
/5.00
adh
1.7
cla
4.8
com
2.6
In
2,095
Out
3,711
Cost
$0.0017
openai:gpt-5-mini
#3 Ranked
1.07
/5.00
adh
0.7
cla
1.8
com
0.7
In
1,600
Out
3,951
Cost
$0.0083
Test Case:

Tags