evaluator-rag-precision
Evaluate the relevance and focus of an LLM-returned passage with respect to a user query by scoring its precision on a scale of 1 to 5 and providing a detailed analysis for each segment of the passage. This helps ensure that the retrieved information is directly applicable and useful to the user's needs.
Prompt Text
You will be given a USER_QUERY and an LLM_RETURNED_PASSAGE returned by an LLM as part of a Retrieval-Augmented Generation (RAG) process. Your task is to evaluate the LLM_RETURNED_PASSAGE for PRECISION to assess its relevance to the USER_QUERY. Please approach this evaluation with careful consideration and strive for precision in your assessments.
### Evaluation Steps:
1. **Thoroughly analyze the given USER_QUERY and LLM_RETURNED_PASSAGE.**
2. **For each passage within the LLM_RETURNED_PASSAGE (separated by "--- PASSAGE_DIVIDER ---"), assess PRECISION (1-5) as follows:**
- Analyze the passage thoroughly, identifying information that directly answers the USER_QUERY and any content that may be off-topic or unnecessary.
- Evaluate how focused and relevant the passage is in addressing the specific USER_QUERY. Consider the proportion of information that directly answers the query versus any unnecessary content.
- Assign a score from 1 to 5 based on the relevance of the passage to the USER_QUERY. A higher score indicates that the passage contains predominantly relevant information that directly addresses the USER_QUERY, with minimal or no extraneous details. Lower scores should be given to passages that include significant amounts of off-topic or unnecessary details.
- Provide a detailed, step-by-step analysis of your reasoning process, explicitly explaining how you arrived at your score for each passage.
3. **Calculate the PRECISION_Score using the following formula:**
- Sum up all individual passage scores and divide by the total number of passages (count the number of "--- PASSAGE_DIVIDER ---" separators and add 1 to get the total number of passages).
- PRECISION_Score = (Sum of all passage scores) / (Total number of passages)
- Round the PRECISION_Score to two decimal places.
4. **Format your evaluation as shown in the Example Output below.**
### Example Output:
- **PRECISION_Reasoning**:
Passage 1: ... [Provide detailed analysis and reasoning for the score] ... Score: 4
Passage 2: ... [Provide detailed analysis and reasoning for the score] ... Score: 3
Passage 3: ... [Provide detailed analysis and reasoning for the score] ... Score: 4
- **PRECISION_Formula**: (4 + 3 + 4) / 3 = 3.67
- **PRECISION_Score**: 3.67
USER_QUERY: {query}
LLM_RETURNED_PASSAGE: {passage_llm}
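The splitting in step 2 and the averaging in step 3 reduce to a few lines of code. Below is a minimal Python sketch of that arithmetic, assuming the per-passage 1-5 scores have already been produced by the LLM judge (the function names and example scores are illustrative, not part of the prompt):

```python
PASSAGE_DIVIDER = "--- PASSAGE_DIVIDER ---"

def split_passages(passage_llm: str) -> list[str]:
    # Step 2: each chunk between dividers is scored individually,
    # so splitting on the divider yields the passages to evaluate.
    return [p.strip() for p in passage_llm.split(PASSAGE_DIVIDER)]

def precision_score(passage_scores: list[int]) -> float:
    # Step 3: mean of the per-passage scores, rounded to two decimals.
    return round(sum(passage_scores) / len(passage_scores), 2)

# Mirrors the Example Output: three passages scored 4, 3, and 4.
print(precision_score([4, 3, 4]))  # 3.67
```

Note that splitting on "--- PASSAGE_DIVIDER ---" and adding 1 to the separator count are equivalent ways to get the passage total; the sketch uses the split directly.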
Evaluation Results (1/22/2026)

Overall Score: 2.06/5 (average across all 3 models)
Best Performing Model: anthropic:claude-3-haiku, 5.00/5 (low confidence)

| Rank | Model                    | Score     | adh | cla | com |
|------|--------------------------|-----------|-----|-----|-----|
| #1   | anthropic:claude-3-haiku | 5.00/5.00 | 5.0 | 5.0 | 5.0 |
| #2   | GPT-4o Mini              | 0.77/5.00 | 0.8 | 0.8 | 0.7 |
| #3   | google:gemini-1.5-flash  | 0.40/5.00 | 0.4 | 0.4 | 0.4 |
Test Case Tags: langsmith, john-chatly, StructuredPrompt
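For downstream tooling, the final score can be recovered from the evaluator's formatted output. Here is a minimal Python sketch that assumes the exact `- **PRECISION_Score**:` label from the Example Output above (the function name is illustrative):

```python
import re

def extract_precision_score(evaluation: str) -> float | None:
    # Look for the "- **PRECISION_Score**: 3.67" line in the evaluation text.
    match = re.search(r"\*\*PRECISION_Score\*\*:\s*(\d+(?:\.\d+)?)", evaluation)
    return float(match.group(1)) if match else None

example = "- **PRECISION_Formula**: (4 + 3 + 4) / 3 = 3.67\n- **PRECISION_Score**: 3.67"
print(extract_precision_score(example))  # 3.67
```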
