Prompt performance,
not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across gpt-5-mini, claude-3-5-haiku, and gemini-2.5-flash-lite
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
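The exact consensus rule is not stated on the page; a minimal sketch, assuming the consensus score is simply the arithmetic mean of the two judges' scores (judge names from the line above, scores hypothetical):

```python
from statistics import mean

def consensus_score(judge_scores: dict[str, float]) -> float:
    """Aggregate per-judge scores into one consensus score.

    Assumption: consensus = arithmetic mean of the individual judge
    scores, rounded to one decimal place to match the displayed scale.
    """
    return round(mean(judge_scores.values()), 1)

# Hypothetical scores from the two judges
print(consensus_score({"gpt-4o-mini": 3.6, "claude-3-haiku": 3.2}))  # → 3.4
```

A weighted mean or a disagreement-resolution step would be drop-in replacements if the site uses something more elaborate than a plain average.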
13 prompts found

| Category | Prompt | Best Model | Overall | Winner |
|---|---|---|---|---|
| Extraction | rag-answer-hallucination | gpt-5-mini | 3.4 | 3.9 |
| Extraction | rag-document-relevance | gemini-2.5-flash-lite | 3.2 | 3.6 |
| Extraction | rag-answer-helpfulness | gpt-5-mini | 3.1 | 3.8 |
| Extraction | rag-answer-hallucination | claude-3-5-haiku | 2.9 | 3.9 |
| Extraction | rag-context-precision | gpt-5-mini | 2.8 | 3.1 |
| Extraction | rag-answer-hallucination | gemini-2.5-flash-lite | 2.7 | 3.7 |
| Extraction | pairwise-evaluation-2 | claude-3-5-haiku | 2.7 | 3.1 |
| Extraction | rag-doc-relevance | gemini-2.5-flash-lite | 2.5 | 4.2 |
| Extraction | rag-answer-vs-helpfullness | claude-3-5-haiku | 2.4 | 3.2 |
| Extraction | evaluator-rag-precision | claude-3-5-haiku | 2.3 | 2.5 |
| Extraction | mycelium_relevance | gpt-5-mini | 2.2 | 2.6 |
| Extraction | rag-answer-vs-reference | gemini-2.5-flash-lite | 2.2 | 2.4 |
| Extraction | test_roche | claude-3-5-haiku | 2.1 | 2.7 |
