Prompt performance, not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across GPT-4, Claude 3.5, and Gemini 1.5
efriis
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
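The page does not say how the two judges' scores are combined. As a minimal sketch, assuming "consensus" means a plain average of the per-judge scores rounded to one decimal place, the aggregation could look like this. The function name, the judge keys, and the sample scores are all hypothetical.

```python
from statistics import mean


def consensus_score(judge_scores: dict[str, float]) -> float:
    """Combine per-judge scores into one value.

    Assumption: consensus is a simple mean, rounded to one decimal
    to match the precision shown in the listing.
    """
    if not judge_scores:
        raise ValueError("at least one judge score is required")
    return round(mean(judge_scores.values()), 1)


# Hypothetical scores for one prompt/model pair; not taken from the page.
scores = {"gpt-4o-mini": 4.0, "claude-3-haiku": 5.0}
print(consensus_score(scores))
```

Other rules, such as a median or a weighted average, would fit the same interface; the mean is used here only because it is the simplest choice.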
5 prompts found
Extraction
self-rag-answer-grader
Best Model: gemini-2.5-flash-lite
Overall: 3.8
Winner: 4.8
Extraction
self-rag-retrieval-grader
Best Model: gemini-2.5-flash-lite
Overall: 2.9
Winner: 4.5
Extraction
self-rag-hallucination-grader
Best Model: gemini-2.5-flash-lite
Overall: 2.8
Winner: 3.7
Extraction
my-first-prompt
Best Model: gpt-5-mini
Overall: 2.7
Winner: 3.2
Extraction
self-rag-question-rewriter
Best Model: gemini-2.5-flash-lite
Overall: 2.2
Winner: 2.5
