Prompt performance, not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across GPT-4, Claude 3.5, and Gemini 1.5.
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
11 prompts found
| Category      | Prompt                         | Best Model       | Overall | Winner |
|---------------|--------------------------------|------------------|---------|--------|
| Summarization | simple-rag                     | claude-3-5-haiku | 3.4     | 3.8    |
| Summarization | pre-top-3-summarization        | gpt-5-mini       | 3.3     | 3.5    |
| Extraction    | sport-routine-to-program-short | gpt-5-mini       | 3.2     | 5.0    |
| Summarization | sport-routine-to-program       | gpt-5-mini       | 2.9     | 3.7    |
| Extraction    | assumption-checker             | claude-3-5-haiku | 2.8     | 4.7    |
| Summarization | chain-of-density-prompt        | gpt-5-mini       | 2.5     | 2.7    |
| Summarization | librarian_guide                | claude-3-5-haiku | 2.0     | 2.4    |
| Extraction    | generate_politicans            | claude-3-5-haiku | 2.0     | 2.5    |
| Extraction    | multi-query-retriever          | claude-3-5-haiku | 1.9     | 2.0    |
| Extraction    | proposal-indexing              | gpt-5-mini       | 1.8     | 1.9    |
| Extraction    | conversation-title-generator   | gpt-5-mini       | 1.6     | 1.9    |
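The multi-judge consensus described above can be sketched as a simple aggregation over per-judge scores. This is a minimal illustration, assuming each judge returns a score on the 0–5 scale used in the table and that judges are weighted equally; the function name and the example scores are hypothetical, not the site's actual implementation.

```python
# Minimal sketch of multi-judge consensus scoring (assumption: equal-weight
# mean of per-judge 0-5 scores, rounded to one decimal as shown in the table).
from statistics import mean

def consensus_score(judge_scores: dict[str, float]) -> float:
    """Aggregate per-judge scores into a single consensus score."""
    return round(mean(judge_scores.values()), 1)

# Hypothetical scores from the two judges named above.
score = consensus_score({"gpt-4o-mini": 3.6, "claude-3-haiku": 3.2})
print(score)  # 3.4
```

A weighted mean or median would drop into the same interface if one judge is considered more reliable than the other.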
