Prompt performance, not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across gpt-5-mini, claude-3-5-haiku, and gemini-2.5-flash-lite.
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
12 prompts found
| Category | Prompt | Best Model | Overall | Winner |
|----------|--------|------------|---------|--------|
| Extraction | tweet-critic-fewshot | claude-3-5-haiku | 3.8 | 4.1 |
| Classification | tnt-llm-taxonomy-update | gemini-2.5-flash-lite | 3.0 | 4.3 |
| Extraction | llm-compiler-joiner | gpt-5-mini | 2.9 | 4.2 |
| Classification | tnt-llm-taxonomy-generation | claude-3-5-haiku | 2.8 | 3.0 |
| Classification | tnt-llm-classify | gpt-5-mini | 2.8 | 3.5 |
| Classification | tnt-llm-taxonomy-review | gemini-2.5-flash-lite | 2.7 | 4.3 |
| Summarization | tnt-llm-summary-generation | claude-3-5-haiku | 2.3 | 2.3 |
| Extraction | llm-compiler | gpt-5-mini | 2.2 | 2.4 |
| Extraction | langsmith-agent-prompt | gpt-5-mini | 2.0 | 2.3 |
| Extraction | proposal-indexing | gpt-5-mini | 1.8 | 1.9 |
| Extraction | react-agent-executor | gpt-5-mini | 1.8 | 2.4 |
| Extraction | web-voyager | gemini-2.5-flash-lite | 1.7 | 2.1 |
