Prompt performance, not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across gpt-5-mini, claude-3-5-haiku, and gemini-2.5-flash-lite
Prompts by hwchase17
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
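The aggregation described above can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: it assumes each judge model returns a numeric score per candidate model, takes the mean across judges as the consensus, and derives the per-prompt "Overall" and "Winner" figures from those consensus scores. All names and numbers below are hypothetical.

```python
from statistics import mean

def consensus_scores(judge_scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each candidate model's score across all judges (multi-judge consensus)."""
    models = next(iter(judge_scores.values())).keys()
    return {
        model: mean(scores[model] for scores in judge_scores.values())
        for model in models
    }

# Hypothetical raw scores from the two judge models on a single prompt.
judge_scores = {
    "gpt-4o-mini":    {"gpt-5-mini": 3.4, "claude-3-5-haiku": 2.8},
    "claude-3-haiku": {"gpt-5-mini": 3.8, "claude-3-5-haiku": 3.0},
}

per_model = consensus_scores(judge_scores)    # consensus score per candidate model
overall = mean(per_model.values())            # "Overall": mean across candidate models
winner = max(per_model, key=per_model.get)    # "Winner": the best-scoring model
winner_score = per_model[winner]              # the score shown in the Winner column
```

Under this sketch, a prompt's Overall is the average across all evaluated models, while Winner is the consensus score of the single best model, which is why Winner is always at least as high as Overall in the table below.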
14 prompts found
| Category | Prompt | Best Model | Overall | Winner |
|---|---|---|---|---|
| Extraction | react-chat | gpt-5-mini | 3.3 | 3.6 |
| Extraction | react-chat-json | gpt-5-mini | 2.6 | 3.0 |
| Extraction | self-discovery-select | gemini-2.5-flash-lite | 2.5 | 3.1 |
| Extraction | self-discovery-structure | gemini-2.5-flash-lite | 2.4 | 4.1 |
| Extraction | xml-agent-convo | gpt-5-mini | 2.3 | 2.4 |
| Extraction | self-discovery-adapt | gemini-2.5-flash-lite | 2.2 | 3.8 |
| Extraction | self-discovery-reasoning | claude-3-5-haiku | 2.2 | 2.8 |
| Extraction | react-multi-input-json | claude-3-5-haiku | 2.1 | 2.4 |
| Extraction | multi-query-retriever | claude-3-5-haiku | 1.9 | 2.0 |
| Extraction | react | claude-3-5-haiku | 1.9 | 2.1 |
| Extraction | openai-tools-agent | gpt-5-mini | 1.9 | 2.6 |
| Extraction | self-ask-with-search | gpt-5-mini | 1.8 | 2.5 |
| Extraction | react-json | gemini-2.5-flash-lite | 1.0 | 1.5 |
| Extraction | structured-chat-agent | n/a | 0.0 | 0.0 |
