Prompt performance, not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across GPT-4, Claude 3.5, and Gemini 1.5
buckylee
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
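The consensus mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not the site's actual implementation: it assumes each judge (e.g. GPT-4o Mini, Claude 3 Haiku) assigns a 1–5 score to a model's response, the consensus score is the mean across judges, and "Best Model" is the model with the highest consensus score. The function names and example scores are invented for illustration.

```python
from statistics import mean

def consensus_score(judge_scores):
    """Average the per-judge scores into one consensus value (1-5 scale)."""
    return round(mean(judge_scores), 1)

def best_model(scores_by_model):
    """Return the model with the highest consensus score, and that score."""
    consensus = {m: consensus_score(s) for m, s in scores_by_model.items()}
    winner = max(consensus, key=consensus.get)
    return winner, consensus[winner]

# Made-up judge scores for one prompt, purely for illustration:
model, score = best_model({
    "claude-3-5-haiku": [4.0, 4.2],       # two judges' scores
    "gemini-2.5-flash-lite": [3.5, 3.9],
})
```

With these example inputs, `claude-3-5-haiku` wins with a consensus score of 4.1, matching how each card below pairs a "Best Model" with an "Overall" score.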
13 prompts found (all in the Extraction category)

Prompt                        Best Model              Overall  Winner
bytes_to_megabytes            claude-3-5-haiku        4.1      4.3
python_repl                   gemini-2.5-flash-lite   3.8      4.3
transform_to_mongo_pipeline   claude-3-5-haiku        3.1      3.2
context-supervisor            gpt-5-mini              3.0      4.0
convert_to_utc                claude-3-5-haiku        2.8      3.7
list_column                   claude-3-5-haiku        2.5      4.0
network_sys_prompt            gemini-2.5-flash-lite   2.4      3.7
understand_network_data       gemini-2.5-flash-lite   2.4      3.8
upload_to_s3                  claude-3-5-haiku        2.2      3.6
network_pandas_prompt         gemini-2.5-flash-lite   2.1      3.5
execute_mongo_pipeline        claude-3-5-haiku        2.0      3.1
retrieve_from_kb              gemini-2.5-flash-lite   1.8      2.6
network_filter_prompt         gemini-2.5-flash-lite   1.6      1.6
