Prompt performance, not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across GPT-4, Claude 3.5, and Gemini 1.5
buckylee
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
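The consensus mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not the site's actual implementation: it assumes each judge (e.g. GPT-4o Mini, Claude 3 Haiku) assigns a 1–5 score to a model's response, the consensus score is the mean across judges, and "Best Model" is the model with the highest consensus score. The function names and example scores are invented for illustration.

```python
from statistics import mean

def consensus_score(judge_scores):
    """Average the per-judge scores into one consensus value (1-5 scale)."""
    return round(mean(judge_scores), 1)

def best_model(scores_by_model):
    """Return the model with the highest consensus score, and that score."""
    consensus = {m: consensus_score(s) for m, s in scores_by_model.items()}
    winner = max(consensus, key=consensus.get)
    return winner, consensus[winner]

# Made-up judge scores for one prompt, purely for illustration:
model, score = best_model({
    "claude-3-5-haiku": [4.0, 4.2],       # two judges' scores
    "gemini-2.5-flash-lite": [3.5, 3.9],
})
```

With these example inputs, `claude-3-5-haiku` wins with a consensus score of 4.1, matching how each card below pairs a "Best Model" with an "Overall" score.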
13 prompts found (all in the Extraction category)

Prompt                        Best Model              Overall  Winner
bytes_to_megabytes            claude-3-5-haiku        4.1      4.3
python_repl                   gemini-2.5-flash-lite   3.8      4.3
transform_to_mongo_pipeline   claude-3-5-haiku        3.1      3.2
context-supervisor            gpt-5-mini              3.0      4.0
convert_to_utc                claude-3-5-haiku        2.8      3.7
list_column                   claude-3-5-haiku        2.5      4.0
network_sys_prompt            gemini-2.5-flash-lite   2.4      3.7
understand_network_data       gemini-2.5-flash-lite   2.4      3.8
upload_to_s3                  claude-3-5-haiku        2.2      3.6
network_pandas_prompt         gemini-2.5-flash-lite   2.1      3.5
execute_mongo_pipeline        claude-3-5-haiku        2.0      3.1
retrieve_from_kb              gemini-2.5-flash-lite   1.8      2.6
network_filter_prompt         gemini-2.5-flash-lite   1.6      1.6
