Prompt performance, not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across gpt-5-mini, claude-3-5-haiku, and gemini-2.5-flash-lite.
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
12 prompts found
| Category | Prompt | Best Model | Overall | Winner |
|----------|--------|------------|---------|--------|
| Extraction | tweet-critic-fewshot | claude-3-5-haiku | 3.8 | 4.1 |
| Classification | tnt-llm-taxonomy-update | gemini-2.5-flash-lite | 3.0 | 4.3 |
| Extraction | llm-compiler-joiner | gpt-5-mini | 2.9 | 4.2 |
| Classification | tnt-llm-taxonomy-generation | claude-3-5-haiku | 2.8 | 3.0 |
| Classification | tnt-llm-classify | gpt-5-mini | 2.8 | 3.5 |
| Classification | tnt-llm-taxonomy-review | gemini-2.5-flash-lite | 2.7 | 4.3 |
| Summarization | tnt-llm-summary-generation | claude-3-5-haiku | 2.3 | 2.3 |
| Extraction | llm-compiler | gpt-5-mini | 2.2 | 2.4 |
| Extraction | langsmith-agent-prompt | gpt-5-mini | 2.0 | 2.3 |
| Extraction | proposal-indexing | gpt-5-mini | 1.8 | 1.9 |
| Extraction | react-agent-executor | gpt-5-mini | 1.8 | 2.4 |
| Extraction | web-voyager | gemini-2.5-flash-lite | 1.7 | 2.1 |
