Prompt performance, not guesswork

Every prompt is tested across the same models and scored by independent AI judges.

Evaluating across GPT-4, Claude 3.5, and Gemini 1.5

All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
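The page does not say how the judges' scores are combined, so the following is only a minimal sketch of one plausible reading of "multi-judge consensus": an unweighted mean of per-judge scores. The function name `consensus_score`, the 1-5 scale, the one-decimal rounding, and the sample values are all assumptions made for illustration, not details taken from the site.

```python
from statistics import mean


def consensus_score(judge_scores: dict[str, float]) -> float:
    """Combine per-judge scores into one number.

    Assumption: consensus is an unweighted mean, rounded to one
    decimal place. The real aggregation may differ.
    """
    if not judge_scores:
        raise ValueError("at least one judge score is required")
    return round(mean(judge_scores.values()), 1)


# Hypothetical scores from the two judges for a single model response.
example = {"gpt-4o-mini": 5.0, "claude-3-haiku": 4.0}
print(consensus_score(example))
```

Averaging treats both judges as equally reliable; a weighted mean or a median would be reasonable alternatives if one judge proved noisier than the other.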


21 prompts found

Extraction

City Extractor (Few-Shot)

Best Model: gemini-2.5-flash-lite
Overall: 4.9
Winner: 4.9

Extraction

Capital City Extractor

Best Model: gemini-2.5-flash-lite
Overall: 4.7
Winner: 4.8

Summarization

Legal Document Summarizer

Best Model: gemini-2.5-flash-lite
Overall: 4.6
Winner: 4.9

Extraction

RAG Query Answering

Best Model: gpt-5-mini
Overall: 4.0
Winner: 4.1

Classification

Social Media Comment Moderator

Best Model: gemini-2.5-flash-lite
Overall: 3.5
Winner: 4.9

Summarization

Guided Legal Summary Generator

Best Model: gemini-2.5-flash-lite
Overall: 3.4
Winner: 4.8

Summarization

Investment Memo Editor

Best Model: gpt-5-mini
Overall: 3.1
Winner: 4.0

Summarization

Portfolio Manager Investment Memo

Best Model: claude-3-5-haiku
Overall: 3.1
Winner: 3.7

Extraction

Citation Extraction Agent

Best Model: claude-3-5-haiku
Overall: 2.9
Winner: 3.7

Extraction

Recipe Ingredient Extractor

Best Model: gpt-5-mini
Overall: 2.9
Winner: 3.4

Extraction

Shopping List Organizer

Best Model: claude-3-5-haiku
Overall: 2.8
Winner: 3.2

Classification

Customer Support Ticket Classifier

Best Model: gpt-5-mini
Overall: 2.7
Winner: 3.3

Summarization

Long Document Sublease Summarizer

Best Model: gemini-2.5-flash-lite
Overall: 2.7
Winner: 4.4

Classification

Essay Grading Evaluator

Best Model: gpt-5-mini
Overall: 2.3
Winner: 2.4

Extraction

Text-to-SQL with Chain-of-Thought

Best Model: claude-3-5-haiku
Overall: 2.3
Winner: 2.6

Extraction

Text-to-SQL Converter

Best Model: gpt-5-mini
Overall: 2.3
Winner: 3.4

Extraction

Text-to-SQL with Few-Shot Examples

Best Model: gpt-5-mini
Overall: 2.2
Winner: 2.9

Summarization

Macro Strategy Report

Best Model: gemini-2.5-flash-lite
Overall: 1.9
Winner: 3.0

Classification

LLM Output Quality Judge

Best Model: gemini-2.5-flash-lite
Overall: 1.8
Winner: 3.2

Extraction

Quantitative Analysis Report

Best Model: gemini-2.5-flash-lite
Overall: 1.8
Winner: 2.9

Extraction

Fundamental Analysis Report

Best Model: gemini-2.5-flash-lite
Overall: 1.8
Winner: 2.8