Prompt performance, not guesswork
Every prompt is tested across the same set of models and scored by independent AI judges.
Evaluating across GPT-4, Claude 3.5, and Gemini 1.5
ciudadela
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
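The page does not say how the consensus is computed or how the Overall and Winner figures are derived. As a minimal sketch, assuming the consensus is an unweighted mean of the judges' scores, that Overall is the mean across all evaluated models, and that Winner is the best model's consensus score, the aggregation might look like this. The model names, scores, and function names below are hypothetical and are not taken from the page.

```python
from statistics import mean

# Hypothetical per-judge scores for one prompt, keyed by model, then by judge.
# These numbers are invented for illustration only.
judge_scores = {
    "model-a": {"gpt-4o-mini": 4.0, "claude-3-haiku": 5.0},
    "model-b": {"gpt-4o-mini": 2.0, "claude-3-haiku": 3.0},
}


def consensus(scores_by_judge):
    """Assumed consensus rule: unweighted mean of the judges' scores."""
    return mean(scores_by_judge.values())


def summarize(scores):
    """Collapse per-judge scores into best model, winner score, and overall score."""
    per_model = {model: consensus(judges) for model, judges in scores.items()}
    best_model = max(per_model, key=per_model.get)
    return {
        "best_model": best_model,
        # Assumption: "Winner" is the best model's consensus score.
        "winner": per_model[best_model],
        # Assumption: "Overall" is the mean consensus score across all models.
        "overall": mean(per_model.values()),
    }


summary = summarize(judge_scores)
print(summary)
```

Under these assumptions the Winner score can never be lower than the Overall score, which matches every entry in the listing.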
12 prompts found
Category        Prompt                           Best Model             Overall  Winner
Extraction      triage_evaluate_community_match  gpt-5-mini             3.3      4.8
Extraction      account_from_ocr                 gpt-5-mini             3.1      4.4
Extraction      triage_assignee_rules            gpt-5-mini             2.9      4.9
Extraction      meeting-agent                    gemini-2.5-flash-lite  2.8      4.7
Summarization   triage_classify                  gpt-5-mini             2.8      4.2
Summarization   invoices_from_ocr                claude-3-5-haiku       2.6      2.9
Extraction      triage_infer_community           gpt-5-mini             2.5      4.5
Classification  triage_final_result              gpt-5-mini             2.4      4.2
Extraction      best_account_for_journal_entry   gemini-2.5-flash-lite  2.3      2.8
Extraction      provider-services-description    gemini-2.5-flash-lite  2.1      4.0
Extraction      ciudadela-lyra-v0_querier        claude-3-5-haiku       1.6      2.5
Extraction      ocr_generator                    claude-3-5-haiku       1.4      1.5
