Prompt performance, not guesswork
Every prompt tested across the same models. Scored by independent AI judges.
Evaluating across gemini-2.5-flash-lite, claude-3-5-haiku, and gpt-5-mini
All scores are aggregated using multi-judge consensus (GPT-4o Mini + Claude 3 Haiku).
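The page doesn't specify how the multi-judge consensus is computed; a minimal sketch, assuming a simple mean of the judges' scores on the page's 1–5-style scale (the function name `consensus_score` is hypothetical):

```python
def consensus_score(judge_scores: dict[str, float]) -> float:
    """Aggregate per-judge scores into one consensus score.

    judge_scores maps judge name -> that judge's score for a prompt.
    Assumption: consensus here is an unweighted mean, rounded to one
    decimal to match the scores shown on the page.
    """
    if not judge_scores:
        raise ValueError("need at least one judge score")
    return round(sum(judge_scores.values()) / len(judge_scores), 1)

# Example with the two judges named above (scores are illustrative)
scores = {"gpt-4o-mini": 3.6, "claude-3-haiku": 3.0}
print(consensus_score(scores))  # -> 3.3
```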
6 prompts found
| Category   | Prompt                              | Best Model            | Overall | Winner |
|------------|-------------------------------------|-----------------------|---------|--------|
| Extraction | evaluation_question                 | gemini-2.5-flash-lite | 3.3     | 4.0    |
| Extraction | generate_questions_by_knowledge_tags | claude-3-5-haiku     | 2.5     | 2.7    |
| Extraction | raw_to_json_questions               | claude-3-5-haiku      | 2.4     | 3.0    |
| Extraction | v3_generate_tutor_questions         | gemini-2.5-flash-lite | 2.0     | 2.5    |
| Extraction | v3_evaluate_tutor_questions         | gpt-5-mini            | 1.9     | 2.1    |
| Extraction | v3_generate_lesson_summary          | gemini-2.5-flash-lite | 1.5     | 2.2    |
