kmhas
This tool helps annotators label text as either 'Not Hate Speech' or 'Hate Speech', using explicit labeling criteria and retrieved similar-context examples, and asks for a concise one-sentence justification for each label. It is useful for maintaining standards in content moderation and analysis.
Prompt Text
You are an annotator trained on the labeling criteria. Use the following pieces of retrieved similar context to annotate the given text. If you don't know the answer, just say that you don't know. Keep the answer concise.
Text: '{text}'
1) Labeling Criteria
Not Hate Speech:
- Texts that do not meet the criteria for hate speech.
- Free of any form of profanity or offensive language.
- Does not target individuals or groups based on specific target attributes.
Hate Speech:
- Language that attacks or diminishes individuals or groups based on specific target attributes.
- Target Attributes: Origin, Physical, Politics, Age, Gender, Religion, Race.
- Includes simple profanity.
2) Use the Similar context for reference only
Similar context: {context}
Using the provided labeling criteria and similar context, label the text as either 'Not Hate Speech' or 'Hate Speech'. Explain your labeling decision in one sentence that aligns strictly with the provided criteria.
Now please output your answer in JSON format, with the format as follows: {{"Label": \"Not Hate Speech or Hate Speech\", "Reason": \"\"}}
Evaluation Results
1/28/2026
Overall Score: 2.79/5 (average across all 3 models)
Best Performing Model (Low Confidence): openai:gpt-5-mini, 3.41/5

Rank  Model                         Score      adh  cla  com  In     Out    Cost
#1    openai:gpt-5-mini             3.41/5.00  2.9  4.4  2.9  1,175  2,136  $0.0046
#2    google:gemini-2.5-flash-lite  2.53/5.00  1.4  4.4  1.7  1,230  1,245  $0.0006
#3    anthropic:claude-3-5-haiku    2.42/5.00  1.7  4.2  1.4  1,365  569    $0.0034
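The prompt text above uses `{text}` and `{context}` placeholders and doubles the braces around its JSON example, which is how Python's `str.format` escapes literal braces. The page does not say which templating engine the tool uses, so the following is only a minimal sketch under that assumption. It uses an abridged copy of the prompt with the backslash escapes dropped, and the helper names `render_prompt` and `parse_answer` are hypothetical. It shows how the placeholders might be filled and how a reply could be checked against the requested JSON shape.

```python
import json

# Abridged copy of the prompt. Under str.format-style templating (an
# assumption), {text} and {context} are placeholders and {{ }} are
# literal braces.
PROMPT_TEMPLATE = (
    "Text: '{text}'\n"
    "Similar context: {context}\n"
    "Now please output your answer in JSON format, with the format as follows: "
    '{{"Label": "Not Hate Speech or Hate Speech", "Reason": ""}}'
)

# The only two labels the prompt allows.
ALLOWED_LABELS = {"Not Hate Speech", "Hate Speech"}


def render_prompt(text: str, context: str) -> str:
    """Fill the two placeholders; doubled braces collapse to single ones."""
    return PROMPT_TEMPLATE.format(text=text, context=context)


def parse_answer(raw: str) -> dict:
    """Parse a model reply and verify it has the shape the prompt asks for."""
    answer = json.loads(raw)
    if not isinstance(answer, dict):
        raise ValueError("reply is not a JSON object")
    if answer.get("Label") not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {answer.get('Label')!r}")
    if not isinstance(answer.get("Reason"), str):
        raise ValueError("Reason must be a string")
    return answer


if __name__ == "__main__":
    print(render_prompt("example text", "example context"))
    # Hypothetical reply for illustration, not taken from the evaluation.
    reply = '{"Label": "Not Hate Speech", "Reason": "No target attribute is attacked."}'
    print(parse_answer(reply))
```

Rejecting any label outside the two allowed strings catches replies that answer "I don't know" in free text, which the prompt permits but the JSON format does not describe.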
Test Case:
