kmhas

extraction

This prompt has an annotator label a text as either 'Not Hate Speech' or 'Hate Speech', using explicit labeling criteria and retrieved similar context, and give a concise one-sentence justification for each label. It is intended for keeping labels consistent in content moderation and analysis.

Prompt Text

You are an annotator trained on the labeling criteria for each label. Use the following pieces of retrieved similar context to annotate the given text. If you don't know the answer, just say that you don't know. Keep the answer concise.
Text: '{text}'
1) Labeling Criteria
Not Hate Speech:
- Texts that do not meet the criteria for hate speech.
- Free of any form of profanity or offensive language.
- Does not target individuals or groups based on specific target attributes.
Hate Speech:
- Language that attacks or diminishes individuals or groups based on specific target attributes.
- Target Attributes: Origin, Physical, Politics, Age, Gender, Religion, Race.
- Includes simple profanity.
2) Use the Similar context for reference only
Similar context: {context}
Using the provided labeling criteria and similar context, label the text as either 'Not Hate Speech' or 'Hate Speech'. Explain your labeling decision in one sentence that aligns strictly with the provided criteria.
Now please output your answer in JSON format, with the format as follows: {{"Label": \"Not Hate Speech or Hate Speech\", "Reason": \"\"}}
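
The placeholders and the requested JSON reply can be exercised with a short sketch. It assumes the template is filled with Python's str.format (the doubled braces in the prompt suggest this, but the page does not say so). TEMPLATE is abbreviated to the lines that carry placeholders, the backslash escapes are dropped, and the helper names render_prompt and parse_annotation are hypothetical.

```python
import json

# Abbreviated copy of the prompt: only the lines with placeholders and the
# output-format line. Doubled braces are str.format escapes (an assumption
# about how the template is rendered) and come out as single literal braces.
TEMPLATE = (
    "Text: '{text}'\n"
    "Similar context: {context}\n"
    "Now please output your answer in JSON format, with the format as follows: "
    '{{"Label": "Not Hate Speech or Hate Speech", "Reason": ""}}'
)

# The only two labels the prompt permits.
ALLOWED_LABELS = {"Not Hate Speech", "Hate Speech"}


def render_prompt(text: str, context: str) -> str:
    """Fill the {text} and {context} placeholders."""
    return TEMPLATE.format(text=text, context=context)


def parse_annotation(raw: str) -> dict:
    """Parse a model reply and reject anything outside the two labels."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    label = data.get("Label")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return {"Label": label, "Reason": str(data.get("Reason", ""))}
```

Checking the label against a fixed set catches replies that copy the placeholder string "Not Hate Speech or Hate Speech" verbatim instead of choosing one label.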

Evaluation Results

1/28/2026

Overall Score: 2.79/5 (average across all 3 models)

Best Performing Model: openai:gpt-5-mini, 3.41/5 (Low Confidence)

openai:gpt-5-mini
#1 Ranked, 3.41/5.00
adh 2.9 | cla 4.4 | com 2.9
1,175 | Out 2,136 | Cost $0.0046

google:gemini-2.5-flash-lite
#2 Ranked, 2.53/5.00
adh 1.4 | cla 4.4 | com 1.7
1,230 | Out 1,245 | Cost $0.0006

anthropic:claude-3-5-haiku
#3 Ranked, 2.42/5.00
adh 1.7 | cla 4.2 | com 1.4
1,365 | Out 569 | Cost $0.0034
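
The overall score can be checked against the three per-model scores above. This is a minimal sketch that assumes "average" means the unweighted mean; the page does not state the weighting, though the numbers are consistent with it.

```python
from statistics import mean

# Per-model scores as listed in the evaluation results.
scores = {
    "openai:gpt-5-mini": 3.41,
    "google:gemini-2.5-flash-lite": 2.53,
    "anthropic:claude-3-5-haiku": 2.42,
}

# Unweighted mean, rounded to two decimals as displayed on the page.
overall = round(mean(scores.values()), 2)

# Highest-scoring model.
best = max(scores, key=scores.get)

print(overall, best)
```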

Test Case: kmhas
