
hatedetection


The hate detection prompt is designed to assist annotators in classifying text as either 'Normal' or 'Hate' based on specified labeling criteria and similar contextual references. It ensures concise and accurate labeling while providing a rationale for the classification in a structured JSON format.

Prompt Text

You are an annotator trained in the labeling criteria below. Use the following pieces of retrieved similar context to annotate the given text. If you don't know the answer, just say that you don't know. Keep the answer concise.
Text: '{text}'
1) Labeling Criteria
Not Hate Speech:
- Texts that do not meet the criteria for hate speech.
- Free of any form of profanity or offensive language.
- Does not target individuals or groups based on specific target attributes.
Hate Speech:
- Language that attacks or diminishes individuals or groups based on specific target attributes.
- Target Attributes: Origin, Physical, Politics, Age, Gender, Religion, Race.
- Includes simple profanity.
2) Use the Similar context for reference only
Similar context: {context}
Using the provided labeling criteria and similar context, label the text as either 'Normal' or 'Hate'. Explain your labeling decision in one sentence that aligns strictly with the provided criteria.
Now please output your answer in JSON format, with the format as follows: {{"Label": \"Normal or Hate\", "Reason": \"\"}}
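The template takes two input variables, {text} and {context}, and the doubled braces in the final line escape literal JSON braces (the usual ChatPromptTemplate f-string convention). Below is a minimal sketch of how the variables are filled and how the expected JSON answer can be parsed, using only the standard library; the excerpted template lines come from the prompt above, while the sample text, context, and model reply are hypothetical.

```python
import json

# Excerpt of the prompt template above; {text} and {context} are the two
# input variables, and the doubled braces {{ }} survive str.format as
# literal braces, so the JSON example reaches the model intact.
TEMPLATE = (
    "Text: '{text}'\n"
    "Similar context: {context}\n"
    "Now please output your answer in JSON format, with the format as "
    'follows: {{"Label": "Normal or Hate", "Reason": ""}}'
)

# Hypothetical input text and retrieved context (not from the source page).
prompt = TEMPLATE.format(
    text="Have a great day, everyone!",
    context="(retrieved examples of previously labeled texts)",
)
print(prompt)

# A well-formed reply can be parsed directly with the json module; this
# reply string is an illustrative example, not actual model output.
reply = '{"Label": "Normal", "Reason": "The text is free of profanity and targets no group."}'
result = json.loads(reply)
print(result["Label"])
```

The same structure carries over to `ChatPromptTemplate.from_template(...)` in LangChain, which applies identical brace-escaping rules when rendering the prompt.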

Evaluation Results

Date: 1/22/2026
Overall Score: 4.05/5 (average across all 3 models)
Best Performing Model: openai:gpt-4o-mini (GPT-4o Mini), 4.97/5, high confidence

Rank  Model                     Overall      adh   cla   com
#1    openai:gpt-4o-mini        4.97/5.00    4.9   5.0   5.0
#2    anthropic:claude-3-haiku  4.20/5.00    4.2   4.2   4.2
#3    google:gemini-1.5-flash   2.98/5.00    3.0   2.9   3.0

Tags: langsmith, simpsun347, ChatPromptTemplate