
kold

classification

The prompt guides annotators to classify a given text as either 'Not Offensive' or 'Offensive' based on specific labeling criteria, using retrieved similar context for reference. It requires a concise, one-sentence rationale for each classification decision, with the output formatted as JSON.

Prompt Text

You are an annotator trained on the labeling criteria for this label. Use the following pieces of retrieved similar context to annotate the given text. If you don't know the answer, just say that you don't know. Keep the answer concise.
Text: '{text}'
1) Labeling Criteria
Not Offensive:
- Texts that do not meet the offensive criteria.
- Free of untargeted profanity and targeted offenses like insults and threats, which can be implicit or explicit.
Offensive:
- Contains untargeted profanity (offensive remarks) or targeted offenses such as insults and threats.
- Target Type: UNT (Untargeted), IND (Individual), GRP (Group), OTH (Others).
- Target Group Attribute: Gender & Sexual Orientation, Race, Ethnicity & Nationality, Political Affiliation, Religion, Miscellaneous.
- Hate Speech: Classify as hate speech only if offensive towards Target Group.
2) Use the Similar context for reference only
Similar context: {context}
Using the provided labeling criteria and similar context, label the text as either 'Not Offensive' or 'Offensive'. Explain your labeling decision in one sentence that aligns strictly with the provided criteria.
Now please output your answer in JSON format, with the format as follows: {{"Label": "Not Offensive or Offensive", "Reason": ""}}
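
The template uses single-brace placeholders ({text}, {context}) for its two input variables and doubled braces to keep the JSON schema literal, which maps directly onto LangChain's ChatPromptTemplate. Below is a minimal sketch of wiring the prompt into a chain, assuming LangChain with the langchain-openai integration; the trimmed template, model choice, and sample inputs are illustrative placeholders, not the stored prompt's exact configuration.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai import ChatOpenAI

# Trimmed version of the prompt above: {text} and {context} are the two
# input variables; doubled braces {{ }} keep the JSON example literal.
template = (
    "You are an annotator trained on the labeling criteria for this label. "
    "Use the retrieved similar context to annotate the given text.\n"
    "Text: '{text}'\n"
    "Similar context: {context}\n"
    "Label the text as 'Not Offensive' or 'Offensive' and explain your "
    "decision in one sentence.\n"
    'Output your answer in JSON: {{"Label": "Not Offensive or Offensive", "Reason": ""}}'
)

prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # top-ranked model in the eval below
chain = prompt | llm | JsonOutputParser()  # parses the JSON answer into a dict

result = chain.invoke({
    "text": "Example comment to classify",
    "context": "Nearest labeled examples returned by the retriever",
})
print(result["Label"], "-", result["Reason"])
```

JsonOutputParser is a convenient fit here because the prompt pins the output to a fixed JSON shape, so downstream code can read result["Label"] directly instead of scraping free text.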

Evaluation Results

1/22/2026

Overall Score: 4.67/5 (average across all 3 models)
Best Performing Model: openai:gpt-4o-mini, 5.00/5 (Low Confidence)

| Rank | Model | Overall | adh | cla | com |
|------|-------|---------|-----|-----|-----|
| #1 | openai:gpt-4o-mini (GPT-4o Mini) | 5.00/5.00 | 5.0 | 5.0 | 5.0 |
| #2 | anthropic:claude-3-haiku | 4.93/5.00 | 4.8 | 4.9 | 5.0 |
| #3 | google:gemini-1.5-flash | 4.07/5.00 | 3.7 | 4.6 | 4.0 |

Tags

langsmith
someen
ChatPromptTemplate
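
Given the langsmith tag, the prompt presumably lives on the LangSmith Hub and can be pulled by handle. A sketch under that assumption follows; the handle "owner/kold" is hypothetical and must be replaced with the prompt's actual owner/name path.

```python
from langchain import hub  # requires the langchain and langchainhub packages

# Hypothetical handle; substitute the real owner/name from LangSmith.
prompt = hub.pull("owner/kold")

# The stored object is a ChatPromptTemplate, ready to compose into a chain.
print(type(prompt).__name__)   # expected: ChatPromptTemplate
print(prompt.input_variables)  # expected: ['context', 'text']
```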