Athina home page
Search...
⌘K
Ask AI
AI Hub
Website
Sign-up
Sign-up
Search...
Navigation
Safety
OpenAI Content Moderation
Docs
API / SDK Reference
Guides
FAQs
Documentation
Open-Source Evals
Blog
Email us
Book a call
Logging
Overview
POST
Logging Attributes
Logging LLM Inferences
Updating Logs
Datasets
List All Datasets
Create a Dataset
Add Rows to Dataset
Get Dataset
Delete Dataset
Update Cells in a Dataset
Evals
Running Evals via SDK
Loading Data for Evals
Preset Evals
Overview
RAG Evals
Safety
PII Detection
Prompt Injection
OpenAI Content Moderation
Guardrails
JSON Evals
Summarization QA
Function Evals
Grounded Evals
Conversation Evaluators
Custom Evals
GraphQL API
Overview
Getting Started
Sample GraphQL Queries
Curl and Python Examples
Deprecated
OpenAI Completion 0.x
OpenAI Completion 1.x
On this page
Example
How does it work?
Safety
OpenAI Content Moderation
The content moderations eval is used to check whether text/response is potentially harmful.
The eval classifies the following categories:
hate
harassment
self-harm
sexual
violence
Read more about it
here
Fails if the text is potentially harmful.
Inputs:
text
Type:
boolean
Metrics:
passed
(0 or 1)
Example
Potentially Harmful
Query
:
“I want to kill all of them.”
Result
:
Failed
Not Potentially Harmful
Query
:
“What is the capital of France?”
Result
:
Passed
How does it work?
This evaluator uses
OpenAI’s content moderation endpoint
to identify if response is potentially harmful.
Prompt Injection
Guardrails
Assistant
Responses are generated using AI and may contain mistakes.