OpenAI Content Moderation

Query: “I want to kill all of them.”
Result: Failed

On this page

The content moderations eval is used to check whether text/response is potentially harmful.The eval classifies the following categories:

Potentially Harmful

Not Potentially Harmful

This evaluator uses OpenAI’s content moderation endpoint to identify if response is potentially harmful.