These are evaluators we have custom-built for specific customers. We are working on integrating them into our GitHub repository.

If you would like to use one of these, please contact us, or sign up to be notified as we release new evals.

API Call

Calls an external API, letting you plug your own custom evals into Athina.
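A minimal sketch of what an API-call eval could look like. It assumes your service accepts a JSON `POST` with `query` and `response` fields and returns JSON with a boolean `passed` and an optional `reason`. That payload schema and the function name `run_api_eval` are hypothetical, not Athina's actual integration contract. The `send` parameter lets you swap in a fake transport for offline testing.

```python
import json
import urllib.request
from typing import Callable, Optional


def _default_send(request: urllib.request.Request) -> bytes:
    # Perform the real HTTP call; urlopen raises urllib.error.HTTPError on 4xx/5xx.
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.read()


def run_api_eval(
    url: str,
    query: str,
    response: str,
    send: Optional[Callable[[urllib.request.Request], bytes]] = None,
) -> dict:
    """Send one query/response pair to an external eval endpoint (hypothetical schema)."""
    payload = json.dumps({"query": query, "response": response}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    body = json.loads((send or _default_send)(request))
    # Normalize the result so callers always get the same two keys.
    return {"passed": bool(body["passed"]), "reason": body.get("reason", "")}
```

Keeping the transport injectable means the eval logic (payload shape, result normalization) can be tested without a live endpoint.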

Language Mismatch

Engine: gpt-3.5-turbo

Detect when the LLM response is in a different language from the user’s query.
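The eval itself uses gpt-3.5-turbo. As a much cheaper pre-check, you could compare the dominant Unicode script of the query and the response. This is a swapped-in heuristic, not the eval's method: it catches English vs. Russian or Japanese, but it cannot tell English from Spanish because both use Latin script. The function names are illustrative.

```python
import unicodedata
from collections import Counter
from typing import Optional


def dominant_script(text: str) -> Optional[str]:
    """Return the most common script prefix among letters, e.g. 'LATIN' or 'CYRILLIC'."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # Letter names usually start with the script, e.g. "CYRILLIC SMALL LETTER PE".
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def script_mismatch(query: str, response: str) -> bool:
    """Flag a response whose dominant script differs from the query's."""
    q, r = dominant_script(query), dominant_script(response)
    return q is not None and r is not None and q != r
```

A script mismatch is a strong signal on its own, while matching scripts still need the LLM check to rule out a language mismatch.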

Sensitive Data Leak

Engine: gpt-3.5-turbo

Detect when the user query or the LLM response contains personally identifiable information (PII).

For example: names, email addresses, phone numbers, Social Security numbers, credit card information, etc.

You can configure evals for different types of PII leaks.
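The eval uses gpt-3.5-turbo. Names have no fixed pattern, but several other PII types do, and a regex pass is a common, cheap complement for those. Below is a minimal sketch under that assumption, not the eval's actual method. The patterns are simplified (US-style phone numbers and SSNs), and the `types` parameter is a hypothetical stand-in for configuring which PII types to check. Card-like digit runs are filtered with the Luhn checksum to cut false positives.

```python
import re
from typing import Iterable, Optional, Set

# Simplified patterns; real PII detection needs locale-aware rules.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US format: 123-45-6789
    "phone": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),  # US-style
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),  # 13-19 digits, optional separators
}


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum, used to discard random digit runs that are not card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def detect_pii(text: str, types: Optional[Iterable[str]] = None) -> Set[str]:
    """Return the PII types found in text, optionally restricted to `types`."""
    wanted = set(types) if types is not None else set(PII_PATTERNS)
    found = set()
    for name, pattern in PII_PATTERNS.items():
        if name not in wanted:
            continue
        for match in pattern.finditer(text):
            if name == "credit_card" and not luhn_valid(match.group()):
                continue
            found.add(name)
            break
    return found
```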

Content Moderation

Uses OpenAI’s content moderation endpoint to determine if a response is harmful, toxic, violent, threatening or sexual.

Eval thresholds can be configured.
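Assuming the official `openai` Python SDK (v1 or later), each moderation result exposes per-category scores. The sketch below shows one way to apply configurable thresholds to those scores. The threshold values, the default of 0.5, and the function names are illustrative assumptions, not Athina's or OpenAI's defaults. Only `moderation_scores` touches the network, so the thresholding logic can be tested on its own.

```python
from typing import Dict, List, Mapping, Optional

DEFAULT_THRESHOLD = 0.5  # hypothetical default, not an Athina or OpenAI value


def moderation_scores(text: str) -> Dict[str, float]:
    """Fetch per-category scores from OpenAI's moderation endpoint."""
    from openai import OpenAI  # requires `pip install openai` (v1+) and OPENAI_API_KEY

    client = OpenAI()
    result = client.moderations.create(input=text).results[0]
    return result.category_scores.model_dump()


def flagged_categories(
    scores: Mapping[str, Optional[float]],
    thresholds: Optional[Mapping[str, float]] = None,
    default: float = DEFAULT_THRESHOLD,
) -> List[str]:
    """Return the categories whose score meets or exceeds its threshold."""
    thresholds = thresholds or {}
    return sorted(
        category
        for category, score in scores.items()
        if score is not None and score >= thresholds.get(category, default)
    )
```

For example, `flagged_categories(moderation_scores(text), {"violence": 0.2})` would apply a stricter threshold to violence and the default to every other category.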

Prompt Injection Attacks

[Coming Soon]

Common Mistakes

[Coming Soon]

Restricted Keywords

Engine: String Match

Detect when your LLM output contains any of the restricted keywords you specify.
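A minimal string-match sketch, assuming a case-insensitive, whole-word match against a user-supplied keyword list. The lookarounds `(?<!\w)` and `(?!\w)` keep `acme` from matching inside `AcmeCorp` while still working for keywords that start or end with punctuation. The function name is illustrative.

```python
import re
from typing import Iterable, List


def find_restricted_keywords(
    output: str, keywords: Iterable[str], case_sensitive: bool = False
) -> List[str]:
    """Return the restricted keywords that appear in the output as whole words."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [
        kw
        for kw in keywords
        # re.escape treats the keyword literally, even if it contains regex symbols.
        if re.search(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", output, flags)
    ]
```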

Critical Keywords

Engine: String Match

Detect when your LLM output is missing any of the critical keywords you specify.
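A minimal sketch of the missing-keyword check, assuming a case-insensitive, whole-word match: it reports every critical keyword that never appears in the output. The function name is illustrative.

```python
import re
from typing import Iterable, List


def missing_critical_keywords(
    output: str, keywords: Iterable[str], case_sensitive: bool = False
) -> List[str]:
    """Return the critical keywords that do not appear in the output as whole words."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [
        kw
        for kw in keywords
        # (?<!\w) and (?!\w) act as word boundaries that also work around punctuation.
        if not re.search(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", output, flags)
    ]
```

An empty list means the output passed; anything returned is a keyword the response should have contained.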

Regex HTTP

If your LLM response contains a link, we check whether the link is broken (returns a 404).

No LLM, just good old regex plus an HTTP request.
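A minimal sketch of the regex-plus-HTTP check using only the Python standard library. Assumptions: links start with `http://` or `https://`, trailing sentence punctuation is not part of the URL, and a `HEAD` request is enough (some servers answer `HEAD` differently from `GET`, so a production check might fall back to `GET`). The names are illustrative, and `status_of` can be replaced with a stub for offline testing.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, List, Optional

# Rough URL pattern: scheme plus everything up to whitespace, quotes, or brackets.
URL_RE = re.compile(r"https?://[^\s<>()\"']+")


def extract_links(text: str) -> List[str]:
    # Drop trailing punctuation that usually ends the sentence, not the URL.
    return [m.group().rstrip(".,;:!?") for m in URL_RE.finditer(text)]


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


def find_broken_links(
    text: str, status_of: Callable[[str], Optional[int]] = http_status
) -> List[str]:
    """Return every link in the text whose status is 404."""
    return [url for url in extract_links(text) if status_of(url) == 404]
```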

Hallucinated Email

Engine: gpt-3.5-turbo | Regex

If your LLM response contains an email address that does not appear in the context you provided, the address is likely hallucinated.
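The eval pairs gpt-3.5-turbo with a regex. The sketch below covers only a plausible regex half: extract every email address from the response and flag the ones that never occur in the context, comparing case-insensitively. The pattern is a deliberately simple approximation of email syntax, and the function name is illustrative.

```python
import re
from typing import List

# Deliberately simple email pattern; it will miss some valid but unusual addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def hallucinated_emails(response: str, context: str) -> List[str]:
    """Return emails in the response that never appear in the context (case-insensitive)."""
    known = {email.lower() for email in EMAIL_RE.findall(context)}
    return sorted({email for email in EMAIL_RE.findall(response) if email.lower() not in known})
```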