Preset Evals
You can use our preset evaluators to rapidly add evaluations to your dev stack.
Here are our preset evaluators:
RAG Evals
These evals are useful for evaluating LLM applications with Retrieval Augmented Generation (RAG).
RAGAS Evals
RAGAS is a popular library with state-of-the-art evaluation metrics for RAG models:
- Context Precision
- Context Relevancy
- Context Recall
- Faithfulness
- Answer Relevancy
- Answer Semantic Similarity
- Answer Correctness
- Coherence
- Conciseness
- Maliciousness
- Harmfulness
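For illustration, here is a minimal sketch of computing a few of these metrics with the ragas library directly. It assumes ragas and the Hugging Face datasets package are installed and an LLM API key (OpenAI by default) is configured, since these metrics are LLM-graded; the exact column names and evaluate API vary between ragas versions.

```python
# Illustrative sketch only -- expected column names and the evaluate()
# signature differ between ragas versions; check the docs for your version.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# One RAG interaction: the user question, the retrieved contexts,
# the model's answer, and a reference answer.
data = {
    "question": ["What is the capital of France?"],
    "contexts": [["Paris is the capital and largest city of France."]],
    "answer": ["The capital of France is Paris."],
    "ground_truth": ["Paris"],
}

results = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(results)  # per-metric scores, e.g. {'faithfulness': 1.0, ...}
```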
Safety Evals
These evals are useful for evaluating LLM applications with safety in mind.
- PII Detection: Will fail if PII is found in the text
- Prompt Injection: Will fail if any known Prompt Injection attack is found in the text. Learn more about Prompt Injection.
- OpenAI Content Moderation: Will fail if text is potentially harmful. Learn more about it here.
- Maliciousness: Measures maliciousness of the response
- Harmfulness: Measures harmfulness of the response
- Guardrails: A popular library of custom validators for LLM applications. The following validators are supported as evals in Athina:
- Safe for work: Checks whether the text contains inappropriate/Not Safe For Work (NSFW) content or not.
- Not gibberish: Checks whether the LLM-generated response contains gibberish or not.
- Contains no sensitive topics: Checks if the response contains sensitive topics or not.
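As an illustration of the kind of check the OpenAI Content Moderation eval performs, here is a minimal sketch that calls OpenAI's moderation endpoint directly. It assumes the openai Python SDK (v1+) and an OPENAI_API_KEY in the environment; this is not Athina's internal implementation.

```python
# Minimal sketch: flag potentially harmful text via OpenAI's moderation
# endpoint. Assumes openai>=1.0 and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def is_flagged(text: str) -> bool:
    """Return True if the moderation endpoint marks the text as harmful."""
    response = client.moderations.create(input=text)
    return response.results[0].flagged

print(is_flagged("Have a nice day!"))  # expected: False
```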
Summarization Evals
These evals are useful for evaluating LLM-powered summarization performance.
Custom Evals
These evals can help you create custom evaluation conditions.
Function Evals
Unlike the previous evaluators, which use an LLM for grading, function evals do not use an LLM; they use simple functions instead. For example, an evaluator can check whether the text:
- matches a given regular expression
- contains a link
- contains keywords
- contains no invalid links
- is missing keywords
- and more
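For intuition, here is what a couple of these checks might look like as plain Python functions. This is an illustrative sketch only; the function names are hypothetical and not Athina's API.

```python
import re

# Illustrative stand-ins for function evals; names are hypothetical.
def matches_regex(text: str, pattern: str) -> bool:
    """Pass if the text matches the given regular expression."""
    return re.search(pattern, text) is not None

def contains_keywords(text: str, keywords: list[str]) -> bool:
    """Pass if every keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)

def contains_link(text: str) -> bool:
    """Pass if the text contains an http(s) link."""
    return re.search(r"https?://\S+", text) is not None

print(matches_regex("Order #12345 confirmed", r"#\d+"))              # True
print(contains_keywords("Paris is in France", ["paris", "france"]))  # True
```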
Head over to the function evaluators page for further details.
Evals with ground-truth
The evaluators above do not compare the response against any reference data. The following evaluators compare the LLM-generated response against an expected_response or context.
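As a simple illustration of a reference-based check, the sketch below compares a response against an expected_response using plain string similarity. This is a rough stand-in for intuition only; Athina's grounded evaluators may use LLM grading or embedding-based similarity instead.

```python
from difflib import SequenceMatcher

# Rough illustrative check: compare the response to a reference answer.
# Real grounded evals may use LLM grading or embeddings instead.
def similar_to_expected(response: str, expected_response: str,
                        threshold: float = 0.8) -> bool:
    """Pass if the response is lexically close to the expected response."""
    ratio = SequenceMatcher(None, response.lower(),
                            expected_response.lower()).ratio()
    return ratio >= threshold

print(similar_to_expected(
    "Paris is the capital of France.",
    "Paris is the capital of France",
))  # True -- near-identical strings score well above the threshold
```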
Head over to the grounded evaluators page for further details.