Preset Evaluators
Athina has a large library of preset evaluators to cover all kinds of common use cases.
View the evaluators in the Athina IDE.
View the evaluators on Github in Athina's Open-Source Evaluation SDK.
Available Preset Evaluators
You can also create custom evaluators. See here for more information.
RAG Evals
These evals are useful for evaluating LLM applications with Retrieval Augmented Generation (RAG):
Context Contains Enough Information
Does Response Answer Query
Response Faithfulness
Groundedness
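These RAG presets can also be run programmatically through Athina's open-source SDK. The sketch below is a minimal, hedged example: the module paths, class name, and run() signature follow the SDK's documented usage but should be checked against the current API reference.

```python
import os

# Assumed imports from Athina's open-source SDK (pip install athina);
# verify module paths and class names against the SDK reference.
from athina.evals import DoesResponseAnswerQuery
from athina.keys import OpenAiApiKey

# LLM-graded presets need a key for the grading model.
OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])

# Run a single datapoint through the "Does Response Answer Query" preset.
result = DoesResponseAnswerQuery().run(
    query="Where is the Eiffel Tower?",
    response="The Eiffel Tower is in Paris, France.",
)
print(result)
```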
RAGAS Evals
RAGAS is a popular library with state-of-the-art evaluation metrics for RAG pipelines:
Context Precision
Context Relevancy
Context Recall
Faithfulness
Answer Relevancy
Answer Semantic Similarity
Answer Correctness
Coherence
Conciseness
Maliciousness
Harmfulness
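The same metrics can be computed with the RAGAS library directly. The sketch below assumes the pre-0.2 ragas.evaluate() interface, the lowercase metric objects, and the standard column layout (question, contexts, answer, ground_truth); newer RAGAS releases have reorganized the API, so treat this as illustrative.

```python
from datasets import Dataset  # pip install ragas datasets
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# A tiny single-row dataset in the column layout RAGAS expects.
data = Dataset.from_dict({
    "question": ["Where is the Eiffel Tower?"],
    "contexts": [["The Eiffel Tower is a landmark in Paris, France."]],
    "answer": ["The Eiffel Tower is in Paris."],
    "ground_truth": ["The Eiffel Tower is located in Paris, France."],
})

# Score the dataset against a subset of the metrics listed above.
scores = evaluate(data, metrics=[context_precision, faithfulness, answer_relevancy])
print(scores)
```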
Safety Evals
These evals are useful for evaluating LLM applications with safety in mind:
PII Detection: Will fail if PII is found in the text
Prompt Injection: Will fail if any known Prompt Injection attack is found in the text
OpenAI Content Moderation: Will fail if text is potentially harmful
Guardrails: A popular library of custom validators for LLM applications:
  Safe for work: Checks if text has inappropriate/NSFW content
  Not gibberish: Checks if response contains gibberish
  Contains no sensitive topics: Checks for sensitive topics
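As an illustration of what the OpenAI Content Moderation preset checks, the snippet below calls the OpenAI Moderation endpoint directly and fails when the input is flagged. The Athina preset wraps an equivalent check; the pass/fail logic here is a simplified assumption, not the preset's implementation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def passes_content_moderation(text: str) -> bool:
    """Return False if the moderation endpoint flags the text as potentially harmful."""
    result = client.moderations.create(input=text)
    return not result.results[0].flagged

print(passes_content_moderation("Have a great day!"))  # expected: True
```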
Summarization Evals
These evals are useful for evaluating LLM-powered summarization performance:
Summarization Accuracy
JSON Evals
These evals are useful for validating JSON outputs:
JSON Schema Validation
JSON Field Validation
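To see what the JSON Schema Validation eval checks, here is a rough standalone equivalent using the jsonschema package: it fails if the response is not valid JSON or does not conform to an expected schema. This is an illustration of the check, not Athina's implementation.

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

def json_schema_validation(response: str) -> bool:
    """Return True if the response parses as JSON and matches the schema."""
    try:
        validate(instance=json.loads(response), schema=schema)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False

print(json_schema_validation('{"name": "Ada", "age": 36}'))  # True
print(json_schema_validation('{"name": "Ada"}'))             # False (missing "age")
```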
Function Evals
Unlike the previous evaluators, which use an LLM for grading, function evals use simple functions to check whether:
Text matches a given regular expression
Text contains a link
Text contains keywords
Text contains no invalid links
Text is missing keywords
Head over to the function evaluators page for further details.
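Because these checks are plain functions rather than LLM graders, they are easy to reproduce. The helpers below are illustrative stand-ins for a few of the presets above, not the SDK's own functions.

```python
import re

def matches_regex(text: str, pattern: str) -> bool:
    """Passes if the text matches the given regular expression."""
    return re.search(pattern, text) is not None

def contains_link(text: str) -> bool:
    """Passes if the text contains an http(s) link."""
    return re.search(r"https?://\S+", text) is not None

def contains_keywords(text: str, keywords: list[str]) -> bool:
    """Passes if every keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    return all(kw.lower() in lowered for kw in keywords)

print(matches_regex("Order #12345 confirmed", r"#\d+"))              # True
print(contains_link("See https://docs.athina.ai for details"))       # True
print(contains_keywords("Paris is in France", ["paris", "france"]))  # True
```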
Evals with Ground Truth
These evaluators compare the response against reference data:
Answer Similarity
Context Similarity
Head over to the grounded evaluators page for further details.
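As a rough sketch of what a ground-truth comparison looks like, the snippet below scores a response against a reference answer with a simple string-similarity ratio. Athina's Answer Similarity preset uses its own scoring, so treat this purely as an illustration of the idea.

```python
from difflib import SequenceMatcher

def answer_similarity(response: str, expected_response: str) -> float:
    """Return a 0-1 similarity score between the response and the reference answer."""
    return SequenceMatcher(None, response.lower(), expected_response.lower()).ratio()

score = answer_similarity(
    "The Eiffel Tower is in Paris.",
    "The Eiffel Tower is located in Paris, France.",
)
print(f"similarity: {score:.2f}")  # compare against whatever pass threshold you choose
```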