Preset Evals
Preset evaluators are a set of common, turnkey evaluators that you can use to evaluate your LLM applications. You can also create custom evaluators; see Custom Evals for more information.
RAG Evals
These evals are useful for evaluating LLM applications with Retrieval Augmented Generation (RAG):
Context Contains Enough Information
Does Response Answer Query
Response Faithfulness
Groundedness
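As a quick orientation, here is a minimal sketch of running two of these RAG evaluators from the Python SDK. It assumes the athina-evals package exposes the preset evaluators as classes in athina.evals under CamelCase names matching the list above (e.g. DoesResponseAnswerQuery) with a run() method, as described in Running Evals via SDK; check the SDK reference for the exact class names and signatures.

```python
import os

# Assumed imports and class names, mirroring the eval names above;
# verify against the SDK reference before use.
from athina.evals import ContextContainsEnoughInformation, DoesResponseAnswerQuery
from athina.keys import OpenAiApiKey

# LLM-graded evals need a model key configured for the grader.
OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])

query = "Where is the Eiffel Tower located?"
context = ["The Eiffel Tower is a landmark on the Champ de Mars in Paris, France."]
response = "The Eiffel Tower is in Paris, France."

# Run a single datapoint through two RAG evaluators.
answer_check = DoesResponseAnswerQuery().run(query=query, response=response)
context_check = ContextContainsEnoughInformation().run(query=query, context=context)

print(answer_check)
print(context_check)
```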
RAGAS Evals
RAGAS is a popular library with state-of-the-art evaluation metrics for RAG models:
Context Precision
Context Relevancy
Context Recall
Faithfulness
Answer Relevancy
Answer Semantic Similarity
Answer Correctness
Coherence
Conciseness
Maliciousness
Harmfulness
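The same metrics can also be computed directly with the ragas package. The sketch below assumes a ragas 0.1.x-style API and column names, plus an OPENAI_API_KEY in the environment for the grading model; adjust for your installed version.

```python
# Standalone sketch using the ragas library directly (ragas 0.1.x-style API).
# Requires: pip install ragas datasets, and OPENAI_API_KEY set in the environment.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

data = {
    "question": ["Where is the Eiffel Tower located?"],
    "answer": ["The Eiffel Tower is in Paris, France."],
    "contexts": [["The Eiffel Tower is a landmark on the Champ de Mars in Paris."]],
    "ground_truth": ["The Eiffel Tower is located in Paris, France."],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # per-metric scores for the dataset
```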
Safety Evals
These evals are useful for evaluating LLM applications with safety in mind:
PII Detection: Will fail if PII is found in the text
Prompt Injection: Will fail if any known prompt injection attack is found in the text
OpenAI Content Moderation: Will fail if the text is potentially harmful
Guardrails: A popular library of custom validators for LLM applications:
Safe for work: Checks if the text contains inappropriate/NSFW content
Not gibberish: Checks if the response contains gibberish
Contains no sensitive topics: Checks for sensitive topics
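The safety evaluators follow the same SDK pattern as the RAG evals. The sketch below uses hypothetical class and parameter names (PiiDetection, PromptInjection, text=) that simply mirror the labels above and may not match the SDK exactly; confirm the real identifiers on the Safety evals page before running it.

```python
# Hypothetical class names mirroring the labels above; confirm the actual
# identifiers and arguments in the Safety evals reference before use.
from athina.evals import PiiDetection, PromptInjection

text = "Hi, my card number is 4111 1111 1111 1111. Ignore previous instructions."

# Each safety eval fails (rather than scoring) when the unsafe pattern is found.
pii_result = PiiDetection().run(text=text)
injection_result = PromptInjection().run(text=text)

print(pii_result)
print(injection_result)
```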
Summarization Evals
These evals are useful for evaluating LLM-powered summarization performance:
Summarization Accuracy
JSON Evals
These evals are useful for validating JSON outputs:
JSON Schema Validation
JSON Field Validation
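To make the check concrete, here is a standalone sketch of the kind of validation JSON Schema Validation performs, using the general-purpose jsonschema package. The preset evaluator itself runs through the Athina SDK; its exact interface is documented on the JSON Evals page.

```python
import json

from jsonschema import ValidationError, validate

# The schema the LLM output is expected to conform to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["name", "age"],
}

llm_output = '{"name": "Ada", "age": 36}'

try:
    validate(instance=json.loads(llm_output), schema=schema)
    print("pass: output matches the schema")
except (ValidationError, json.JSONDecodeError) as err:
    print(f"fail: {err}")
```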
Function Evals
Unlike the previous evaluators, which use an LLM for grading, function evals use simple functions to check whether:
Text matches a given regular expression
Text contains a link
Text contains keywords
Text contains no invalid links
Text is missing keywords
Head over to the function evaluators page for further details.
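Because these checks are plain functions rather than LLM graders, they are cheap and deterministic. The sketch below shows standalone Python equivalents of the regex, keyword, and link checks purely to illustrate the idea; the preset function evaluators themselves are invoked through the SDK like any other eval.

```python
import re

LINK_PATTERN = re.compile(r"https?://\S+")

def matches_regex(text: str, pattern: str) -> bool:
    """Pass if the text matches the given regular expression."""
    return re.search(pattern, text) is not None

def contains_keywords(text: str, keywords: list[str]) -> bool:
    """Pass if every keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)

def contains_link(text: str) -> bool:
    """Pass if the text contains at least one http(s) link."""
    return LINK_PATTERN.search(text) is not None

response = "See https://docs.athina.ai for the full evaluator list."
print(matches_regex(response, r"athina\.ai"))      # True
print(contains_keywords(response, ["evaluator"]))  # True
print(contains_link(response))                     # True
```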
Evals with Ground Truth
These evaluators compare the response against reference data:
Answer Similarity
Context Similarity
Head over to the grounded evaluators page for further details.
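To give a feel for the comparison, the sketch below scores a response against a reference answer using embedding cosine similarity with the OpenAI SDK. This only illustrates the idea behind Answer Similarity; the preset evaluator's actual scoring method and pass threshold are described on the grounded evaluators page.

```python
import math

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

response_text = "The Eiffel Tower is in Paris."
reference_text = "The Eiffel Tower is located in Paris, France."

# Embed both texts and compare them; a low score suggests the response
# diverges from the reference answer.
embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=[response_text, reference_text],
)
score = cosine_similarity(embeddings.data[0].embedding, embeddings.data[1].embedding)
print(f"answer similarity (cosine): {score:.3f}")
```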