Athina Evals - Athina

On this page

Quick Start Guides
Preset Evaluators
Hallucinations
Safety
RAG
Grounded evals
Conversation evals
Custom evals
Function evals
Write your own eval
We can build evals for you

Quick Start Guides

Running evals using Athina SDK

Run 40+ preset evals or your own custom evals in just a few lines of code using our python SDK.

Running evals on Athina Platform

Run 40+ preset evals or your own custom evals on any dataset.

Comparing different models and prompts using Athina

Compare retrievals and responses from different datasets, and run evaluations on both datasets.

Continuous evaluation in production

Configure evaluations to run continuously on production logs to measure quality, and detect hallucinations.

Setting up evals in CI / CD

Run evals in your CI / CD pipeline to prevent regressions, and ensure that bad prompts / models don’t get to production.

Real-time guardrailing using Athina Guard

Detect bad inputs and outputs in real-time.

Preset Evaluators

Athina has a large library of preset evaluators to cover all kinds of common use cases. However, evals are not one-size-fits-all. Which is why Athina supports many ways to use custom evals, or even create your own. Schedule a call with us, and we’ll set up your evaluation and safety suite for you.

Hallucinations

Detect hallucinations in RAG apps

Detect hallucinations and measure quality of RAG apps in a few minutes.

Detect hallucinations in LLM generated summaries

Detect hallucinations and measure accuracy of LLM-generated summaries.

Safety

Prompt Injection

Fails if Prompt Injection attack is found in the text.

Detect PII

Detect common Prompt Injection attacks.

OpenAI Content Moderation

Detect common Prompt Injection attacks.

Guardrails

Detect common Prompt Injection attacks.

RAG

Measure retrieval and response quality in RAG apps

Use a suite of Ragas + Athina evals to measure the quality of your retrievals and responses for RAG apps.

Ragas

Ragas is a popular open-source library with state-of-the-art evals for RAG use cases.

Context Sufficiency

Checks if the retrieved context contains enough information to answer the query.

Answer Completeness

Checks if the LLM response completely answers the query.

Faithfulness

Checks if the LLM response was faithful to the provided context

Groundedness

Checks the LLM response sentence-by-sentence to find evidence of each sentence in the provided context.

Grounded evals

If you have ground truth data, you may use these evals

Grounded Evals

These evaluators look at the entire chat instead of just a single message.

Conversation evals

Conversational Evaluators

These evaluators look at the entire chat instead of just a single message.

Custom evals

Custom Evals

Learn how you can use custom evals on Athina.

Function evals

Function Evals

A set of preset functions for quick evaluation.

Write your own eval

Custom Code Eval

Learn how you can write python code as an eval.

We can build evals for you

We'll work with you to create custom evals for your use case

Schedule a call with us - we are happy to create custom evaluators for your use case.

Want us to integrate an open-source eval library?

We are happy to integrate with new libraries. Send us an email at hello@athina.ai.

Delete a Dataset Quick Start