Athina Evals
Quick Start Guides
Running evals using Athina SDK
Run 40+ preset evals or your own custom evals in just a few lines of code using our python SDK.
Running evals on Athina Platform
Run 40+ preset evals or your own custom evals on any dataset.
Comparing different models and prompts using Athina
Compare retrievals and responses from different datasets, and run evaluations on both datasets.
Continuous evaluation in production
Configure evaluations to run continuously on production logs to measure quality, and detect hallucinations.
Setting up evals in CI / CD
Run evals in your CI / CD pipeline to prevent regressions, and ensure that bad prompts / models don’t get to production.
Real-time guardrailing using Athina Guard
Detect bad inputs and outputs in real-time.
Preset Evaluators
Athina has a large library of preset evaluators to cover all kinds of common use cases.
However, evals are not one-size-fits-all. Which is why Athina supports many ways to use custom evals, or even create your own.
Schedule a call with us, and we’ll set up your evaluation and safety suite for you.
Hallucinations
Detect hallucinations in RAG apps
Detect hallucinations and measure quality of RAG apps in a few minutes.
Detect hallucinations in LLM generated summaries
Detect hallucinations and measure accuracy of LLM-generated summaries.
Safety
Prompt Injection
Fails if Prompt Injection attack is found in the text.
Detect PII
Detect common Prompt Injection attacks.
OpenAI Content Moderation
Detect common Prompt Injection attacks.
Guardrails
Detect common Prompt Injection attacks.
RAG
Measure retrieval and response quality in RAG apps
Use a suite of Ragas + Athina evals to measure the quality of your retrievals and responses for RAG apps.
Ragas
Ragas is a popular open-source library with state-of-the-art evals for RAG use cases.
Context Sufficiency
Checks if the retrieved context contains enough information to answer the query.
Answer Completeness
Checks if the LLM response completely answers the query.
Faithfulness
Checks if the LLM response was faithful to the provided context
Groundedness
Checks the LLM response sentence-by-sentence to find evidence of each sentence in the provided context.
Grounded evals
If you have ground truth data, you may use these evals
Grounded Evals
These evaluators look at the entire chat instead of just a single message.
Conversation evals
Conversational Evaluators
These evaluators look at the entire chat instead of just a single message.
Custom evals
Custom Evals
Learn how you can use custom evals on Athina.
Function evals
Function Evals
A set of preset functions for quick evaluation.
Write your own eval
Custom Code Eval
Learn how you can write python code as an eval.
We can build evals for you
We'll work with you to create custom evals for your use case
Schedule a call with us - we are happy to create custom evaluators for your use case.
Want us to integrate an open-source eval library?
We are happy to integrate with new libraries. Send us an email at hello@athina.ai.