
Evaluating logs in production is the only way to know if your LLM application is working correctly in the real world.

Online evals are a critical part of running a successful LLM application.

They allow you to measure the quality of your LLM application over time, detect performance and safety issues, and prevent regressions.

Why use Athina for Online Evals?

  • 50+ preset evals
  • Support for custom evals (see the sketch after this list)
  • Support for popular eval libraries like Ragas, Guardrails, etc.
  • Sampling: run evals on only a subset of logs
  • Filtering: run evals only on logs that match a condition (e.g., WHERE X is true)
  • Rate limiting: intelligent throttling to avoid hitting your LLM provider's rate limits
  • Use any model provider for LLM evals
  • View aggregate analytics
  • View traces with eval results
  • Track eval results over time
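
The custom-eval bullet above is where many teams start, so here is a minimal, hypothetical sketch of what a custom eval usually boils down to: a function that takes a logged inference and returns a pass/fail verdict with a reason. The dict-in / dict-out shape below is an illustrative convention, not Athina's documented custom-eval interface.

```python
import re

def response_contains_no_pii(log: dict) -> dict:
    """Hypothetical custom eval: fail any response that leaks an email address.

    The input/output shape is an assumption for illustration only.
    """
    response = log.get("response", "")
    has_email = bool(re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", response))
    return {
        "name": "response_contains_no_pii",
        "passed": not has_email,
        "reason": "Found an email address in the response" if has_email else "No PII detected",
    }
```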

How does it work?

At a high level, the architecture for running evals on logged inferences in production works like this: incoming logs are sampled, filtered, evaluated, and the results are stored alongside each trace.
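
Athina hasn't published the exact internals, but conceptually an online-eval worker follows a loop like the sketch below: sample a fraction of incoming logs, skip the ones that don't match your filter, throttle calls to the eval model, and store each result next to its trace. Every name here (`fetch_new_logs`, `run_eval`, `store_result`, the 10% sample rate) is a hypothetical placeholder used only to show the flow, not Athina's implementation.

```python
import random
import time

SAMPLE_RATE = 0.10          # evaluate ~10% of logs (hypothetical setting)
MAX_EVALS_PER_MINUTE = 60   # stay under the LLM provider's rate limits

def should_evaluate(log: dict) -> bool:
    # Sampling: only a fraction of logs get evaluated.
    if random.random() > SAMPLE_RATE:
        return False
    # Filtering: only run on logs WHERE X is true (here: production traffic).
    return log.get("environment") == "production"

def eval_worker(fetch_new_logs, run_eval, store_result):
    """Simplified online-eval loop; the three callables are placeholders."""
    interval = 60.0 / MAX_EVALS_PER_MINUTE
    for log in fetch_new_logs():
        if not should_evaluate(log):
            continue
        result = run_eval(log)            # e.g. an LLM-graded or rule-based eval
        store_result(log["id"], result)   # attach the result to the trace
        time.sleep(interval)              # naive rate limiting
```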

👋 Athina

We spent a lot of time working through these problems so you don’t need a dedicated team for this. You can see a demo video here.

Website: Athina AI (Try our sandbox).

Sign Up for Athina.

GitHub: Run any of our 40+ open-source evaluations using our Python SDK to measure your LLM app.
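
As a rough illustration of what running one of those open-source evals from Python might look like, here is a hedged sketch. The import path, class name, and `run(...)` signature are assumptions modeled on common eval-SDK patterns; check the GitHub README for the exact, current usage.

```python
# Hedged sketch -- the import path, class name, and signature below are
# assumptions, not a verified excerpt of the athina-evals API.
from athina.evals import DoesResponseAnswerQuery

result = DoesResponseAnswerQuery().run(
    query="What is the capital of France?",        # the user's question
    response="Paris is the capital of France.",    # your LLM's answer
)
print(result)  # typically a pass/fail verdict plus an explanation
```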