Athina AI
Monitoring, Analytics & Evaluations for LLM Developers
Athina is an evaluation framework and production observability platform for LLM-powered apps.
Athina provides a suite of tools to help you safeguard LLMs in production:
- Complete production observability platform, with real-time monitoring and analytics
- Powerful evaluation framework to run in development, CI / CD, or production
- Prompt management and experimentation tools
Observability
Athina Monitor assists developers in several key areas:
- Visibility: By logging prompt-response pairs using our SDK, you get complete visibility into your LLM touchpoints, allowing you to trace through and debug your retrievals and generations.
- Usage Analytics: Athina will keep track of usage metrics like response time, cost, token usage, feedback, and more, regardless of which LLM you are using.
- Query Topic Classification: Automatically classify user queries into topics to get detailed insights into popular subjects and AI performance per topic.
- Granular Segmentation: Segment your usage and performance metrics by metadata properties such as customer ID, prompt version, language model ID, topic, and more.
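To make the logging and segmentation ideas above concrete, here is a minimal sketch in plain Python. It does not use the Athina SDK: the `InferenceLog` schema, its field names, and the `segment_by` helper are all assumptions made for illustration, and the platform's real log format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class InferenceLog:
    """One logged prompt-response pair (hypothetical schema, not Athina's)."""
    prompt: str
    response: str
    response_time_ms: float
    cost_usd: float
    total_tokens: int
    # Free-form properties used for segmentation, e.g. customer ID or prompt version.
    metadata: dict = field(default_factory=dict)


def segment_by(logs, key):
    """Group logs by one metadata property and average the usage metrics per group."""
    groups = defaultdict(list)
    for log in logs:
        groups[log.metadata.get(key, "unknown")].append(log)
    return {
        value: {
            "count": len(items),
            "avg_response_time_ms": mean(i.response_time_ms for i in items),
            "avg_cost_usd": mean(i.cost_usd for i in items),
            "avg_total_tokens": mean(i.total_tokens for i in items),
        }
        for value, items in groups.items()
    }


# Illustrative data only.
logs = [
    InferenceLog("Hi", "Hello!", 400.0, 0.002, 30,
                 {"customer_id": "acme", "prompt_version": "v1"}),
    InferenceLog("Refund?", "Sure.", 600.0, 0.004, 50,
                 {"customer_id": "acme", "prompt_version": "v2"}),
    InferenceLog("Pricing?", "See plans.", 900.0, 0.003, 40,
                 {"customer_id": "globex", "prompt_version": "v2"}),
]

by_customer = segment_by(logs, "customer_id")
by_prompt_version = segment_by(logs, "prompt_version")
print(by_customer)
print(by_prompt_version)
```

The same records can be grouped by any metadata key, which is the point of attaching metadata at logging time rather than deciding on the segments up front.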
Evaluations
- Configuring evals
  - Choose from 40+ preset evaluations
  - Support for custom evals
  - Create your own eval
- Running evals
  - Run evals continuously in production
  - Run evals during development using SDK
  - Run evals on Athina Platform
  - Run evals to compare multiple datasets
  - Run evals in CI / CD
  - Run evals using athina.guard as real-time guardrails
- Analyze results
  - View eval metrics over time
  - View percentile distributions of eval metrics
  - Compare evaluation metrics for different prompts, models, topics and customers
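Three of the ideas in the list above (a custom eval, an eval used as a real-time guardrail, and percentile distributions of eval scores) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `answer_overlap`, `guard`, `GuardFailed`, and `percentile_summary` are hypothetical names, the word-overlap metric is a toy, and none of this reflects the actual signature or behavior of `athina.guard`.

```python
from statistics import quantiles


class GuardFailed(Exception):
    """Raised when a response fails an eval used as a guardrail (hypothetical)."""


def answer_overlap(response: str, context: str) -> float:
    """Toy custom eval: fraction of response words that also appear in the context."""
    response_words = response.lower().split()
    if not response_words:
        return 0.0
    context_words = set(context.lower().split())
    return sum(w in context_words for w in response_words) / len(response_words)


def guard(response: str, context: str, threshold: float = 0.5) -> float:
    """Run the eval inline and block the response if its score is below the threshold."""
    score = answer_overlap(response, context)
    if score < threshold:
        raise GuardFailed(f"score {score:.2f} is below threshold {threshold}")
    return score


def percentile_summary(scores):
    """Return the p50, p90 and p99 of a list of eval scores (needs at least 2 scores)."""
    cuts = quantiles(scores, n=100, method="inclusive")  # 99 cut points: p1..p99
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


context = "the capital of france is paris"

# Guardrail use: a grounded response passes, an ungrounded one is blocked.
print(guard("paris is the capital", context))
try:
    guard("bananas are blue", context)
except GuardFailed as err:
    print("blocked:", err)

# Analysis use: summarize a batch of scores collected over time.
print(percentile_summary([i / 10 for i in range(11)]))
```

The eval function is shared between the two uses: as a guardrail it runs synchronously on a single response and fails fast, while for analysis the same scores are aggregated after the fact.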