Athina AI
Athina is a collaborative, end-to-end platform for AI teams to prototype, experiment, evaluate, and monitor LLM-powered applications for production use cases.
- Complete production observability platform, with real-time monitoring and analytics
- Powerful evaluation framework and tools to run in development, CI / CD, or production
- Prompt management and experimentation tools
Our new IDE lets you prototype pipelines, run experiments, and evaluate and compare datasets. See the docs →
Athina IDE
Athina IDE is a collaborative editor for AI teams to prototype, experiment, and evaluate LLM-powered applications.
It provides a suite of tools to create and manage datasets, prompts, and evaluations.
Observability
Athina Monitor assists developers in several key areas:
- Visibility: By logging prompt-response pairs using our SDK, you get complete visibility into your LLM touchpoints, allowing you to trace through and debug your retrievals and generations (see the logging sketch after this list).
- Usage Analytics: Athina will keep track of usage metrics like response time, cost, token usage, feedback, and more, regardless of which LLM you are using.
- Query Topic Classification: Automatically classify user queries into topics to get detailed insights into popular subjects and AI performance per topic.
- Granular Segmentation: You can segment your usage and performance metrics based on different metadata properties such as customer ID, prompt version, language model ID, topic, and more to slice and dice your metrics.
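For concreteness, here is a minimal logging sketch in Python. It assumes the athina-logger package; the module paths, parameter names, and values shown are illustrative, so check the SDK docs for the current API.

```python
# A minimal sketch, assuming the athina-logger Python package.
# Module paths and parameter names here are illustrative.
from athina_logger.api_key import AthinaApiKey
from athina_logger.inference_logger import InferenceLogger

# Authenticate the SDK with your Athina API key.
AthinaApiKey.set_api_key("YOUR_ATHINA_API_KEY")

# Log a single prompt-response pair, along with metadata that powers
# usage analytics and granular segmentation (customer, model, cost, ...).
InferenceLogger.log_inference(
    prompt_slug="support_bot",        # hypothetical prompt identifier
    prompt="How do I reset my password?",
    response="You can reset it from Settings > Account.",
    language_model_id="gpt-4",
    customer_id="customer_123",       # enables per-customer segmentation
    cost=0.0021,                      # USD
    response_time=850,                # milliseconds
)
```

Once pairs are logged this way, the metadata fields double as the segmentation keys described above.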
Evaluations
- Configuring evals
- Choose from 40+ preset evaluations
- Support for custom evals
- Create your own eval
- Running evals
- Run evals continuously in production
- Run evals during development using the SDK (see the sketch after this list)
- Run evals on Athina Platform
- Run evals to compare multiple datasets
- Run evals in CI / CD
- Run evals using athina.guard as real-time guardrails
- Analyze results
- View eval metrics over time
- View percentile distributions of eval metrics
- Compare evaluation metrics for different prompts, models, topics and customers
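As an illustration of running an eval during development, here is a minimal sketch using the athina-evals Python package. The preset class and call signature are assumptions based on the package's preset-eval pattern; verify them against the current docs.

```python
# A minimal sketch, assuming the athina-evals Python package and an
# OpenAI key for the LLM grader. Class name and signature are assumptions.
import os

from athina.evals import DoesResponseAnswerQuery

os.environ["OPENAI_API_KEY"] = "sk-..."  # credentials for the grader model

# Run one preset eval against a single query-response pair.
result = DoesResponseAnswerQuery().run(
    query="What is Athina?",
    response="Athina is a platform for monitoring and evaluating LLM apps.",
)

# The result reports whether the eval passed, plus the grader's reasoning.
print(result)
```

The same evals can be pointed at full datasets, wired into CI / CD, or wrapped behind athina.guard for real-time checks.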