Getting Started with Athina
Athina IDE
Athina IDE is a collaborative editor for AI teams to prototype, experiment, and evaluate LLM-powered applications.
Watch a Demo Video
A video walkthrough of Athina IDE.
Quickstart: Run Evals
A quickstart guide for running evals in the UI or programmatically.
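For the programmatic path, the open-source athina-evals Python package ships the preset evaluators. Below is a minimal sketch of running one evaluator on a single datapoint; the class and key-setup names follow the athina-evals README and may differ between versions.

```python
import os

# Assumes `pip install athina-evals`; evaluator and key-setup names follow the
# athina-evals README and may vary by version.
from athina.evals import DoesResponseAnswerQuery
from athina.keys import AthinaApiKey, OpenAiApiKey

OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])
AthinaApiKey.set_key(os.environ["ATHINA_API_KEY"])  # optional: also logs results to the Athina UI

# One datapoint: the user query and the response your application produced.
result = DoesResponseAnswerQuery(model="gpt-4").run(
    query="What is the refund window?",
    response="You can request a refund within 30 days of purchase.",
)
print(result)
```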
Prompts & Experimentation
Athina Playground
Experiment with different prompts and models
Prompt Management
Organize prompts into folders, add versioning, and manage collaboratively
Run Prompts via API
Execute your saved prompts programmatically via API or SDK (see the sketch after these cards)
Prototype Pipelines
Chain prompts and API calls to build complex pipelines
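To illustrate the Run Prompts via API card: a saved prompt template can be executed over HTTP. The endpoint path, header name, and payload fields below are placeholders rather than the documented contract, so treat this as a shape sketch and check the API / SDK Reference for the real details.

```python
import os
import requests

# Shape sketch only: the URL path, auth header, and body fields are placeholders.
# The documented endpoint and payload live in the API / SDK Reference.
resp = requests.post(
    "https://api.athina.ai/api/v1/prompt/run",                  # placeholder URL
    headers={"athina-api-key": os.environ["ATHINA_API_KEY"]},   # placeholder header name
    json={
        "prompt_slug": "customer-support-reply",                # hypothetical saved prompt slug
        "variables": {"customer_message": "Where is my order?"},
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```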
Evaluations & Quality
Run Evaluations
Choose from 50+ preset evals or create custom evaluations
Continuous Evaluation
Automatically evaluate production traffic for quality and safety
CI/CD Integration
Prevent regressions by running evals in your CI/CD pipeline (see the test sketch after these cards)
Custom Evaluation
Build custom evaluation logic for your specific use case
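One way to wire the CI/CD card into a pipeline: wrap an eval run in a test and fail the build when the pass rate drops below a threshold. A minimal pytest sketch, where `run_my_pipeline` and the keyword check are hypothetical stand-ins for your application code and a real evaluator.

```python
# test_llm_regression.py -- run with `pytest` in your CI job to block regressions.

TEST_CASES = [
    {"query": "What is your refund window?", "must_contain": "30 days"},
    {"query": "Do you ship internationally?", "must_contain": "ship"},
]

def run_my_pipeline(query: str) -> str:
    # Hypothetical stand-in: replace with your actual LLM app / chain call.
    return "We accept refunds within 30 days, and we ship internationally."

def test_responses_meet_quality_bar():
    passed = sum(
        1
        for case in TEST_CASES
        if case["must_contain"].lower() in run_my_pipeline(case["query"]).lower()
    )
    pass_rate = passed / len(TEST_CASES)
    # Fail the build if quality drops below the agreed threshold.
    assert pass_rate >= 0.8, f"Eval pass rate {pass_rate:.0%} fell below 80%"
```

The same pattern works with preset evaluators from athina-evals in place of the keyword check; the threshold is whatever regression budget your team agrees on.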
Observability & Analytics
Logging Setup
Start logging LLM interactions in 2 lines of code
Usage Analytics
Track costs and usage across models, prompts, and customers
Trace Visualization
Visualize and debug complex LLM chains
Export & Query
Access your data via GraphQL API or export as CSV/JSON
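For the Export & Query card, data can also be pulled programmatically. The GraphQL endpoint, auth header, and field names in this sketch are illustrative placeholders; the real schema is documented in the API / SDK Reference.

```python
import os
import requests

# Illustrative GraphQL export; endpoint, header, and field names are placeholders.
QUERY = """
query RecentInferences($limit: Int!) {
  inferences(limit: $limit) {
    prompt_slug
    response
    cost
    created_at
  }
}
"""

resp = requests.post(
    "https://api.athina.ai/graphql",                            # placeholder URL
    headers={"athina-api-key": os.environ["ATHINA_API_KEY"]},   # placeholder header name
    json={"query": QUERY, "variables": {"limit": 50}},
    timeout=30,
)
resp.raise_for_status()
rows = resp.json()["data"]["inferences"]
print(f"Fetched {len(rows)} logged inferences")
```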
Datasets & Testing
Create Datasets
Import data from logs, files, or generate synthetic datasets
Dynamic Columns
Add computed columns using LLM calls, code, or API requests (see the sketch after these cards)
Compare Datasets
Compare performance across different models or prompts
Run Experiments
Test different approaches and measure improvements
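The Dynamic Columns card describes a feature of the dataset UI, but the underlying idea is simply mapping a function (code, an LLM call, or an API request) over every row. A local Python sketch of the code-based variant, over a made-up two-row dataset:

```python
# Conceptual sketch of a "dynamic column": derive a new column from existing ones.
# In Athina IDE this is configured per column in the dataset UI; here it is plain
# Python over a made-up list of rows.
rows = [
    {"query": "What is your refund window?", "response": "Refunds are accepted within 30 days."},
    {"query": "Do you ship internationally?", "response": "Yes, we ship to over 40 countries."},
]

for row in rows:
    # Code-based column: a cheap metric computed from existing columns.
    row["response_word_count"] = len(row["response"].split())
    # An LLM-graded or API-based column would make a model or HTTP call here instead.

print(rows)
```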