Example

How does it work?

Open-Source Evals

Blog

Email us

Athina

Overview

Logging

Athina IDE

Evals

Guides

Sign-up

OpenAI Content Moderation

Athina is a collaborative platform for AI teams to prototype, experiment, evaluate and monitor LLM-powered applications.

Introduction

Athina AI

Getting Started Guide

Getting Started with Athina

To get started with Athina's Monitoring, the first step is to start logging your inferences.

Logging Attributes

If you're using OpenAI chat completions in Python, you can get set up in just 2 minutes

OpenAI Chat (1.x)

OpenAI Chat Completion

OpenAI Chat (0.x)

OpenAI Assistant

LiteLLM

🚅 LiteLLM

Langchain

Log via API Request

Log via Python SDK

Athina LLM Application Tracing captures the full context of an execution including retrieval, generation, api calls, and more

Tracing

Athina Tracing integrates with Langchain using Langchain Callbacks (Python). Thereby, our SDK automatically creates a nested trace for every run of your Langchain application.

Tracing for Langchain

Tracing for Langchain (Python)

Supported Models

Update Logs By ID

Update Logs By External Reference ID

Where is data stored?

Logging Latency

Q. Will the Athina logging SDK increase my latency?

Is Athina using a Proxy method?

Q. Is this SDK going to make a proxy request to OpenAI through Athina?

On-prem deployment

Q. Can Athina's observability platform be deployed on-prem?

Proxy

Q. Can I log inferences from any model to Athina?

How to log conversations

Grouping Inferences

_If you're using OpenAI completions in Python, you can get set up in just **2 minutes**_

OpenAI Completion 0.x

OpenAI Completions

OpenAI Completion 1.x

Athina Evals

Quick Start

Automatic Evals

Why Athina Evals

Running Evals

Python: Run a single eval

Here is a sample of all the code you need to run a suite of evals.

Python: Run evals on a dataset

Running a suite of evals

Continuous Evaluation in Production

Preset Evals

RAG evals

Context Contains Enough Information

Faithfulness

Does Response Answer Query

JSON Evals

Summarization Q&A

More Evals

Other Evaluators

Function Evals

Function Based Evaluators

Grounded Evals

RAGAS

Conversation Evaluators

Conversation Evals

Groundedness

Custom Evals

API Call

LLM-as-a-Judge (Custom Prompt Eval)

Pairwise Evaluation

Custom Code Eval

Evaluation with Custom Python Code

Custom Grading Criteria

Bring your own eval

Loading data for Evals

Loading data via Llama-Index

Running Evals in CI/CD

FAQs

Frequently Asked Questions

Cookbooks

Eval Cookbooks

Create Your Own Eval

Write your own LLM Eval class

Develop Dashboard

Athina Develop: A Dashboard for your Iterations

Open Source Evaluation: Philosophy

A guide to RAG evaluation

Which evaluations to use for RAG applications?

Prompt Injection: Attacks and Defenses

Running evals as real-time guardrails

Different stages of evaluation

Improving Eval performance

How can I improve the performance / reliability of my evals?

Compare and Evaluate Multiple Models

Run an Experiment to compare and evaluate responses from different models

Comparing datasets using Athina IDE

Measuring Retrieval Accuracy in RAG Apps

Measure Retrieval Accuracy Using Athina IDE

Athina offers a powerful prompt management system that allows you to create, edit, manage, test and version prompts.

Understand how prompts are structured in Athina

Concepts

Learn how to write prompts in Athina's Playground.

Prompt Syntax

You can create prompts in Athina's Prompt Playground, or via API

Create Prompt Template

You can version your prompts in Athina's Prompt Playground, or via API.

Prompt Versioning

You can delete a prompt slug in Athina's Prompt Playground, or via API

Delete Prompt Slug

You can list all prompt slugs stored on Athina via API or Python SDK

List Prompt Slugs

You can duplicate a prompt slug in Athina's Prompt Playground, or via API

Duplicate Prompt Slug

You can run prompts in Athina's Prompt Playground, or via API

Run Prompt

You can run multiple prompts in Athina's Prompt Playground, or via API

Multi-Prompt Playground

You can evaluate prompt responses in Athina's Playground.

Run Evaluations on a Prompt Response

You can organize your prompts in Athina's Prompt Management System

Organize Prompts

Advanced Monitoring & Analytics. Your production environment will thank you.

Athina Monitoring

Inference Trace

Analytics

Analytics and Insights

Topic Classification

Query Topic Classification

Export Data from Athina

How can I export my logged inferences?

Athina can run continuous evaluations on your logs to monitor model performance in production

Continuous Evaluation

Model Performance Metrics

You can configure custom models on Athina in the Settings page.

Custom Models

You can set up sampling rules for continuous evals in the Settings.

Sampling Evals

You can currently create datasets in Athina in the following ways:

Create a Dataset

You can import your inference logs to create a dataset in Athina.

Create a Dataset from Logs

You can upload a JSONL file to create a dataset in Athina.

Create a Dataset from a File Upload

You can use our Python SDK to create a dataset in Athina.

Create a Dataset via Python SDK

You can use simple POST API requests to add rows to a dataset.

Create a Dataset via API request

You can generate synthetic datasets in Athina.

Generate a Synthetic Dataset

You can delete a dataset in Athina by following these steps.

Edit a Dataset

Delete a Dataset

Dynamic columns let you run prompts, code execution, retrievals, and more on your datasets

⚡️ Dynamic Columns

Send a prompt to a language model to generate an AI response

Classify text into pre-defined labels using an LLM

Classification

Extract entities from a previous column using an LLM

Extract Entities

Code Execution

Run evaluations on Athina IDE in a few clicks

Run evaluations

You can view and compare metrics across datasets in Athina.

Metrics

Re-generate a dataset with a new prompt or a new model and compare the results side-by-side

Experiment with different prompts and models

You can compare multiple datasets side-by-side in Athina.

Compare datasets

Import HuggingFace Dataset to Athina

You can prototype and evaluate a prompt chain in Athina IDE

Prototype and Evaluate a Prompt Chain

How to Measure Retrieval Accuracy in RAG Applications Using Athina IDE

Run Prompts and Evaluate

You can list all datasets using the Python SDK.

List All Datasets

You can create a dataset programmatically using the Python SDK.

You can add rows to an existing dataset using the Python SDK.

Add Rows to Dataset

You can get a specific dataset by ID or name.

Get Dataset

You can delete a dataset by ID via the Python SDK.

Delete Dataset

Athina is natively integrated with the following projects/packages:

Integrations

Getting Started

Sample GraphQL Queries

We have provided examples of how to query the Athina AI GraphQL API using cURL and Python. You can use these examples to fetch data from the API and integrate it into your applications.

Evals

Concepts

​Example

​How does it work?

Example

How does it work?