Import HuggingFace Dataset to Athina
This guide shows how to import a dataset from Hugging Face into Athina using Python. We’ll walk through the process step by step and explain what each part of the code does.
Prerequisites
Before you begin, make sure you have:
- An Athina account and API key (you can sign up for free at https://app.athina.ai)
- Python installed on your system
- The required Python libraries: `datasets` and `athina-client`
Step-by-Step Guide
0. Get your Athina API Key
You can get an Athina API key by signing up at https://app.athina.ai.
1. Install Required Libraries
Install the required libraries, then import them.

```bash
pip install datasets athina-client
```

```python
import os

from athina_client import AthinaApiKey
from athina_client.datasets import Dataset
from datasets import load_dataset
```

Then set your Athina API key. This example reads it from the `ATHINA_API_KEY` environment variable:

```python
AthinaApiKey.set_key(os.getenv("ATHINA_API_KEY"))
```
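`os.getenv` returns `None` when the variable is not set, so a missing key would only show up later as a failed request. A minimal sketch of a fail-fast check, using a hypothetical helper named `require_env` (it is not part of `athina_client`):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage: AthinaApiKey.set_key(require_env("ATHINA_API_KEY"))
```

Raising early keeps the error message next to its cause instead of surfacing as an authentication failure when the dataset is created.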
2. Load the Dataset from HuggingFace
```python
HF_DATASET_ID = "openai/gsm8k"
SUBSET = "main"
SPLIT = "train"
LIMIT = 1000  # Number of rows to add (max. 1000)

# Load the chosen subset (config) and split from Hugging Face
hf_dataset = load_dataset(HF_DATASET_ID, SUBSET, split=SPLIT)

# Keep at most LIMIT rows and convert them to a list of dicts (one dict per row)
rows = hf_dataset.select(range(min(LIMIT, len(hf_dataset)))).to_list()
```
Currently, you can add a maximum of 1000 rows to a dataset in Athina.
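The cap is easy to exceed if you raise `LIMIT` or reuse this code with your own list of records. A minimal sketch that clamps the row count, using the hypothetical names `MAX_ROWS` and `take_rows`; the 1000 value comes from the note above and may change:

```python
from typing import Any, Dict, List

MAX_ROWS = 1000  # Athina's per-dataset row cap, as stated in this guide


def take_rows(records: List[Dict[str, Any]], limit: int, max_rows: int = MAX_ROWS) -> List[Dict[str, Any]]:
    """Return the first `limit` records, but never more than `max_rows`."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return records[: min(limit, max_rows)]
```

For example, `take_rows(records, 5000)` returns only the first 1000 records.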
3. Log the Dataset to Athina
We’ll use the `athina_client` library to log the dataset to Athina.
```python
# Create a dataset on Athina, named after the Hugging Face dataset, subset, and split
athina_dataset = Dataset.create(name=f"{HF_DATASET_ID}-{SUBSET}-{SPLIT}", rows=rows)

# Print a link to the new dataset
print(f"View dataset on Athina: https://app.athina.ai/develop/{athina_dataset.id}")
```
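The gsm8k columns are plain strings, but other Hugging Face datasets can contain values such as raw bytes (for example, in image or audio columns). Assuming the client sends rows as JSON (an assumption; check the athina-client documentation), a quick pre-upload check can flag columns that would not serialize. `find_non_json_columns` is a hypothetical helper:

```python
import json
from typing import Any, Dict, List


def find_non_json_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted names of columns with at least one value json.dumps cannot encode."""
    bad = set()
    for row in rows:
        for column, value in row.items():
            if column in bad:
                continue  # already flagged; skip re-checking
            try:
                json.dumps(value)
            except (TypeError, ValueError):
                bad.add(column)
    return sorted(bad)
```

If it returns any column names, drop or convert those columns before calling `Dataset.create`.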
Athina is a collaborative IDE that lets teams experiment with, evaluate, and monitor AI applications in a spreadsheet-like UI.
What Can You Do After Creating a Dataset?
- Run dynamic prompts on every row, using other columns as variables.
- Transform the dataset by executing custom code.
- Create custom evaluations or run 50+ preset evals and view metrics in a powerful dashboard.
- Use dynamic columns to classify text, retrieve data, extract entities, transform data, fetch from external APIs, and more.
- Experiment with multiple combinations of prompts and models simultaneously.
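Conceptually, a dynamic prompt is a template filled in once per row, with column values substituted for named variables. The sketch below illustrates that idea with Python's `str.format`; it is not Athina's implementation, Athina's own template syntax may differ, and `render_prompts` is a hypothetical name. The rows are made-up examples shaped like gsm8k's `question` and `answer` columns:

```python
from typing import Dict, List


def render_prompts(template: str, rows: List[Dict[str, str]]) -> List[str]:
    """Fill `template` once per row, using the row's columns as named variables."""
    return [template.format(**row) for row in rows]


rows = [
    {"question": "What is 2 + 3?", "answer": "5"},
    {"question": "What is 10 - 4?", "answer": "6"},
]
prompts = render_prompts("Solve step by step: {question}", rows)
# prompts[0] == "Solve step by step: What is 2 + 3?"
```

Columns that the template does not reference (here, `answer`) are simply ignored, so the same rows can feed several different prompts.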