For this example, we will consider the Loader class. Other loaders also work similarly.

You can load data for evals from a JSON, Python Dictionary, or directly from your logged inferences on Athina.

Loading from a JSON file

from athina.loaders import Loader

dataset = Loader().load_json(json_filename)

That’s all you need to do to load your data!

To view the imported dataset as a pandas DataFrame:

pd.DataFrame(dataset)

Loading from a Python Dictionary

from athina.loaders import Loader

# Create batch dataset from list of dict objects
raw_data = [
    {
        "query": "What is the capital of Greece?",
        "context": ["Greece is often called the cradle of Western civilization."],
        "response": "Athens",
    },
    {
        "query": "What is the price of a Tesla Model 3?",
        "context": ["Tesla Model 3 is a fully electric car."],
        "response": "I cannot answer this question as prices vary from country to country.",
    },
    {
        "query": "What is a shooting star?",
        "context": ["Black holes are stars that have collapsed under their own gravity. They are so dense that nothing can escape their gravitational pull, not even light."],
        "response": "A shooting star is a meteor that burns up in the atmosphere.",
    }
]

dataset = Loader().load_dict(raw_data)

Loading logged inferences from Athina

Instead of generating inferences, you can just load inferences that you have already logged to Athina.

# Load last 50 logged inferences
dataset = Loader().load_athina_inferences()

You can optionally apply filters to load a different subset of inferences from Athina

from athina.loaders import Loader
from athina.interfaces.athina import AthinaFilters

# Load 100 logged inferences matching these filters
filters = AthinaFilters(
    environment="production",
    prompt_slug="refund_prompt_5.4",
    language_model_id="gpt-4",
    customer_id="nike-usa",
    topic="refunds",
)
dataset = Loader().load_athina_inferences(filters=filters, limit=100)

Output Format

The output format will be different for different Loaders.

The Loader will return a List[DataPoint] type after you call the load function of choice.

class DataPoint(TypedDict):
    query: str
    context: List[str]
    response: str
    expected_response: str