The easiest way to get started is to follow this notebook.

1. Configure API Keys

Evals use OpenAI, so you need to configure your OpenAI API key.

To view the results on Athina Develop and keep a historical record of the prompts and experiments you run during development, you also need an Athina API key.

import os

from athina.keys import AthinaApiKey, OpenAiApiKey
 
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY')) # optional
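
Note that os.getenv returns None when a variable is unset, so a missing key may go unnoticed until an eval fails. A minimal sketch of a guard that fails early instead is shown below; require_env is a hypothetical helper written for this example and is not part of the athina package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable (None) and an empty string.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage with the key classes shown above:
# OpenAiApiKey.set_key(require_env("OPENAI_API_KEY"))
```

Since the Athina key is optional, a guard like this would only make sense for the OpenAI key.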

2. Load your dataset

Loading a dataset is straightforward: we support JSON and CSV formats.

from athina.loaders import Loader
 
# Load the data from CSV, JSON, Athina or Dictionary
# json_file is the path to your JSON dataset
dataset = Loader().load_json(json_file)
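
As a rough illustration of what such a file might contain, the sketch below writes a one-row JSON dataset with the three fields the RAG evals use (query, context, response) and reads it back with the standard library. The file path, the sample values, and the choice to store context as a single string are assumptions made for this example; check the loader documentation for the exact schema it accepts.

```python
import json
import os
import tempfile

# Hypothetical sample row; the field names follow the RAG evals described in this guide.
rows = [
    {
        "query": "What is the capital of France?",
        "context": "France is a country in Western Europe. Its capital is Paris.",
        "response": "The capital of France is Paris.",
    },
]

# Write the rows to a JSON file (the path is an assumption for this sketch).
json_file = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(json_file, "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2)

# Read the file back to confirm it is valid JSON with the expected fields.
with open(json_file, encoding="utf-8") as f:
    loaded = json.load(f)
```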

3. Run an eval on a dataset

Running evals on a batch of datapoints is the most effective way to rapidly iterate as you’re developing your model.

from athina.evals import ContextContainsEnoughInformation
 
# Run the ContextContainsEnoughInformation evaluator on the dataset
ContextContainsEnoughInformation(
    model="gpt-4-1106-preview",
    max_parallel_evals=5, # optional, speeds up evals
).run_batch(dataset).to_df()

In a notebook, your results will be displayed as a dataframe.


How do I know which fields I need in my dataset?

For the RAG Evals, we need 3 fields: query, context, and response.

For these evals, you should use the RagLoader to load your data. This will ensure the data is in the right format for evals.

Every evaluator has a REQUIRED_ARGS property that defines the parameters it expects.

If you pass the wrong parameters, the evaluator will raise a ValueError telling you which params are missing.

For example, the Faithfulness evaluator expects response and context fields.
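
If you want to catch missing fields before any API calls are made, a pre-flight check along these lines may help. This is a minimal sketch, not Athina's implementation: the REQUIRED_ARGS list below is a hypothetical stand-in for the property on the evaluator class, and find_missing_args and validate_dataset are helper names invented for this example.

```python
from typing import Dict, List

# Hypothetical stand-in for an evaluator's REQUIRED_ARGS property.
REQUIRED_ARGS: List[str] = ["response", "context"]


def find_missing_args(datapoint: Dict[str, object], required: List[str]) -> List[str]:
    """Return the required field names that are absent or None in a datapoint."""
    return [name for name in required if datapoint.get(name) is None]


def validate_dataset(dataset: List[Dict[str, object]], required: List[str]) -> None:
    """Raise ValueError naming the first datapoint that lacks a required field."""
    for index, datapoint in enumerate(dataset):
        missing = find_missing_args(datapoint, required)
        if missing:
            raise ValueError(f"Datapoint {index} is missing: {', '.join(missing)}")
```

Calling validate_dataset(dataset, REQUIRED_ARGS) on a list of dictionaries returns silently when every row has the fields and raises on the first row that does not.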


Run an eval on a single datapoint

Running an eval on a single datapoint is very simple.

This might be useful if you are trying to run the eval immediately after inference.

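from athina.evals import DoesResponseAnswerQuery

# Run the answer relevance evaluator
# Checks if the LLM response answers the user query sufficiently
# query and response hold the user query and the LLM response
DoesResponseAnswerQuery().run(query=query, response=response)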

Here’s a notebook you can use to get started.