Run an Experiment to compare and evaluate responses from different models
Another way to do this is to create multiple dynamic columns, one for each model that you want to compare. Learn more.
You can run an experiment on Athina to compare multiple models in a few clicks.
Create or Import a Dataset
Start with a dataset. Here are some instructions on how you can create a dataset on Athina.
Configure Experiment
- Click the Experiment button on the top-right, and select the models in the dropdown.
- Enter the Prompt Template. Note that you can reference the dataset columns as variables in the prompt template using `{{`.
- When you run the experiment, Athina will run the prompt template and generate a new dataset with the results for each model. Athina will create or overwrite a column called `response` in each dataset.
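To make the flow above concrete, here is a minimal local sketch of what an experiment does conceptually. It is not Athina's implementation or SDK. It assumes placeholders take the form `{{column_name}}` (the syntax is only partially shown above), and the names `render_prompt`, `run_experiment`, and the stand-in model callables are hypothetical.

```python
import re

# Assumed placeholder syntax: {{column_name}}, optionally with spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template, row):
    """Fill each {{column}} placeholder with the matching value from a dataset row."""
    def substitute(match):
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Template references unknown column: {column}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


def run_experiment(template, rows, models):
    """Build one result dataset per model.

    `models` maps a model name to a callable that takes a prompt string and
    returns a response string (a stand-in for a real model call).
    Each output row gets a 'response' column, overwriting any existing one.
    """
    results = {}
    for model_name, generate in models.items():
        results[model_name] = [
            {**row, "response": generate(render_prompt(template, row))}
            for row in rows
        ]
    return results


# Usage with two fake "models" so the sketch runs without any API access.
rows = [{"query": "What is 2 + 2?", "context": "Basic arithmetic."}]
models = {
    "model-a": lambda prompt: f"[model-a] {prompt}",
    "model-b": lambda prompt: prompt.upper(),
}
results = run_experiment("Context: {{context}}\nQuestion: {{query}}", rows, models)
```

The result is one dataset per model, each with the original columns plus `response`, which mirrors the per-model datasets described above.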
Compare Responses
Athina will also open all these datasets in the “Datasets” tab, so you can compare them side-by-side.
You can turn the diff view on or off by clicking the “View Options” button on the top-right.
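The diff view is a feature of the Athina UI. As a rough offline analogue, assuming you have exported two responses as plain strings, a word-level diff can be sketched with Python's standard `difflib` module. The helper name `diff_responses` is hypothetical.

```python
import difflib


def diff_responses(response_a, response_b):
    """Return a word-level diff of two responses.

    Entries starting with '- ' appear only in response_a, entries starting
    with '+ ' appear only in response_b, and entries starting with two
    spaces are common to both. The '?' hint lines from ndiff are dropped.
    """
    tokens = difflib.ndiff(response_a.split(), response_b.split())
    return [token for token in tokens if not token.startswith("?")]


for entry in diff_responses("The capital is Paris", "The capital is Lyon"):
    print(entry)
```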
Run Evaluations
If you now run evaluations, they will run on every row of every dataset at once.
Once the evaluations are complete, you can scroll to the right, and you will see the evaluation metrics for each eval and each dataset.
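As an illustration of the per-eval, per-dataset metrics described above, the sketch below averages row-level scores for each eval within each dataset. The input shape, the function name `summarize_metrics`, and the eval name `faithfulness` are assumptions made for this example, not Athina's actual data format.

```python
from collections import defaultdict
from statistics import mean


def summarize_metrics(eval_results):
    """Average each eval's scores per dataset.

    `eval_results` maps a dataset name to a list of per-row dicts of the
    form {eval_name: numeric_score}.
    """
    summary = {}
    for dataset_name, rows in eval_results.items():
        scores = defaultdict(list)
        for row in rows:
            for eval_name, score in row.items():
                scores[eval_name].append(score)
        summary[dataset_name] = {
            eval_name: mean(values) for eval_name, values in scores.items()
        }
    return summary


# Hypothetical scores for two model datasets.
summary = summarize_metrics({
    "model-a": [{"faithfulness": 1.0}, {"faithfulness": 0.0}],
    "model-b": [{"faithfulness": 1.0}, {"faithfulness": 1.0}],
})
```

Comparing the summaries across datasets gives the same kind of model-by-model view as the metric columns in the table.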
View the Evaluation Metrics
Click on the evaluation metric to see the detailed results.
Learn more here.