Eval Cookbooks
Here are some cookbooks we’ve prepared to make it easy to set up and run evals using Athina.
- Run a preset eval: This cookbook shows you how to run a single eval on your dataset.
- Run an eval suite: This cookbook shows you how to run a suite of evals.
- Run an experiment: This cookbook shows how to run an eval using Athina and also log the experiment configuration.
This is very similar to the first cookbook (running a preset eval), but you also define an AthinaExperiment object, so the experiment is logged to your develop dashboard along with its metadata and experiment parameters (such as the prompt).
Custom grading criteria are the easiest way to create your own eval.
These evals take the format: “If X, then fail. Otherwise, pass.”
The criteria are wrapped inside our chain-of-thought (CoT) prompt, which enforces a JSON output of pass or fail along with a reason.
This is best used for very simple conditional evals (like the one below).
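To make the mechanics concrete, here is a minimal, library-independent sketch of how an "If X, then fail. Otherwise, pass" criterion might be wrapped in a prompt and how the resulting JSON verdict might be validated. It does not use Athina's API and is not Athina's actual prompt: the template text, the `result` and `reason` JSON keys, and the names `build_prompt`, `parse_verdict`, and `Verdict` are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical wrapper prompt (an assumption, not Athina's real CoT prompt).
PROMPT_TEMPLATE = """You are grading a response against a single criterion.

Grading criteria: {criteria}

Response to grade:
{response}

Think step by step, then answer with JSON only, in the form:
{{"result": "pass" or "fail", "reason": "<short explanation>"}}
"""


@dataclass
class Verdict:
    passed: bool
    reason: str


def build_prompt(criteria: str, response: str) -> str:
    """Insert the grading criteria and the response into the wrapper prompt."""
    return PROMPT_TEMPLATE.format(criteria=criteria, response=response)


def parse_verdict(raw: str) -> Verdict:
    """Parse the grader's JSON reply and enforce a pass/fail result."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result = str(data.get("result", "")).strip().lower()
    if result not in {"pass", "fail"}:
        raise ValueError(f"result must be 'pass' or 'fail', got {result!r}")
    return Verdict(passed=(result == "pass"), reason=str(data.get("reason", "")))


# Example usage with a hand-written grader reply (no model call is made here).
criteria = "If the response mentions a competitor, then fail. Otherwise, pass."
prompt = build_prompt(criteria, "You could also try another vendor's product.")
verdict = parse_verdict('{"result": "fail", "reason": "Mentions a competitor."}')
```

In this sketch the prompt would be sent to whichever model acts as the grader, and rejecting any reply that is not strictly pass or fail is one simple way to enforce the output format.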