Running Evals in UI
There are a number of ways to run evals using Athina:
- Run evals using Python SDK
- Run evals on a dataset using Athina Platform
- Compare 2 datasets side by side with evaluation metrics
- Run evals as real-time guardrails using
athina.guard()
- Configure evals to run continuously on Production Traces
Running evals in Athina IDE
For a more comprehensive video guide on running evals in Athina IDE, see this guide.
Configure evals to run continuously on Production Traces
If you configure evaluations in the dashboard at https://app.athina.ai/evals/config , they will run automatically against all logged inferences that meet your filters.
Note: Logs may be sampled to ensure that evaluations run within your configured limits. You can adjust these limits in the Settings page.
Note: Continuous evaluation is only available for paid plans. Contact hello@athina.ai to upgrade your plan.
Running evals as guardrails around inference using athina.guard()
This is useful if you want to run evaluations at inference time to prevent bad user queries or bad responses.
Keep in mind there may be latency impacts here. We recommend running only low-latency evaluations if you’re using athina.guard()
.
Follow this example notebook
Run a single eval manually from the Inference (Trace) page.
- Open the inference you want to evaluate, and click the “Run Eval” button (located towards the top-right).
- Choose the evaluation you want to run (Note: function evals cannot be run from the inference page).
- Choose the LLM engine for your evaluation.
Eval results will appear shortly in the Evals tab on the right.