Continuous Evaluation
Athina can run continuous evaluations on your logs to monitor model performance in production.
If you are logging inferences to Athina, you can configure evals to run automatically against logged inferences.
Eval results will show up on your dashboard shortly after the evals start running (they typically take a few minutes to complete).
- The metrics from these evals will be used to calculate the model performance metrics visible on your dashboard.
- The evaluation results for each individual inference are visible in the Evals tab on the Inference Trace page (/observe/inference/:inference_id).
FAQs
Below are some common questions about continuous evaluation on Athina.
Will LLM-graded evals use my API key?
Yes, LLM-graded evals will run using your LLM API key. You can configure your LLM API key in the Athina Settings.
How can I manage the cost of continuous evals?
LLM-graded evaluation can get expensive if you run it on every log.
To help manage this cost, Athina provides a few controls:
1. Inference filters
When you configure an eval, you can choose which logs it should run on.
Currently, you can apply filters on prompt_slug, environment, customer_id, and user_query.
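To make the filter behavior concrete, here is a minimal sketch of how an inference filter could match against a logged inference. The payload shape and the matching logic are illustrative assumptions, not Athina's internal code; only the field names (prompt_slug, environment, customer_id, user_query) come from the filters listed above.

```python
# Hypothetical shape of a logged inference. Only the filterable field
# names are taken from the docs; the rest is illustrative.
inference = {
    "prompt_slug": "customer_support_v2",
    "environment": "production",
    "customer_id": "acme-corp",
    "user_query": "How do I reset my password?",
}

def matches_filters(inference: dict, filters: dict) -> bool:
    """Return True only if the inference matches every configured filter."""
    return all(inference.get(key) == value for key, value in filters.items())

# An eval configured to run only on production logs for this prompt:
print(matches_filters(inference, {"prompt_slug": "customer_support_v2",
                                  "environment": "production"}))  # True
print(matches_filters(inference, {"environment": "staging"}))     # False
```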
2. Max Evals per month
In your Athina Settings, you can configure a setting called Max Evals Per Month.
Athina will dynamically sample logs for evaluation to ensure this value is respected.
For example, if Max Evals Per Month is set to 30,000, then Athina will run ~1000 evaluations per day.
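The arithmetic behind that example is just the monthly cap spread across the days of the month. The snippet below is back-of-envelope math, not Athina's actual sampling implementation:

```python
# Illustrative arithmetic only: a 30,000/month cap works out to
# roughly 1,000 evals per day over a 30-day month.
MAX_EVALS_PER_MONTH = 30_000
DAYS_PER_MONTH = 30  # approximation

daily_budget = MAX_EVALS_PER_MONTH // DAYS_PER_MONTH
print(daily_budget)  # 1000
```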
3. Sampling Rate
You can also configure a sampling rate for evals. This is a percentage of logs that will be evaluated.
For example, if you set the sampling rate to 10%, then only 10% of logs will be evaluated.
Note that the Max Evals Per Month setting will still be respected, so the actual number of evals run will be the minimum of the two limits.
Will evals run on previous logs as well?
Once you save an eval, it automatically runs on all logs from the last 2 days, and then continues to run on new logs going forward.