- The metrics from these evals are used to calculate the model performance metrics visible on your dashboard.
- The evaluation results for each individual inference are visible in the Evals tab on the Inference Trace page (/observe/inference/:inference_id).
FAQs
Below are some common questions about continuous evaluation on Athina.

Will LLM-graded evals use my API key?
Yes, evals will use your API key to access the logs. You can configure your LLM API key in the Athina Settings.

How can I manage the cost of continuous evals?
LLM-graded evaluation can get expensive if you're running it on all logs. To solve this, we introduce a few controls:

1. Inference filters
When you configure an eval, you can choose which logs it should run on. Currently, you can apply filters on prompt_slug, environment, customer_id, and user_query.
2. Max Evals Per Month
In your Athina Settings, you can configure a setting called Max Evals Per Month. Athina will dynamically sample logs for evaluation to ensure this value is respected. For example, if Max Evals Per Month is set to 30,000, then Athina will run ~1,000 evaluations per day.

3. Sampling Rate
You can also configure a sampling rate for evals: the percentage of logs that will be evaluated. For example, if you set the sampling rate to 10%, then only 10% of logs will be evaluated. Note that the Max Evals Per Month setting will still be respected, so the actual number of evals run will be the minimum of the two.
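The way these controls combine can be sketched as follows. This is an illustrative model only, not Athina's internal implementation; the function names, log fields, and numbers here are assumptions chosen to mirror the examples above.

```python
def matches_filters(log, filters):
    """Control 1 (sketch): an eval only considers logs matching its
    configured filters (e.g. prompt_slug, environment, customer_id)."""
    return all(log.get(key) == value for key, value in filters.items())


def evals_to_run(num_filtered_logs, sampling_rate, max_evals_per_month):
    """Controls 2 and 3 (sketch): sample a percentage of the filtered logs,
    then apply the monthly cap -- the actual count is the minimum of the two."""
    sampled = int(num_filtered_logs * sampling_rate)
    return min(sampled, max_evals_per_month)


# A 30,000/month cap implies a daily budget Athina can sample toward:
daily_budget = 30_000 // 30  # ~1,000 evals per day

# Example: 500,000 logs at a 10% sampling rate would mean 50,000 evals,
# but a 30,000/month cap limits the actual number to 30,000.
print(evals_to_run(500_000, 0.10, 30_000))  # 30000
```

With a lower log volume the sampling rate becomes the binding control instead: 100,000 logs at 10% yields 10,000 evals, well under the cap.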