Athina can run continuous evaluations on your logs to monitor model performance in production.
If you are logging inferences to Athina, you can configure evals to run automatically against logged inferences.
Eval results show up on your dashboard once the evals have run, which typically takes a few minutes. Results for an individual inference appear on its inference page (/observe/inference/:inference_id).
Below are answers to some common questions about continuous evaluation on Athina.
Evals use your own LLM API key when they run against your logs. You can configure this key in the Athina Settings.
LLM-graded evaluation can get expensive if you run it on every log. To keep costs in check, Athina provides a few controls:
When you configure an eval, you can choose which logs it should run on. Currently, you can filter on prompt_slug, environment, customer_id, and user_query.
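To make the filtering idea concrete, here is a minimal sketch of how a per-eval filter might select logs. The LoggedInference and EvalFilter classes are hypothetical and are not part of Athina's SDK; only the four field names come from the list above. Treating an unset field as "match anything" and matching user_query by substring are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoggedInference:
    # Hypothetical shape of a logged inference; only the filterable fields are modeled.
    prompt_slug: str
    environment: str
    customer_id: Optional[str]
    user_query: str


@dataclass
class EvalFilter:
    # Hypothetical filter config: None means "do not filter on this field".
    prompt_slug: Optional[str] = None
    environment: Optional[str] = None
    customer_id: Optional[str] = None
    user_query_contains: Optional[str] = None  # assumption: substring match on user_query

    def matches(self, log: LoggedInference) -> bool:
        """Return True only if the log satisfies every filter that is set."""
        if self.prompt_slug is not None and log.prompt_slug != self.prompt_slug:
            return False
        if self.environment is not None and log.environment != self.environment:
            return False
        if self.customer_id is not None and log.customer_id != self.customer_id:
            return False
        if self.user_query_contains is not None and self.user_query_contains not in log.user_query:
            return False
        return True


logs = [
    LoggedInference("support-bot", "production", "c1", "how do I reset my password"),
    LoggedInference("support-bot", "staging", "c2", "hi"),
    LoggedInference("sales-bot", "production", None, "pricing?"),
]
production_support = EvalFilter(prompt_slug="support-bot", environment="production")
print([log.customer_id for log in logs if production_support.matches(log)])  # ['c1']
```

Only logs that pass the filter would count toward the eval's cost, which is why narrowing by prompt_slug or environment is the first lever for controlling spend.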
In your Athina Settings, you can configure a setting called Max Evals Per Month. Athina dynamically samples logs for evaluation so that this limit is respected. For example, if Max Evals Per Month is set to 30,000, Athina will run roughly 1,000 evaluations per day.
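The ~1,000-per-day figure follows from spreading the monthly cap evenly over a 30-day month. The sketch below shows that arithmetic; the function name and the even 30-day spread are assumptions, since Athina's actual sampler adjusts dynamically.

```python
def daily_eval_budget(max_evals_per_month: int, days_per_month: int = 30) -> int:
    """Approximate evals per day if the monthly cap is spread evenly (assumption)."""
    if days_per_month <= 0:
        raise ValueError("days_per_month must be positive")
    return max_evals_per_month // days_per_month


print(daily_eval_budget(30_000))  # 1000
```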
You can also configure a sampling rate for evals: the percentage of logs that will be evaluated. For example, with a sampling rate of 10%, only 10% of logs are evaluated.
Note that the Max Evals Per Month limit still applies, so the number of evals actually run is the smaller of the two limits.
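The interaction between the sampling rate and Max Evals Per Month can be modeled as "sample first, then cap". This is a simplified sketch, not Athina's implementation: expected_evals and sample_logs are hypothetical names, the rate is expressed as an integer percentage, and each log is sampled independently.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def expected_evals(num_logs: int, sampling_percent: int, budget: int) -> int:
    """Evals that would run: the sampled share of logs, capped by the monthly budget."""
    sampled = num_logs * sampling_percent // 100  # integer math avoids float rounding
    return min(sampled, budget)


def sample_logs(logs: Sequence[T], sampling_percent: int, budget: int,
                rng: Optional[random.Random] = None) -> List[T]:
    """Keep each log with probability sampling_percent/100, stopping once the budget is used."""
    if rng is None:
        rng = random.Random()
    chosen: List[T] = []
    for log in logs:
        if len(chosen) >= budget:
            break
        if rng.random() * 100 < sampling_percent:
            chosen.append(log)
    return chosen


print(expected_evals(50_000, 10, 30_000))   # 5000: the sampling rate is the tighter limit
print(expected_evals(50_000, 100, 30_000))  # 30000: the monthly cap is the tighter limit
```

Stopping at the budget, as sample_logs does, favors logs that arrive early in the month; a dynamic sampler like the one described above would instead adjust the effective rate over time so the cap is spread across the whole month.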
Once you save an eval, it will automatically run on all logs from the last 2 days, and then it will continue to run on all logs going forward.
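The 2-day backfill can be pictured as a simple timestamp window ending when the eval is saved; logs that arrive afterwards are handled by the ongoing evaluation. The function name and the inclusive window boundaries below are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence

# The 2-day window comes from the docs; treating both ends as inclusive is an assumption.
BACKFILL_WINDOW = timedelta(days=2)


def logs_to_backfill(log_times: Sequence[datetime], saved_at: datetime) -> List[datetime]:
    """Return timestamps of logs created within the 2 days before the eval was saved."""
    window_start = saved_at - BACKFILL_WINDOW
    return [t for t in log_times if window_start <= t <= saved_at]


saved = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
logs = [saved - timedelta(hours=6), saved - timedelta(days=3), saved - timedelta(days=1)]
print(logs_to_backfill(logs, saved))  # the 6-hour-old and 1-day-old logs
```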