Can I choose which model to use for running evaluations? - Athina

Yes, you can specify your own model for running evals. However, keep the following in mind.

If your evaluation task is complex, use a powerful model like gpt-4o or claude-3-5-sonnet.
If your evaluation task is simple, use a smaller model like gpt-3.5-turbo or llama-3-8b.
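The guidance above can be captured in a small lookup so that the model choice lives in one place. This is only a minimal sketch: the `pick_eval_model` helper, the `EVAL_MODEL_BY_COMPLEXITY` table, and the `"complex"` / `"simple"` labels are hypothetical names introduced for illustration and are not part of the Athina SDK. The model names are the ones suggested above.

```python
# Hypothetical helper (not part of the Athina SDK): maps a rough
# task-complexity label to one of the model names suggested above.
EVAL_MODEL_BY_COMPLEXITY = {
    "complex": "gpt-4o",        # powerful model for complex evaluation tasks
    "simple": "gpt-3.5-turbo",  # smaller model for simple evaluation tasks
}


def pick_eval_model(complexity: str) -> str:
    """Return a model name for the given complexity label.

    Raises ValueError for labels that are not in the table, so a typo
    does not silently fall back to an unintended model.
    """
    try:
        return EVAL_MODEL_BY_COMPLEXITY[complexity]
    except KeyError:
        valid = ", ".join(sorted(EVAL_MODEL_BY_COMPLEXITY))
        raise ValueError(
            f"Unknown complexity {complexity!r}; expected one of: {valid}"
        ) from None


if __name__ == "__main__":
    print(pick_eval_model("complex"))
    print(pick_eval_model("simple"))
```

The returned string is intended to be passed wherever the evaluator expects a model name, for example as the `model` argument of `LlmEvaluator`.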

Currently, we support all the major public models, as well as custom models. To choose one, pass its name as the `model` argument when creating the evaluator:

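from athina.evals import LlmEvaluator

# `grading_criteria` and `response` are assumed to be defined earlier in your code.
# The `model` argument selects which LLM runs the evaluation.
LlmEvaluator(model="gpt-4", grading_criteria=grading_criteria).run(response)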
