Yes, you can specify which model is used to run evals. Keep the following in mind when choosing one.

  • If your evaluation task is complex, use a powerful model like gpt-4o or claude-3-5-sonnet.

  • If your evaluation task is simple, use a smaller model like gpt-3.5-turbo or llama-3-8b.
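
The guidance above can be sketched as a small lookup that maps task complexity to a model name. This is only an illustration: `pick_eval_model` and `MODEL_BY_COMPLEXITY` are hypothetical names, not part of the athina library, and the model names are simply the examples listed above.

```python
# Hypothetical helper, not part of the athina library.
# The model names are the examples from the guidance above; swap in
# whichever models you actually have access to.
MODEL_BY_COMPLEXITY = {
    "complex": "gpt-4o",        # harder grading tasks: larger model
    "simple": "gpt-3.5-turbo",  # straightforward checks: smaller model
}


def pick_eval_model(task_complexity: str) -> str:
    """Return a model name for a task complexity of "simple" or "complex"."""
    try:
        return MODEL_BY_COMPLEXITY[task_complexity]
    except KeyError:
        raise ValueError(f"unknown task complexity: {task_complexity!r}") from None


if __name__ == "__main__":
    print(pick_eval_model("complex"))
    print(pick_eval_model("simple"))
```

The returned string is the kind of value you would pass as the model name when constructing an evaluator.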

Currently, we support all the major public models, as well as custom models.

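from athina.evals import LlmEvaluator

# grading_criteria and response are assumed to be defined earlier in your code.
# The model argument selects which model runs the evaluation.
LlmEvaluator(model="gpt-4", grading_criteria=grading_criteria).run(response)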