> ## Documentation Index
> Fetch the complete documentation index at: https://docs.athina.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Why Not Use Traditional Evaluation Metrics?

Traditional evaluation metrics like BLEU and ROUGE have some value, but they also have major limitations:

* **They require a reference to compare against**: While you may have such ground truth data in your development dataset, you will never have this in production.
* **Traditional metrics will not offer any reasoning capabilities** Most developers are now using LLMs for much more complex use cases than can be evaluated by traditional methods.

In contrast, LLM evaluators:

* **Can perform complex and nuanced tasks that include reasoning capabilities**
* **Come a lot closer to human-in-the-loop level of accuracy**

Intuitively, this makes sense. The best tool we have for handling reasoning tasks on large pieces of text are LLMs. So why would you use anything else for evals?
