- They require a reference to compare against: while you may have such ground-truth data in your development dataset, you will never have it in production.
- They offer no reasoning capabilities: most developers now use LLMs for far more complex use cases than traditional methods can evaluate.
- Can perform complex, nuanced tasks that require reasoning
- Come much closer to the accuracy of human-in-the-loop evaluation
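
To make the reference requirement concrete, here is a minimal sketch of a traditional reference-based metric: a unigram-overlap F1 score. The function name `token_f1` and the example sentences are illustrative assumptions, not taken from any particular library. The sketch shows two things: the metric cannot be computed at all without a `reference` string, and it scores a correct paraphrase as a complete miss because it only counts shared tokens.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate answer and a reference answer.

    Hypothetical helper for illustration; real metrics such as BLEU or ROUGE
    are more elaborate but share the same dependence on a reference.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Multiset intersection: each shared token counts at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up example data: the reference is the "ground truth" you would
# only have in a development dataset.
reference = "the cat sat on the mat"
verbatim = "the cat sat on the mat"
paraphrase = "a feline rested upon a rug"

print(token_f1(verbatim, reference))    # identical wording
print(token_f1(paraphrase, reference))  # same meaning, no shared tokens
```

The verbatim answer gets a perfect score while the paraphrase gets zero, even though a human reader would accept both. Judging that the two sentences mean the same thing takes some form of reasoning, which token counting cannot provide.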