Does the provided context {context} contain the answer to this question: {query}
Since the eval is performing a much simpler task, it can be expected to work consistently most of the time. We can also run the same grading prompt multiple times to detect flakiness and discard flaky results.
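As a rough illustration of the repeat-and-discard idea, here is a minimal sketch. It assumes a hypothetical `grade_once` callable that sends a prompt to whatever model you use and returns its raw text verdict; the function name, the `runs` parameter, and the "all runs must agree" rule are illustrative assumptions, not part of any particular library.

```python
from typing import Callable, Optional

# The grading prompt from above, used as a format template.
GRADING_PROMPT = (
    "Does the provided context {context} contain the answer "
    "to this question: {query}"
)


def grade_with_repeats(
    grade_once: Callable[[str], str],  # hypothetical: prompt in, verdict text out
    context: str,
    query: str,
    runs: int = 3,
) -> Optional[str]:
    """Run the same grading prompt `runs` times.

    Returns the normalized verdict if every run agrees, or None if the
    runs disagree, so the caller can discard the result as flaky.
    """
    prompt = GRADING_PROMPT.format(context=context, query=query)
    # Normalize whitespace and case so "Yes" and "yes " count as the same verdict.
    verdicts = [grade_once(prompt).strip().lower() for _ in range(runs)]
    if len(set(verdicts)) == 1:
        return verdicts[0]
    return None  # flaky: the grader did not answer consistently
```

A caller would pass in its own model wrapper as `grade_once` and drop any example for which the function returns `None`. Requiring unanimous agreement is the strictest choice; a majority vote would keep more results at the cost of tolerating some inconsistency.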