
# Which evaluations to use for RAG applications?

<Tip>
  See this post for a step-by-step guide and video on how to use Athina IDE to
  measure retrieval accuracy in RAG applications: [Measure Retrieval Accuracy
  Using Athina IDE](/guides/evals/measuring-retrieval-accuracy-in-rag)
</Tip>

### Common Failures in RAG-based LLM apps

RAG-based LLM apps are powerful, but they can fail at two stages: retrieval and generation.

Here are some common ones:

![](https://mintlify.s3.us-west-1.amazonaws.com/athinaai/images/measure-retrieval.png)

#### Bad retrieval

* Retrieved information is not aligned with ground truth ([Context Recall](api-reference/evals/preset-evals/rag/ragas#context-recall))
* Relevant information is retrieved but not ranked highly ([Context Precision](api-reference/evals/preset-evals/rag/ragas#context-precision))
* Retrieved information is not sufficient to answer the query ([Context Sufficiency](api-reference/evals/preset-evals/rag/context-sufficiency))
* Retrieved information is not relevant to the query ([Context Relevancy](api-reference/evals/preset-evals/rag/ragas#context-relevancy))

#### Bad outputs

* Response says something that cannot be inferred from the context ([Faithfulness](api-reference/evals/preset-evals/rag/response-faithfulness))
* Response has many sentences that are not grounded in the context ([Groundedness](api-reference/evals/preset-evals/rag/groundedness))
* Conversation / chat has messages that are not coherent given the previous messages ([Conversation Coherence](api-reference/evals/preset-evals/conversation-evals))
* Response fails some other criterion of your own ([Custom Evaluation](/evals/custom-evals))
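
As a rough aid for choosing evaluators, the sketch below maps the failure modes above to the evaluator class names used in the evaluation suite later on this page. The dictionary keys and the `evaluators_for` helper are hypothetical and not part of the `athina` package. The pairing of each failure mode with a class is inferred from the names, so confirm it against the linked reference pages.

```python
# Illustrative lookup only: the keys and this helper are hypothetical,
# not part of the athina package. The values are the evaluator class
# names that appear in the evaluation suite on this page.
FAILURE_TO_EVALUATORS = {
    "retrieval_misses_ground_truth": ["RagasContextRecall"],
    "relevant_info_ranked_low": ["RagasContextPrecision"],
    "context_insufficient": ["ContextContainsEnoughInformation"],
    "context_irrelevant": ["RagasContextRelevancy"],
    "response_not_inferable": ["Faithfulness", "RagasFaithfulness"],
    "response_not_grounded": ["Groundedness"],
}


def evaluators_for(failure_modes):
    """Return evaluator names for the given failure modes, without duplicates."""
    selected = []
    for mode in failure_modes:
        for name in FAILURE_TO_EVALUATORS[mode]:
            if name not in selected:
                selected.append(name)
    return selected


# Example: you suspect irrelevant retrievals and ungrounded responses.
print(evaluators_for(["context_irrelevant", "response_not_grounded"]))
```

The returned names can then be used to decide which entries to keep in your evaluation suite.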

## How to detect such issues

Add the evaluators you need to an evaluation suite and run it on your dataset.

```python
import os
from athina import evals
from athina.loaders import Loader
from athina.keys import OpenAiApiKey
from athina.runner.run import EvalRunner
from athina.datasets import yc_query_mini
import pandas as pd

from dotenv import load_dotenv
load_dotenv()

OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))

# Load a dataset from list of dicts
raw_data = yc_query_mini.data
dataset = Loader().load_dict(raw_data)

# View dataset in a dataframe
pd.DataFrame(dataset)

# Define evaluation suite
model = "gpt-4-turbo-preview"
eval_suite = [
    evals.RagasAnswerCorrectness(model=model),
    evals.RagasContextPrecision(model=model),
    evals.RagasContextRelevancy(model=model),
    evals.RagasContextRecall(model=model),
    evals.ContextContainsEnoughInformation(model=model),
    evals.RagasFaithfulness(model=model),
    evals.Faithfulness(model=model),
    evals.Groundedness(model=model),
    evals.DoesResponseAnswerQuery(model=model)
]

# Run the evaluation suite
batch_eval_result = EvalRunner.run_suite(
    evals=eval_suite,
    data=dataset,
    max_parallel_evals=8
)
batch_eval_result
```

You can run these evaluations in a Python notebook and view the results in a dataframe, as in this [Example Notebook on GitHub](https://github.com/athina-ai/athina-evals/blob/main/examples/run_eval_suite.ipynb).
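
If you load your own list of dicts instead of the bundled `yc_query_mini` data, it can help to check each row before running the suite. The sketch below is a minimal pre-flight check in plain Python. The field names `query`, `context`, and `response` are assumptions about the row shape the loader expects, and `find_incomplete_rows` is a hypothetical helper, so adjust both to match your data and the evaluators you run.

```python
# Assumed field names; verify them against the loader and evaluators you use.
REQUIRED_FIELDS = ("query", "context", "response")


def find_incomplete_rows(rows, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) for every row with a missing or empty field."""
    problems = []
    for index, row in enumerate(rows):
        missing = [field for field in required if not row.get(field)]
        if missing:
            problems.append((index, missing))
    return problems


# Made-up rows for illustration; the second one has no retrieved context.
raw_data = [
    {
        "query": "What is the refund window?",
        "context": ["Refunds are accepted within 30 days of purchase."],
        "response": "You can request a refund within 30 days.",
    },
    {
        "query": "Do you ship internationally?",
        "context": [],
        "response": "Yes, we ship worldwide.",
    },
]

for index, missing in find_incomplete_rows(raw_data):
    print(f"Row {index} is missing: {', '.join(missing)}")
```

Evaluators that compare against ground truth need a reference answer as well; if your rows carry one, pass its field name through the `required` argument.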
