RAG Evals
Groundedness
This is an LLM-graded evaluator.
Note: this evaluator is very similar to Faithfulness, but it returns a metric between 0 and 1.
This evaluator checks if the LLM-generated response is grounded in the provided context.
For many RAG apps, you want to constrain the response to the context you are providing (since you know it to be true). But sometimes the LLM uses its pretrained knowledge to generate an answer instead, which is a common cause of "hallucinations".
How does it work?
- For every sentence in the response, an LLM looks for evidence of that sentence in the context.
- If it finds evidence, it gives that sentence a score of 1. If it doesn't, it gives it a score of 0.
- The final score is the average of all the sentence scores.
Default Engine: gpt-3.5-turbo
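The scoring loop described above can be sketched roughly as follows. This is a simplified illustration, not Athina's implementation: the naive sentence splitting and the `llm_finds_evidence` helper are stand-ins for the actual LLM-graded step.

```python
# Simplified illustration of the scoring procedure -- not Athina's actual implementation.

def llm_finds_evidence(sentence: str, context: str) -> bool:
    # In the real evaluator this is an LLM call; a trivial substring check
    # is used here only so the sketch runs end to end.
    return sentence.lower() in context.lower()

def groundedness_score(response: str, context: str) -> float:
    # Naive sentence split, for illustration only.
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    # Each sentence scores 1 if evidence is found in the context, else 0.
    scores = [1 if llm_finds_evidence(s, context) else 0 for s in sentences]
    # The final metric is the average of the per-sentence scores.
    return sum(scores) / len(scores) if scores else 0.0
```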
Required Args
- context: The context that your response should be grounded in
- response: The LLM-generated response
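For example, a single datapoint for this evaluator could be shaped like this (the field names follow the Required Args above; how you pass them depends on whether you use the SDK or the UI):

```python
# One datapoint with the two required fields.
datapoint = {
    "context": "Y Combinator was founded in March 2005 by Paul Graham and Jessica Livingston...",
    "response": "YC was founded by Paul Graham and Jessica Livingston.",
}
```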
Metric:
- Groundedness: Number of sentences in the response that are grounded in the context, divided by the total number of sentences in the response.
  - 0: None of the sentences in the response are grounded in the context.
  - 1: All of the sentences in the response are grounded in the context.
Example
- Context: Y Combinator was founded in March 2005 by Paul Graham and Jessica Livingston as a way to fund startups in batches. YC invests $500,000 in 200 startups twice a year.
- Response: YC was founded by Paul Graham and Jessica Livingston. They invest $500k in 200 startups twice a year. In exchange, they take 7% equity.
Eval Result
- Result: Fail
- Score: 0.67 (2 of the 3 sentences in the response are grounded in the context, so the score is 2/3 ≈ 0.67)
- Explanation: There is no evidence of the following sentence in the context:
- “In exchange, they take 7% equity”
In Athina’s UI, sentences that are not grounded in the context are highlighted in red.
Run the eval on a dataset
- Load your data with the Loader
- Run the evaluator on your dataset (see the sketch below)
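A minimal sketch of this flow with the athina-evals Python SDK is below. The class names (`Loader`, `Groundedness`, `OpenAiApiKey`), the `load_json` and `run_batch` methods, and the `data.json` filename are assumptions for illustration; check the SDK reference for the exact API in your version.

```python
# Hedged sketch -- class and method names are assumed and may differ
# from your version of the athina-evals SDK.
import os

from athina.evals import Groundedness   # assumed evaluator class
from athina.keys import OpenAiApiKey    # assumed key helper
from athina.loaders import Loader       # assumed loader class

OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])

# 1. Load your data with the Loader. `data.json` is a hypothetical file whose
#    records each contain the `context` and `response` fields described above.
dataset = Loader().load_json("data.json")

# 2. Run the evaluator on your dataset and inspect the per-row scores.
results = Groundedness().run_batch(data=dataset)
print(results.to_df())
```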