This is an LLM Graded Evaluator

This evaluator checks if the LLM-generated response is faithful to the provided context.

For many RAG apps, you want to constrain the response to the context you provide, since you know that context to be true. Sometimes, though, the LLM draws on its pretrained knowledge to generate an answer instead. This is a common cause of hallucinations.

Required Args

  • context: The context that your response should be faithful to
  • response: The LLM generated response

Default Engine: gpt-4


Example

  • Context: YC invests $500,000 in 200 startups twice a year.
  • Response: YC takes 5-7% equity.

Eval Result

  • Result: Fail
  • Explanation: The response mentions that YC takes 5-7% equity, but this is not mentioned anywhere in the context.
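
As a rough illustration of what an LLM-graded faithfulness check involves, the sketch below assembles a grading prompt for the example above and parses a Pass/Fail verdict from a grader's reply. The prompt wording and the helper names `build_grading_prompt` and `parse_verdict` are hypothetical; they are not the evaluator's actual prompt or API, and no model is called here.

```python
def build_grading_prompt(context: str, response: str) -> str:
    """Assemble a prompt asking a grader model to judge faithfulness.

    Hypothetical wording: the evaluator's real prompt is not shown on this page.
    """
    return (
        "Decide whether every claim in the RESPONSE is supported by the CONTEXT.\n"
        "Reply with 'Result: Pass' or 'Result: Fail', then an explanation.\n\n"
        f"CONTEXT: {context}\n"
        f"RESPONSE: {response}\n"
    )


def parse_verdict(grader_reply: str) -> bool:
    """Return True for a Pass verdict, False for Fail (assumed reply format)."""
    for line in grader_reply.splitlines():
        if line.lower().startswith("result:"):
            return line.split(":", 1)[1].strip().lower() == "pass"
    raise ValueError("no 'Result:' line found in grader reply")


context = "YC invests $500,000 in 200 startups twice a year."
response = "YC takes 5-7% equity."
prompt = build_grading_prompt(context, response)

# A reply shaped like the Eval Result above (written by hand, not model output).
reply = "Result: Fail\nExplanation: The equity claim is not in the context."
is_faithful = parse_verdict(reply)
```

Here `is_faithful` comes out false because the hand-written reply carries a Fail verdict, mirroring the Eval Result shown above.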

Run the eval on a dataset

  1. Load your data with the Loader
from athina.loaders import Loader

# Load the data from JSON, Athina, or a dictionary.
# json_file is the path to your JSON dataset.
dataset = Loader().load_json(json_file)
  2. Run the evaluator on your dataset
from athina.evals import Faithfulness

Faithfulness().run_batch(data=dataset)
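
For the JSON route, the input file might look like the one written by this sketch. The field names `context` and `response` mirror the Required Args above. That the loader expects a top-level list of such objects is an assumption, and the file name is a placeholder.

```python
import json
import os
import tempfile

# One record per datapoint; keys mirror the evaluator's required args.
records = [
    {
        "context": "YC invests $500,000 in 200 startups twice a year.",
        "response": "YC takes 5-7% equity.",
    },
]

# Placeholder location; any path your loader can read would do.
json_file = os.path.join(tempfile.mkdtemp(), "faithfulness_dataset.json")
with open(json_file, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

# Sanity check: read the file back and flag records missing a required field.
with open(json_file, encoding="utf-8") as f:
    loaded = json.load(f)
missing = [r for r in loaded if not {"context", "response"} <= set(r)]
```

Checking for missing fields before running a batch eval is cheap and avoids spending grader calls on malformed rows.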

Run the eval on a single datapoint

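from athina.evals import Faithfulness

# Pass the context and the LLM-generated response you want to check.
Faithfulness().run(
    context=context,
    response=response
)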