❊ Info
Note: this evaluator is very similar to Faithfulness but it returns a metric between 0 and 1.
How does it work?
- For every sentence in the response, an LLM looks for evidence of that sentence in the context.
- If it finds evidence, it gives that sentence a score of 1. If it doesn’t, it gives it a score of 0.
- The final score is the average of all the sentence scores.
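The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluator's actual implementation: `split_sentences`, `groundedness_score`, and `keyword_judge` are hypothetical names, the sentence splitter is deliberately naive, and `keyword_judge` is a toy word-overlap check standing in for the LLM call that decides whether a sentence has evidence in the context.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def groundedness_score(
    response: str,
    context: str,
    has_evidence: Callable[[str, str], bool],
) -> float:
    """Average of per-sentence scores (1 if evidence is found, else 0)."""
    sentences = split_sentences(response)
    if not sentences:
        return 0.0  # assumption: an empty response scores 0
    scores = [1 if has_evidence(s, context) else 0 for s in sentences]
    return sum(scores) / len(scores)


def keyword_judge(sentence: str, context: str) -> bool:
    """Toy stand-in for the LLM judge (an assumption for illustration):
    a sentence counts as grounded if every word longer than three
    characters also appears in the context."""
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return all(w in context_words for w in words if len(w) > 3)


context = "The sky is blue. Grass is green."
response = "The sky is blue. Grass is purple."
print(groundedness_score(response, context, keyword_judge))
```

In a real setup the `has_evidence` callable would wrap a prompt to the configured model; passing it in as a parameter keeps the averaging logic independent of which judge is used.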
Default Engine:
gpt-3.5-turbo
Required Args
- context: The context that your response should be grounded in
- response: The LLM-generated response
Groundedness: Number of sentences in the response that are grounded in the context divided by the total number of sentences in the response.
- 0: None of the sentences in the response are grounded in the context
- 1: All of the sentences in the response are grounded in the context
Example
- Context: Y Combinator was founded in March 2005 by Paul Graham and Jessica Livingston as a way to fund startups in batches. YC invests $500,000 in 200 startups twice a year.
- Response: YC was founded by Paul Graham and Jessica Livingston. They invest $500k in 200 startups twice a year. In exchange, they take 7% equity.
Eval Result
- Result: Fail
- Score: 0.67
- Explanation: There is no evidence of the following sentence in the context:
- “In exchange, they take 7% equity”
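To see where 0.67 comes from, the arithmetic for this example can be reproduced directly. The per-sentence verdicts below are hard-coded from the result above (two sentences with evidence, one without). The rule that the eval passes only when every sentence is grounded is an assumption inferred from this single example, not a documented threshold.

```python
# Per-sentence verdicts for the example response (1 = evidence found, 0 = none).
# Each label is a short description of the claim, not the verbatim sentence.
verdicts = [
    ("founded by Paul Graham and Jessica Livingston", 1),
    ("$500k in 200 startups twice a year", 1),
    ("7% equity in exchange", 0),
]

# Groundedness = grounded sentences / total sentences
score = round(sum(v for _, v in verdicts) / len(verdicts), 2)

# Assumption: the eval passes only if every sentence is grounded.
passed = all(v == 1 for _, v in verdicts)

# Claims with no supporting evidence, as listed in the explanation
unsupported = [claim for claim, v in verdicts if v == 0]

print(score)        # 0.67
print(passed)       # False
print(unsupported)  # ['7% equity in exchange']
```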
Run the eval on a dataset
- Load your data with the Loader
- Run the evaluator on your dataset