Function Based Evaluators
❊ Info
These evaluators run a defined function on the response.
How does it work?
A function evaluator runs a provided function, along with its arguments, on the response and returns whether the function passed.
Required Args
Your dataset must contain these fields:
response: The LLM-generated response for the user query.
Metrics
Passed: Boolean (True/False) value specifying whether the function passed.
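To make the idea concrete, here is a minimal plain-Python sketch of what a function evaluator does conceptually: it applies a function and its arguments to a response and reports a pass/fail result. This is not Athina's implementation; `contains` and `run_function_eval` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict


def contains(response: str, keyword: str) -> bool:
    """Illustrative check: True if the keyword occurs in the response."""
    return keyword in response


def run_function_eval(func: Callable[..., bool], response: str, **func_args: Any) -> Dict[str, Any]:
    """Apply `func`, with its arguments, to the response and report pass/fail."""
    return {"response": response, "passed": bool(func(response, **func_args))}


result = run_function_eval(contains, "YC is a startup accelerator.", keyword="startup")
print(result["passed"])
```

The evaluators listed on this page follow the same shape: a fixed check plus a few arguments that configure it.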
▷ Run the function eval on a single datapoint
from athina.evals import ContainsAny
# Checks if the response contains any word from the keywords
response = "Y Combinator (YC) is a well-known startup accelerator based in Silicon Valley, California. Y Combinator is one of the most influential and successful startup accelerators globally."
ContainsAny(keywords=["YC", "startup"]).run(response=response).to_df()
▷ Run the function eval on a dataset
- Load your data with the ResponseLoader
from athina.loaders import ResponseLoader
raw_data = [
    {
        "response": "I cannot answer this question as prices vary from country to country.",
    },
    {
        "response": "A shooting star is a meteor that burns up in the atmosphere.",
    }
]
# Load the data from CSV, JSON, Athina or Dictionary
dataset = ResponseLoader().load_dict(raw_data)
- Run the evaluator on your dataset
from athina.evals import ContainsAny
# Checks if each response contains any of the keywords
ContainsAny(keywords=["star", "meteor"]).run_batch(data=dataset).to_df()
The following are examples of the function evaluators we support.
Regex
Description: Checks if the response contains the regex pattern.
Arguments:
pattern: str - Pattern to search for.
Sample Code:
from athina.evals import Regex
Regex(pattern=r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)').run_batch(data=dataset).to_df()
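As a rough illustration of what this check amounts to, the sketch below uses Python's standard `re` module with the same email-like pattern. It assumes "contains" means a match anywhere in the response (`re.search`); `regex_passes` is a hypothetical helper, not part of Athina.

```python
import re

# Same email-like pattern as in the sample, written as a raw string
# so that the backslash in `\.` is passed to the regex engine unchanged.
EMAIL_PATTERN = r"([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)"


def regex_passes(response: str, pattern: str) -> bool:
    """True if the pattern matches anywhere in the response (assumed semantics)."""
    return re.search(pattern, response) is not None


responses = [
    "You can reach us at support@example.com for help.",
    "A shooting star is a meteor that burns up in the atmosphere.",
]
regex_results = [regex_passes(r, EMAIL_PATTERN) for r in responses]
print(regex_results)
```

Writing the pattern as a raw string avoids Python treating `\.` as an invalid string escape.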
Contains Any
Description: Checks if the response contains any word from the list of keywords.
Arguments:
keywords: List[str] - List of keywords.
case_sensitive: Optional[bool] - If True, the comparison is case-sensitive. Defaults to False.
Sample Code:
from athina.evals import ContainsAny
ContainsAny(
    keywords=["star", "meteor"],
    case_sensitive=False
).run_batch(data=dataset).to_df()
Contains None
Description: Checks if the response does not contain any of the specified substrings.
Arguments:
keywords: List[str] - Keywords that must be absent from the response.
Sample Code:
from athina.evals import ContainsNone
ContainsNone(keywords=['abc', '123']).run_batch(data=dataset).to_df()
Contains
Description: Checks if the response contains the specified keyword.
Arguments:
keyword: str - String to check for in the response.
Sample Code:
from athina.evals import Contains
Contains(keyword='test').run_batch(data=dataset).to_df()
ContainsAll
Description: Checks if all the provided keywords are present in the response.
Arguments:
keywords: List[str] - The list of keywords to search for in the response.
case_sensitive: bool, optional - If True, the comparison is case-sensitive. Defaults to False.
Sample Code:
from athina.evals import ContainsAll
ContainsAll(keywords=['test', 'example']).run_batch(data=dataset).to_df()
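The keyword evaluators (Contains Any, Contains None, and ContainsAll) differ only in how they combine per-keyword matches. The sketch below shows that relationship in plain Python. It assumes simple substring matching and a `case_sensitive` flag that defaults to False; the function names are hypothetical and this is not Athina's code.

```python
from typing import List


def _normalize(text: str, case_sensitive: bool) -> str:
    """Lowercase the text unless the comparison is case-sensitive."""
    return text if case_sensitive else text.lower()


def contains_any(response: str, keywords: List[str], case_sensitive: bool = False) -> bool:
    """True if at least one keyword occurs in the response."""
    haystack = _normalize(response, case_sensitive)
    return any(_normalize(k, case_sensitive) in haystack for k in keywords)


def contains_all(response: str, keywords: List[str], case_sensitive: bool = False) -> bool:
    """True if every keyword occurs in the response."""
    haystack = _normalize(response, case_sensitive)
    return all(_normalize(k, case_sensitive) in haystack for k in keywords)


def contains_none(response: str, keywords: List[str], case_sensitive: bool = False) -> bool:
    """True if no keyword occurs in the response."""
    return not contains_any(response, keywords, case_sensitive)


text = "A shooting star is a meteor that burns up in the atmosphere."
print(contains_any(text, ["star", "comet"]))
print(contains_all(text, ["star", "comet"]))
print(contains_none(text, ["abc", "123"]))
```

Substring matching means "star" would also match inside "starting"; whether the library matches whole words instead is not stated on this page.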
ContainsJson
Description: Checks if the response contains valid JSON.
Arguments:
- None
Sample Code:
from athina.evals import ContainsJson
ContainsJson().run_batch(data=dataset).to_df()
ContainsEmail
Description: Checks if the response contains a valid email address.
Arguments:
- None
Sample Code:
from athina.evals import ContainsEmail
ContainsEmail().run_batch(data=dataset).to_df()
IsJson
Description: Checks if the response is valid JSON.
Arguments:
- None
Sample Code:
from athina.evals import IsJson
IsJson().run_batch(data=dataset).to_df()
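The difference between ContainsJson and IsJson is whether the JSON may be embedded in surrounding text. The sketch below illustrates one way to express both checks with the standard `json` module. It is an assumption about the semantics, not Athina's implementation; `is_json` and `contains_json` are hypothetical names.

```python
import json


def is_json(response: str) -> bool:
    """True if the entire response parses as JSON."""
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False


def contains_json(response: str) -> bool:
    """True if a JSON object or array can be decoded starting at some '{' or '['."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(response):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at `index`
                # and ignores whatever text follows it.
                decoder.raw_decode(response, index)
                return True
            except json.JSONDecodeError:
                continue
    return False


mixed = 'Here is the data: {"price": 10, "description": "pen"} Thanks!'
print(is_json(mixed), contains_json(mixed))
```

Note that this `is_json` also accepts bare scalars such as `42`, because they are valid JSON documents.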
IsEmail
Description: Checks if the response is a valid email address.
Arguments:
- None
Sample Code:
from athina.evals import IsEmail
IsEmail().run_batch(data=dataset).to_df()
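ContainsEmail and IsEmail can be thought of as a partial match versus a full match of the same pattern. The sketch below illustrates this with the standard `re` module. The pattern is a deliberately simple approximation chosen for this example, and the function names are hypothetical; the pattern Athina actually uses is not documented here.

```python
import re

# Simplified email pattern (an assumption for illustration only).
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def contains_email(response: str) -> bool:
    """True if an email-like substring appears anywhere in the response."""
    return EMAIL_RE.search(response) is not None


def is_email(response: str) -> bool:
    """True if the whole response (ignoring surrounding whitespace) is an email."""
    return EMAIL_RE.fullmatch(response.strip()) is not None


sentence = "Write to jane.doe@example.com today."
print(contains_email(sentence), is_email(sentence))
```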
ContainsLink
Description: Checks if the response contains any links.
Arguments:
- None
Sample Code:
from athina.evals import ContainsLink
ContainsLink().run_batch(data=dataset).to_df()
ContainsValidLink
Description: Checks if the response contains valid links.
Arguments:
- None
Sample Code:
from athina.evals import ContainsValidLink
ContainsValidLink().run_batch(data=dataset).to_df()
NoInvalidLinks
Description: Checks if the response does not contain any invalid links.
Arguments:
- None
Sample Code:
from athina.evals import NoInvalidLinks
NoInvalidLinks().run_batch(data=dataset).to_df()
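The three link evaluators can be sketched as one link-extraction step followed by different ways of combining a per-link validity check. This page does not say how validity is determined, so the sketch takes the validity check as a parameter (a real one might issue an HTTP request). All names below are hypothetical, and treating ContainsValidLink as "at least one valid link" is an assumption.

```python
import re
from typing import Callable, List

URL_RE = re.compile(r"https?://[^\s<>]+")


def extract_links(response: str) -> List[str]:
    """Find http(s) URLs and strip trailing punctuation picked up from prose."""
    return [url.rstrip(".,;:!?)\"'") for url in URL_RE.findall(response)]


def contains_link(response: str) -> bool:
    """True if the response has at least one link."""
    return bool(extract_links(response))


def contains_valid_link(response: str, is_valid: Callable[[str], bool]) -> bool:
    """True if at least one link passes the validity check (assumed semantics)."""
    return any(is_valid(link) for link in extract_links(response))


def no_invalid_links(response: str, is_valid: Callable[[str], bool]) -> bool:
    """True if every link passes; a response without links passes trivially."""
    return all(is_valid(link) for link in extract_links(response))


def looks_trusted(url: str) -> bool:
    """Stand-in validity check for the demo; no network access involved."""
    return url.startswith("https://example.com")


text = "See https://example.com/docs and http://broken.invalid/page."
print(contains_link(text), contains_valid_link(text, looks_trusted), no_invalid_links(text, looks_trusted))
```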
ApiCall
Description:
Performs an API call to a specified endpoint and picks up the evaluation result from the response. This evaluator is useful when you want to run some complex or custom logic on the response.
Arguments:
url: string - API endpoint to call. Note that this API should accept POST requests.
headers: dict - Headers to include in the API call.
payload: dict - Body to send with the API call. The response is added to this payload.
Sample Code:
from athina.evals import ApiCall
from athina.loaders import ResponseLoader
# Raw data must contain the response and, optionally, the query, context, and expected_response to be passed to the API
raw_data = [
    {
        "response": "Response to be sent to your own API-based evaluator",
        "query": "Query to be sent to your own API-based evaluator"
    }
]
dataset = ResponseLoader().load_dict(raw_data)
ApiCall(
    url="https://8e714940905f4022b43267e348b8a713.api.mockbin.io/",
    payload={"evaluator": "custom_api_based_evaluator"},
    headers={"Authorization": "Bearer token"}
).run_batch(data=dataset).to_df()
- We expect the API response to be in JSON format with two keys, namely result and reason.
- The result key should contain the evaluation result, which should be a boolean value.
- The reason key should contain the reason for the evaluation result, which should be a string.
- The dataset should contain the response and, optionally, the query, context, and expected_response to be passed to the API.
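For reference, the receiving side of this contract might look like the sketch below: an endpoint that accepts a POST request with a JSON body and replies with `result` and `reason`. It uses only Python's standard library. The length rule is an arbitrary example, the `"response"` payload key is an assumption based on the notes above, and `evaluate_payload`, `EvalHandler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def evaluate_payload(payload: dict) -> dict:
    """Example custom logic: pass when the response is non-empty and under 200 characters."""
    response = str(payload.get("response", ""))  # key name is an assumption
    passed = 0 < len(response) < 200
    reason = "Response length is acceptable." if passed else "Response is empty or too long."
    return {"result": passed, "reason": reason}


class EvalHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Reply with the two keys the evaluator expects: result and reason.
        body = json.dumps(evaluate_payload(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the illustrative server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), EvalHandler).serve_forever()


print(evaluate_payload({"response": "ok", "evaluator": "custom_api_based_evaluator"}))
```

Keeping the evaluation logic in a plain function such as `evaluate_payload` makes it easy to test without starting the server.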
Equals
Description: Checks if the response is exactly equal to the specified string.
Arguments:
expected_response: str - String to compare the response with.
Sample Code:
from athina.evals import Equals
Equals(expected_response="This is the expected response").run_batch(data=dataset).to_df()
StartsWith
Description: Checks if the response starts with the specified substring.
Arguments:
substring: str - String to check at the start of the response.
Sample Code:
from athina.evals import StartsWith
StartsWith(substring="The text starts with").run_batch(data=dataset).to_df()
EndsWith
Description: Checks if the response ends with the specified substring.
Arguments:
substring: str - String to check at the end of the response.
Sample Code:
from athina.evals import EndsWith
EndsWith(substring="with this substring.").run_batch(data=dataset).to_df()
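Equals, StartsWith, and EndsWith map naturally onto Python's built-in string operations. The sketch below illustrates that mapping, assuming case-sensitive comparison with no whitespace trimming (the page does not say whether the library normalizes the text first). The function names are hypothetical.

```python
def equals(response: str, expected_response: str) -> bool:
    """True if the response is exactly the expected string."""
    return response == expected_response


def starts_with(response: str, substring: str) -> bool:
    """True if the response begins with the substring."""
    return response.startswith(substring)


def ends_with(response: str, substring: str) -> bool:
    """True if the response ends with the substring."""
    return response.endswith(substring)


text = "The text starts with a phrase and ends with this substring."
print(starts_with(text, "The text starts with"), ends_with(text, "with this substring."))
```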
LengthLessThan
Description: Checks if the length of the response is less than a maximum length.
Arguments:
max_length: int - The maximum allowable length for the response.
Sample Code:
from athina.evals import LengthLessThan
LengthLessThan(max_length=20).run_batch(data=dataset).to_df()
LengthGreaterThan
Description: Checks if the length of the response is more than a minimum length.
Arguments:
min_length: int - The minimum allowable length for the response.
Sample Code:
from athina.evals import LengthGreaterThan
LengthGreaterThan(min_length=20).run_batch(data=dataset).to_df()
Length Between
Description: Checks if the length of the response is between the minimum and maximum length.
Arguments:
min_length: int - The minimum allowable length for the response.
max_length: int - The maximum allowable length for the response.
Sample Code:
from athina.evals import LengthBetween
LengthBetween(min_length=20, max_length=100).run_batch(data=dataset).to_df()
One Line
Description: Checks if the response is a single line.
Arguments:
- None
Sample Code:
from athina.evals import OneLine
OneLine().run_batch(data=dataset).to_df()
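The length evaluators and OneLine are simple structural checks, sketched below in plain Python. Two points are assumptions rather than documented behavior: that length is counted in characters, and that LengthBetween treats both bounds as inclusive. The function names are hypothetical.

```python
def length_less_than(response: str, max_length: int) -> bool:
    """True if the response has fewer than max_length characters."""
    return len(response) < max_length


def length_greater_than(response: str, min_length: int) -> bool:
    """True if the response has more than min_length characters."""
    return len(response) > min_length


def length_between(response: str, min_length: int, max_length: int) -> bool:
    """True if the character count lies in [min_length, max_length] (inclusive, assumed)."""
    return min_length <= len(response) <= max_length


def one_line(response: str) -> bool:
    """True if the response contains no newline character."""
    return "\n" not in response


text = "A shooting star is a meteor that burns up in the atmosphere."
print(length_less_than(text, 20), length_between(text, 20, 100), one_line(text))
```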
CustomCodeEval
Description: Runs custom code as an evaluator.
Arguments:
code: str - Code to be executed. The code should contain a function named main which takes **kwargs as input and returns a boolean value.
Sample Code:
from athina.evals import CustomCodeEval
# Example data
data = [
{"text": "This is a short text."},
{"text": "The Great Barrier Reef is the world's largest coral reef system.\n It is composed of over 2,900 individual reefs and 900 islands stretching for over 2,300 kilometers."}
]
code = """
def main(**kwargs):
    return len(kwargs['text']) > 100
"""
CustomCodeEval(code=code).run_batch(data=data).to_df()
Read more about CustomCodeEval
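To show what the `main(**kwargs)` contract means in practice, the sketch below runs a code string with Python's built-in `exec` and calls the `main` function it defines with one datapoint's fields. This is only an illustration of the contract, not how Athina executes custom code (which may be sandboxed); `run_custom_code` is a hypothetical helper.

```python
def run_custom_code(code: str, **kwargs) -> bool:
    """Execute the code string and call the `main` function it defines."""
    namespace: dict = {}
    # Warning: exec runs arbitrary code. Only use it with code you trust.
    exec(code, namespace)
    main = namespace.get("main")
    if not callable(main):
        raise ValueError("The code must define a function named `main`.")
    return bool(main(**kwargs))


code = """
def main(**kwargs):
    return len(kwargs['text']) > 100
"""

short_result = run_custom_code(code, text="This is a short text.")
long_result = run_custom_code(code, text="x" * 150)
print(short_result, long_result)
```

Each key of a datapoint (here `text`) arrives as a keyword argument, which is why `main` reads it from `kwargs`.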
JsonSchema
Description: Validates the JSON structure against a specified JSON schema.
Arguments:
schema: str - The JSON schema to validate against.
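To illustrate what validating a response against a schema involves, here is a deliberately tiny hand-rolled checker that supports only `type` (object, string, integer), `properties`, and `required`, which is the subset used by the sample schema for this evaluator. It is not a JSON Schema implementation and not Athina's code; for real validation, a full validator such as the `jsonschema` package would be more appropriate. `matches_schema` and `_check` are hypothetical names.

```python
import json

# Only the three types needed for this illustration.
_TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def _check(value, schema: dict) -> bool:
    """Recursively check type, required keys, and property sub-schemas."""
    expected = schema.get("type")
    if expected is not None and not _TYPE_CHECKS[expected](value):
        return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value and not _check(value[key], sub_schema):
                return False
    return True


def matches_schema(response: str, schema: dict) -> bool:
    """True if the response parses as JSON and satisfies the (subset) schema."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    return _check(data, schema)


schema = {
    "type": "object",
    "properties": {"price": {"type": "integer"}, "description": {"type": "string"}},
    "required": ["price", "description"],
}
print(matches_schema('{"price": 10, "description": "pen"}', schema))
```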
Sample Code:
from athina.evals import JsonSchema
JsonSchema(schema="""
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "price": {
            "type": "integer"
        },
        "description": {
            "type": "string"
        }
    },
    "required": [
        "price", "description"
    ]
}
""").run_batch(data=dataset).to_df()
JsonValidation
Description: Validates the value of a JSON field against a specified condition.
Arguments:
validations: list - A list of validation rules. Each rule is a dictionary with the following keys:
- json_path: str - The JSON path to the field to validate.
- validating_function: str - The name of the validation function to use.
Sample Code:
from athina.evals import JsonValidation
JsonValidation(
    validations=[{
        "json_path": "$.description",
        "validating_function": "Equals"
    }]
).run_batch(data=dataset).to_df()