❊ Info

These evaluators run a defined function on the response.

How does it work?

A function evaluator runs a provided function (with its arguments) on the response and returns whether the function passed or not.
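
Conceptually, a function evaluator is just a predicate applied to the response. The sketch below is purely illustrative (contains_any here is a plain Python function written for this example, not the Athina ContainsAny class):

def contains_any(response, keywords):
    # Illustrative check: True if any keyword appears in the response (case-insensitive)
    return any(keyword.lower() in response.lower() for keyword in keywords)

passed = contains_any(
    "Y Combinator (YC) is a startup accelerator.",
    ["YC", "startup"]
)  # -> True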

Required Args

Your dataset must contain these fields:

  • response: The LLM-generated response for the user query.

Metrics

  • Passed: Boolean (True/False) value specifying whether the function passed or not.

▷ Run the function eval on a single datapoint

from athina.evals import ContainsAny

# Checks if the response contains any word from the keywords
response = "Y Combinator (YC) is a well-known startup accelerator based in Silicon Valley, California. Y Combinator is one of the most influential and successful startup accelerators globally."
ContainsAny(keywords=["YC", "startup"]).run(text=response).to_df()

▷ Run the function eval on a dataset

  1. Load your data into a list of dictionaries
from athina.evals import ContainsAny
dataset = [
    {"text": "A star is a massive object in space that emits light."},
    {"text": "A meteor enters the Earth's atmosphere and burns up."},
    {"text": "The ocean is vast and mysterious."}
]
  2. Run the evaluator on your dataset
from athina.evals import ContainsAny

# Checks if the response contains any of the keywords
ContainsAny(keywords=["star", "meteor"]).run_batch(data=dataset).to_df()

The following are examples of the various function evaluators we support:

Regex

Description: Checks if the response contains the regex pattern.

Arguments:

  • pattern: str Pattern to search for.

Sample Code:

from athina.evals import Regex

Regex(pattern=r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)').run_batch(data=dataset).to_df()
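
For example, against a small illustrative dataset (reusing the text field from the examples above) in which only the first entry contains an email address:

dataset = [
    {"text": "You can reach us at support@example.com for help."},
    {"text": "No contact information is included here."}
]
Regex(pattern=r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)').run_batch(data=dataset).to_df()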

Contains Any

Description: Checks if the response contains any word from the list of keywords.

Arguments:

  • keywords: List[str] List of keywords
  • case_sensitive: Optional[bool]. Defaults to False.

Sample Code:

from athina.evals import ContainsAny

ContainsAny(
    keywords=["star", "meteor"],
    case_sensitive=False
).run_batch(data=dataset).to_df()

Contains None

Description: Checks if the response does not contain any of the specified substrings.

Arguments:

  • keywords: List[str] Keywords to check for absence in the response.

Sample Code:

from athina.evals import ContainsNone

ContainsNone(keywords=['abc', '123']).run_batch(data=dataset).to_df()

Contains

Description:
Checks if the response contains the specified keyword.

Arguments:

  • keyword: str Keyword to check for presence in the response.

Sample Code:

from athina.evals import Contains

Contains(keyword='test').run_batch(data=dataset).to_df()

ContainsAll

Description:
Checks if all the provided keywords are present in the response.

Arguments:

  • keywords: List[str] - The list of keywords to search for in the response.
  • case_sensitive: bool, optional - If True, the comparison is case-sensitive. Defaults to False.

Sample Code:

from athina.evals import ContainsAll

ContainsAll(keywords=['test', 'example']).run_batch(data=dataset).to_df()

ContainsJson

Description:
Checks if the response contains a valid JSON.

Arguments:

  • None

Sample Code:

from athina.evals import ContainsJson

ContainsJson().run_batch(data=dataset).to_df()
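
To make the expected input concrete, here is an illustrative dataset (using the text field as in the examples above): the first entry embeds a valid JSON object inside prose, the second contains none.

dataset = [
    {"text": "Here is the result: {\"status\": \"ok\", \"count\": 3}"},
    {"text": "There is no JSON anywhere in this sentence."}
]
ContainsJson().run_batch(data=dataset).to_df()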

ContainsEmail

Description:
Checks if the response contains a valid email address.

Arguments:

  • None

Sample Code:

from athina.evals import ContainsEmail

ContainsEmail().run_batch(data=dataset).to_df()

IsJson

Description:
Checks if the response is a valid JSON.

Arguments:

  • None

Sample Code:

from athina.evals import IsJson

IsJson().run_batch(data=dataset).to_df()

IsEmail

Description:
Checks if the response is a valid email address.

Arguments:

  • None

Sample Code:

from athina.evals import IsEmail

IsEmail().run_batch(data=dataset).to_df()

ContainsLink

Description:
Checks if the response contains any links.

Arguments:

  • None

Sample Code:

from athina.evals import ContainsLink

ContainsLink().run_batch(data=dataset).to_df()

ContainsValidLink

Description:
Checks if the response contains valid links.

Arguments:

  • None

Sample Code:

from athina.evals import ContainsValidLink

ContainsValidLink().run_batch(data=dataset).to_df()

NoInvalidLinks

Description:
Checks if the response does not contain any invalid links.

Arguments:

  • None

Sample Code:

from athina.evals import NoInvalidLinks

NoInvalidLinks().run_batch(data=dataset).to_df()

ApiCall

Description:
Performs an API call to a specified endpoint and picks up the evaluation result from the response. This evaluator is useful when you want to run some complex or custom logic on the response.

Arguments:

  • url: string - API endpoint to call. Note that this API should accept POST requests.
  • headers: dict - Headers to include in the API call.
  • payload: dict - Body to send with the API call. The response from your dataset is added to this payload.

Sample Code:

from athina.evals import ApiCall
from athina.loaders import ResponseLoader

# Raw data must contain response and optionally the query, context and expected_response to be passed to the API
raw_data = [
    {
        "response": "Response to be sent to the your own API based evaluator",
        "query": "Query to be sent to the your own API based evaluator"
    }
]
dataset = ResponseLoader().load_dict(raw_data)

ApiCall(
    url="https://8e714940905f4022b43267e348b8a713.api.mockbin.io/",
    payload={"evaluator": "custom_api_based_evaluator"},
    headers={"Authorization": "Bearer token"}
).run_batch(data=dataset).to_df()
  • We expect the API response to be in JSON format with two keys: result and reason.
  • The result key should contain the evaluation result, which should be a boolean value.
  • The reason key should contain the reason for the evaluation result, which should be a string.
  • The dataset should contain the response and, optionally, the query, context, and expected_response to be passed to the API.
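
For reference, here is a minimal sketch (not part of the Athina SDK) of what such an endpoint could look like, written with Flask. The non-empty-response check is purely illustrative; your service can run any logic as long as it returns result and reason:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/", methods=["POST"])
def evaluate():
    # The payload configured above arrives here, with the dataset's response added to it
    body = request.get_json(force=True)
    response_text = body.get("response", "")
    # Illustrative check: pass if the response is non-empty
    passed = len(response_text.strip()) > 0
    reason = "Response is non-empty" if passed else "Response is empty"
    return jsonify({"result": passed, "reason": reason})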

Equals

Description: Checks if the response is exactly equal to the specified string.

Arguments:

  • expected_response: str String to compare the response with.

Sample Code:

from athina.evals import Equals
dataset = [
  {"expected_text": "This is the expected response", "text": "This is the expected response"}
]

Equals().run_batch(data=dataset).to_df()

StartsWith

Description: Checks if the response starts with the specified substring.

Arguments:

  • substring: str String to check at the start of the response.

Sample Code:

from athina.evals import StartsWith
dataset = [
    {"text": "A star is a massive object in space that emits light."},
    {"text": "A meteor enters the Earth's atmosphere and burns up."},
    {"text": "The ocean is vast and mysterious."}
]
StartsWith(substring="A star").run_batch(data=dataset).to_df()

EndsWith

Description: Checks if the response ends with the specified substring.

Arguments:

  • substring: str String to check at the end of the response.

Sample Code:

from athina.evals import EndsWith
dataset = [
    {"text": "A star is a massive object in space that emits light."},
    {"text": "A meteor enters the Earth's atmosphere and burns up."},
    {"text": "The ocean is vast and mysterious."}
]
EndsWith(substring="burns up.").run_batch(data=dataset).to_df()

LengthLessThan

Description: Checks if the length of the response is less than a maximum length.

Arguments:

  • max_length: int The maximum allowable length for the response.

Sample Code:

from athina.evals import LengthLessThan
dataset = [
    {"text": "A star is a massive object in space that emits light."},
    {"text": "A meteor enters the Earth's atmosphere and burns up."},
    {"text": "The ocean is vast and mysterious."}
]
LengthLessThan(max_length=50).run_batch(data=dataset).to_df()

LengthGreaterThan

Description: Checks if the length of the response is more than a minimum length.

Arguments:

  • min_length: int The minimum allowable length for the response.

Sample Code:

from athina.evals import LengthGreaterThan
dataset = [
    {"text": "A star is a massive object in space that emits light."},
    {"text": "A meteor enters the Earth's atmosphere and burns up."},
    {"text": "The ocean is vast and mysterious."}
]
LengthGreaterThan(min_length=20).run_batch(data=dataset).to_df()

Length Between

Description: Checks if the length of the response is between the minimum and maximum length.

Arguments:

  • min_length: int The minimum allowable length for the response.
  • max_length: int The maximum allowable length for the response.

Sample Code:

from athina.evals import LengthBetween
dataset = [
    {"text": "A star is a massive object in space that emits light."},
    {"text": "A meteor enters the Earth's atmosphere and burns up."},
    {"text": "The ocean is vast and mysterious."}
]
LengthBetween(min_length=20, max_length=100).run_batch(data=dataset).to_df()

One Line

Description: Checks if the response is a single line.

Arguments:

  • None

Sample Code:

from athina.evals import OneLine
dataset = [
    {"text": "A star is a massive object in space that emits light."},
    {"text": "A meteor enters the Earth's atmosphere and burns up."},
    {"text": "The ocean is vast and mysterious."}
]
OneLine().run_batch(data=dataset).to_df()

CustomCodeEval

Description: Runs custom code as an evaluator.

Arguments:

  • code: str Code to be executed. The code should contain a function named main which takes **kwargs as input and returns a boolean value.

Sample Code:

from athina.evals import CustomCodeEval

# Example data
data = [
    {"text": "This is a short text."},
    {"text": "The Great Barrier Reef is the world's largest coral reef system.\n It is composed of over 2,900 individual reefs and 900 islands stretching for over 2,300 kilometers."}
]

code = """
def main(**kwargs):
    return len(kwargs['text']) > 100
"""

CustomCodeEval(code=code).run_batch(data=data).to_df()
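
As another illustrative snippet using the same main(**kwargs) contract (the banned-words check is made up for this example), run against the same data as above:

code = """
def main(**kwargs):
    # Illustrative check: fail if the text contains any banned word
    banned_words = ["lorem", "ipsum"]
    text = kwargs["text"].lower()
    return not any(word in text for word in banned_words)
"""

CustomCodeEval(code=code).run_batch(data=data).to_df()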

Read more about CustomCodeEval

JsonSchema

Description: Validates the JSON structure against a specified JSON schema.

Arguments:

  • schema: str The JSON schema to validate against.

Sample Code:

from athina.evals import JsonSchema
dataset = [
    {"actual_json": {"price": 100, "description": "A description of the item."}},
    {"actual_json": {"price": 200, "description": "Another item description."}}
]
JsonSchema(schema="""
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "price": {
      "type": "integer"
    },
    "description": {
      "type": "string"
    }
  },
  "required": [
    "price", "description"
  ]
}
""").run_batch(data=dataset).to_df()

JsonValidation

Description: Validates the value of a JSON field against a specified condition.

Arguments:

  • validations: list A list of validation rules. Each rule is a dictionary with the following keys: json_path (str, the JSON path to the field to validate) and validating_function (str, the name of the validation function to use).

Sample Code:

from athina.evals import JsonValidation
dataset = [
    {
        "actual_json": {"price": 100, "description": "A description of the item."},
        "expected_json": {"price": 200, "description": "Another item description."}
    }
]
JsonValidation(
  validations=[{
    "json_path": "$.description",
    "validating_function": "Equals"
  }]
).run_batch(data=dataset).to_df()