Uses your evaluation prompt
If you have a more complex evaluation prompt that you would like to run within Athina’s framework, we can support that with our CustomPrompt class.
- Input:
response, query, context, expected_response (whichever you specify in your prompt).
- Type:
boolean
- Metrics:
passed (0 or 1)
Example:
Evaluation Inputs:
- eval_prompt: “Think step-by-step. Based on the provided user query and refund policy, determine if the response adheres to the refund policy. User query: {{query}} Refund policy: {{context}} response: {{response}}”
- query: “How many vacation days are we allowed?”
- context: “Employees are allowed 15 holidays per year, and 5 days of paid leave.”
- response: “Employees are allowed 20 vacation days per year, and 5 days of paid leave.”
Evaluation Results:
- result: Fail
- explanation: The response does not adhere to the refund policy provided. The refund policy is that employees are allowed 15 holidays per year, and 5 days of paid leave.
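For reference, here is a minimal sketch of how the refund-policy example above could be run through the SDK (described in detail in the next section). The prompt wording is reconstructed from the example, so treat the exact phrasing as an assumption:

from athina.evals import CustomPrompt

# Reconstructed refund-policy prompt; the {{...}} placeholders are
# interpolated from each row of the dataset (see the note at the end).
eval_prompt = """
Think step-by-step. Based on the provided user query and refund policy,
determine if the response adheres to the refund policy.
User query: {{query}}
Refund policy: {{context}}
response: {{response}}
"""

data = [
    {
        "query": "How many vacation days are we allowed?",
        "context": "Employees are allowed 15 holidays per year, and 5 days of paid leave.",
        "response": "Employees are allowed 20 vacation days per year, and 5 days of paid leave.",
    }
]

result = CustomPrompt(
    display_name="Response must adhere to the refund policy",
    required_args=["query", "context", "response"],
    model="gpt-4o",
    eval_prompt=eval_prompt,
).run_batch(data=data)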
How to use it in the SDK
Simply use the CustomPrompt class and specify your own eval_prompt.
from athina.evals import CustomPrompt
eval_prompt = """
Think step-by-step.
Based on the provided user query, determine if the response answers the query correctly.
If it does, the result should be True; otherwise, it should be False. Add an explanation if the result is False.
User query: {{query}}
response: {{response}}
"""
data = [
    {
        "query": "Where is France and what is its capital?",
        "context": ["France is a country in Europe known for its delicious cuisine", "Tesla is an electric car", "Elephant is an animal"],
        "response": "Tesla is an electric car",
    }
]
batch_run_result = CustomPrompt(
    display_name="Response must answer the query",
    required_args=["query", "response"],
    model="gpt-4o",
    eval_prompt=eval_prompt,
).run_batch(data=data)
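To inspect the results, you can convert the batch output to a dataframe. The to_df() helper below follows Athina's example notebooks; if your SDK version differs, iterate over the returned eval results instead:

# Convert the batch results to a pandas DataFrame for inspection.
# (to_df() is what Athina's example notebooks use; treat it as an
# assumption if your SDK version differs.)
df = batch_run_result.to_df()
print(df)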
See an example notebook →.
Note: Any variables you use in the prompt (for example: query, context, response) will be interpolated from your dataset.
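As an illustration of the other supported inputs, here is a hypothetical sketch of a prompt that interpolates expected_response as well. Any variable referenced with {{...}} must be present in each dataset row and listed in required_args:

# A hypothetical prompt using expected_response; the prompt wording
# and display_name here are illustrative, not part of the SDK.
eval_prompt = """
Think step-by-step.
Determine whether the response conveys the same information as the expected response.
If it does, the result should be True; otherwise, it should be False.
response: {{response}}
expected response: {{expected_response}}
"""

eval_result = CustomPrompt(
    display_name="Response matches expected response",
    required_args=["response", "expected_response"],
    model="gpt-4o",
    eval_prompt=eval_prompt,
).run_batch(data=[
    {
        "response": "Employees get 15 holidays per year.",
        "expected_response": "Employees are allowed 15 holidays per year.",
    }
])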