LLM-as-a-Judge (Custom Prompt Eval)
Uses your evaluation prompt
If you have a more complex evaluation prompt that you would like to run within Athina's framework, we can support that with our CustomPrompt class.
- Input: response, query, context, expected_response (whichever you specify in your prompt)
- Type: boolean
- Metrics: passed (0 or 1)
Example:
Evaluation Inputs:
- eval_prompt: "Think step-by-step. Based on the provided user query and refund policy, determine if the response adheres to the refund policy. User query: {query} Refund policy: {context} Response: {response}"
- query: "How many vacation days are we allowed?"
- context: "Employees are allowed 15 holidays per year, and 5 days of paid leave."
- response: "Employees are allowed 20 vacation days per year, and 5 days of paid leave."
Evaluation Results:
- result: Fail
- explanation: The response does not adhere to the refund policy provided. The refund policy is that employees are allowed 15 holidays per year, and 5 days of paid leave.
How to use it in the SDK
Simply use the CustomPrompt class and specify your own eval_prompt.
Note: Any variables you use in the prompt (for example: query, context, response) will be interpolated from your dataset.
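For example, a minimal sketch in Python, assuming CustomPrompt is importable from athina.evals and follows a run_batch pattern over a list of dict rows; the model parameter, the {variable} placeholder syntax, and the to_df() helper are assumptions here, so check the SDK reference for the exact signatures:

```python
# Minimal sketch, not a verbatim SDK example: the import path, the
# model= parameter, run_batch(), and to_df() are assumed from the
# SDK's typical usage pattern.
from athina.evals import CustomPrompt

# Variables in braces are interpolated from each dataset row
# (placeholder syntax assumed; see the SDK docs).
eval_prompt = (
    "Think step-by-step. Based on the provided user query and refund "
    "policy, determine if the response adheres to the refund policy. "
    "User query: {query} Refund policy: {context} Response: {response}"
)

dataset = [
    {
        "query": "How many vacation days are we allowed?",
        "context": "Employees are allowed 15 holidays per year, "
                   "and 5 days of paid leave.",
        "response": "Employees are allowed 20 vacation days per year, "
                    "and 5 days of paid leave.",
    },
]

# Each row is judged against the eval prompt and yields passed
# (0 or 1) plus an explanation, as in the example above.
results = CustomPrompt(
    eval_prompt=eval_prompt,
    model="gpt-4",
).run_batch(data=dataset)

print(results.to_df())
```

Each dataset row should contain a key for every variable referenced in the prompt; a missing key would leave the placeholder unfilled or raise an error, depending on the SDK version.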