Why Do We Need to Evaluate Conversations?
- Enhancing User Experience: By analyzing interactions, developers can identify areas where conversations may cause user frustration or confusion. This insight allows for adjustments that make dialogues more intuitive and engaging, leading to higher user satisfaction.
- Ensuring Accuracy and Reliability: Regular evaluation helps assess the ability of conversations to provide correct and relevant information. This is particularly crucial in sensitive domains like healthcare or finance, where misinformation can have serious consequences.
- Maintaining Ethical Standards: Continuous assessment ensures that conversations adhere to ethical guidelines, avoiding inappropriate or harmful exchanges. This is vital to prevent scenarios where discussions might inadvertently suggest dangerous actions or provide misleading advice.
- Improving Conversational Abilities: Evaluations can reveal limitations in natural language flow, allowing for targeted improvements. This leads to more natural and effective communication, enhancing the overall impact of the conversation.
Evaluate Conversations in Datasets
Here we are using the TinyPixel/multiturn dataset from hugging face and we will create a Conversation Completeness custom eval to evaluate the conversation.You can also use preset templates such as conversation coherence and safety evals such as Harmfulness and Maliciousness etc. to evaluate your conversation.
Step 1: Create custom Evals
1
First, click on the Evaluate, then select Create New Evaluation and choose the Custom Eval option.

2
Then click on the custom prompt option as you can see below:

3
After that add the Conversation Completeness evaluation prompt just like the following example.

4
Similarly, add the Grammar Accuracy evaluation prompt:
Step 2: Run the Evaluation
1
Then, run the evaluations to check whether the conversation is complete or incomplete and to assess grammar accuracy based on the defined criteria.

2
After the evaluations are complete, go to the Metrics section to review the results.
