- Compare the responses from different models or prompts side-by-side.
- Run evaluations on all datasets simultaneously.
- Download the results as a JSON or Excel file.
- Re-generate a dataset with a new prompt or a new model and compare the results side-by-side.