This feature is currently in beta. Please contact us if you’d like early access.
Demo Video
How to Generate a Synthetic Dataset
- Open Athina Develop.
- Click Generate Synthetic Data.
- Select the documents you want to use to generate the dataset.
  a. You can either upload a `.txt` file.
  b. Or you can choose to generate synthetic data similar to your production logs.
- Choose the number of questions you want to generate.
- Choose the type of questions you want to generate.
What type of synthetic data can I generate?
Currently, we support the following question types:
- Simple Q&A
- Reasoning-based Questions
- Multiple Choice Questions
- Negative Questions
- Unsafe Questions
- Conditional Questions
How does synthetic data generation work?
We partnered with Fiddlecube to leverage their advanced data generation techniques. A lot of things happen under the hood to generate high-quality data:
- The source data is run through a data generation pipeline, which uses large language models to generate diverse rows.
- The dataset is then measured for quality, and rigorously filtered, cleaned and de-duped to meet the described criteria.
- Ultimately, the output rows will be RAG Question-Answer style rows with a query, context, and response.
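
To make the output shape concrete, here is a minimal Python sketch of a RAG question-answer row with the three fields named above, plus a simple de-duplication pass. This is not Athina's or Fiddlecube's actual pipeline: the names `RagRow`, `normalize`, and `dedupe` are hypothetical, and exact matching on normalized queries is only an assumed stand-in for whatever filtering the real pipeline applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagRow:
    """Hypothetical row shape: a query, its source context, and a response."""
    query: str
    context: str
    response: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries compare equal."""
    return " ".join(text.lower().split())


def dedupe(rows: list[RagRow]) -> list[RagRow]:
    """Keep the first row for each normalized query, preserving input order.

    Assumption: duplicates are detected by exact match on the normalized
    query. A real pipeline may use fuzzier similarity measures.
    """
    seen: set[str] = set()
    unique: list[RagRow] = []
    for row in rows:
        key = normalize(row.query)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

For example, passing two rows whose queries differ only in case or spacing returns a single row, the first one encountered.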