This feature is currently in beta. Please contact us if you’d like early access.
How to Generate a Synthetic Dataset
- Open Athina Develop.
- Click Generate Synthetic Data.
- Select the documents you want to use to generate the dataset.
  a. You can either upload a .txt file.
  b. Or you can choose to generate synthetic data similar to your production logs.
- Choose the number of questions you want to generate.
- Choose the type of questions you want to generate.
What type of synthetic data can I generate?
Currently, we support the following question types:
- Simple Q&A
- Reasoning-based Questions
- Multiple Choice Questions
- Negative Questions
- Unsafe Questions
- Conditional Questions
How does synthetic data generation work?
We partnered with Fiddlecube to leverage their advanced data generation techniques. Several things happen under the hood to generate high-quality data:
- The source data is run through a data generation pipeline, which uses large language models to generate diverse rows.
- The dataset is then measured for quality, then rigorously filtered, cleaned, and deduplicated to meet the described criteria.
- Ultimately, the output rows will be RAG Question-Answer style rows with a query, context, and response.
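
To make the row shape and the filter-and-dedupe step concrete, here is a minimal Python sketch. It is an illustration only, not the actual Athina or Fiddlecube pipeline: the `RagRow` class, the `normalize` and `filter_and_dedupe` helpers, the quality rule (every field must be non-empty), and the sample data are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagRow:
    """Hypothetical RAG-style row: a query, its context, and a response."""
    query: str
    context: str
    response: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries compare equal."""
    return " ".join(text.lower().split())


def filter_and_dedupe(rows: list[RagRow]) -> list[RagRow]:
    """Drop rows with an empty field, then keep the first row per normalized query.

    This is an assumed stand-in for the quality filter; the real criteria
    are not described in this document.
    """
    seen: set[str] = set()
    kept: list[RagRow] = []
    for row in rows:
        if not (row.query.strip() and row.context.strip() and row.response.strip()):
            continue  # incomplete row
        key = normalize(row.query)
        if key in seen:
            continue  # duplicate query
        seen.add(key)
        kept.append(row)
    return kept


# Made-up candidate rows, as a generation step might produce them.
candidates = [
    RagRow("What is the refund window?", "Refunds are accepted within 30 days.", "30 days."),
    RagRow("what is the  refund window?", "Refunds are accepted within 30 days.", "Thirty days."),
    RagRow("Who approves refunds?", "", "The support team."),
    RagRow("How do I request a refund?", "Email support with your order ID.", "Email support."),
]

clean_rows = filter_and_dedupe(candidates)
for row in clean_rows:
    print(row.query, "->", row.response)
```

In this sketch the second row is dropped as a duplicate of the first (same query after normalization) and the third is dropped for its empty context, leaving two rows.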