Generate a Synthetic Dataset


This feature is currently in beta. Please contact us if you’d like early access.

AI is only as good as your data, but collecting robust datasets for training and testing can be a major challenge. In Athina IDE, you can generate high-quality RAG Q&A datasets from your own documents on Athina. You can then use the generated data for evaluation, testing prompts and models, and running experiments, or export it for fine-tuning.

How to Generate a Synthetic Dataset

1. Open Athina Develop.
2. Click Generate Synthetic Data.
3. Select the documents you want to use to generate the dataset.
   a. You can either upload a .txt file,
   b. Or you can choose to generate synthetic data similar to your production logs.
4. Choose the number of questions you want to generate.
5. Choose the type of questions you want to generate.

What type of synthetic data can I generate?

Currently, we support the following question types:

Simple Q&A
Reasoning-based Questions
Multiple Choice Questions
Negative Questions
Unsafe Questions
Conditional Questions

If you need something more custom than this, please contact us.

How does synthetic data generation work?

We partnered with Fiddlecube to leverage their advanced data generation techniques. A lot of things happen under the hood to generate high-quality data:

The source data is run through a data generation pipeline, which uses large language models to generate a diverse set of rows.
The dataset is then measured for quality and rigorously filtered, cleaned, and deduplicated to meet the described criteria.
Ultimately, the output rows will be RAG Question-Answer style rows with a query, context, and response.
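
To make the output shape concrete, here is a minimal Python sketch of a row with the three fields named above (query, context, response), followed by a toy filter-and-deduplicate pass. This is not Athina's or Fiddlecube's actual pipeline: the `RagRow` class, the `filter_and_dedupe` helper, the minimum-length threshold, and the sample rows are all hypothetical, and exact matching on normalized queries stands in for whatever quality and similarity checks the real pipeline uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagRow:
    """Hypothetical row shape: one RAG question-answer example."""
    query: str
    context: str
    response: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries match."""
    return " ".join(text.lower().split())


def filter_and_dedupe(rows, min_response_chars=10):
    """Drop rows with empty queries or very short responses, then drop
    rows whose normalized query has already been seen (assumed criteria)."""
    seen = set()
    kept = []
    for row in rows:
        if not row.query.strip():
            continue
        if len(row.response.strip()) < min_response_chars:
            continue
        key = normalize(row.query)
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


# Illustrative rows only; these are not real generated data.
rows = [
    RagRow(
        query="What is the refund window?",
        context="Refunds are accepted within 30 days.",
        response="Refunds are accepted within 30 days of purchase.",
    ),
    RagRow(  # duplicate of the first query after normalization
        query="what is the  refund window?",
        context="Refunds are accepted within 30 days.",
        response="You can request a refund within 30 days.",
    ),
    RagRow(  # response too short to be useful
        query="Who handles support?",
        context="Support is handled by the help desk.",
        response="N/A",
    ),
]

clean = filter_and_dedupe(rows)
```

In this sketch only the first row survives: the second is dropped as a duplicate query and the third for its short response. A real pipeline would likely use semantic similarity and model-based quality scoring rather than these simple rules.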
