Datasets
Guides
Prototyping & Experimentation
- Comparing different models and prompts
- Comparing different datasets side-by-side
- Prototyping a prompt chain in 3 minutes without writing code
Evaluation
- RAG Evaluation: A Guide
- Measure and improve retrieval in your RAGs
- LLM-as-a-Judge Evaluation
- Pairwise Evaluation