You can also create custom evaluators. See here for
more information.
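Custom-evaluator interfaces differ between frameworks, so the following is only a minimal sketch of what a custom evaluator might look like; the function name, return shape, and pass criterion are illustrative assumptions, not a specific library's API.

```python
# Hypothetical custom evaluator: the return shape and pass criterion are
# illustrative, not tied to a specific eval framework.
def contains_citation(response: str) -> dict:
    """Passes if the model response includes at least one [source: ...] marker."""
    passed = "[source:" in response.lower()
    return {"name": "contains_citation", "passed": passed}

result = contains_citation("Revenue grew 12% last year [source: 10-K filing].")
print(result)  # {'name': 'contains_citation', 'passed': True}
```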
RAG Evals
These evals are useful for evaluating LLM applications that use Retrieval Augmented Generation (RAG).
RAGAS Evals
RAGAS is a popular library with state-of-the-art evaluation metrics for RAG pipelines (a usage sketch follows this list):
- Context Precision
- Context Relevancy
- Context Recall
- Faithfulness
- Answer Relevancy
- Answer Semantic Similarity
- Answer Correctness
- Coherence
- Conciseness
- Maliciousness
- Harmfulness
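Below is a minimal sketch of scoring a single RAG example with a subset of these metrics using the ragas Python package. The imports and column names follow the ragas 0.1-style quickstart and may differ in other versions; the sample data is illustrative, and the LLM-graded metrics expect an OpenAI API key in the environment by default.

```python
# Sketch: scoring one RAG example with a few RAGAS metrics.
# Assumes: pip install ragas datasets, and OPENAI_API_KEY set for LLM-graded metrics.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

sample = {
    "question": ["When was the Eiffel Tower completed?"],
    "answer": ["The Eiffel Tower was completed in 1889."],
    "contexts": [["The Eiffel Tower was finished in 1889 for the World's Fair."]],
    "ground_truth": ["The Eiffel Tower was completed in 1889."],
}

dataset = Dataset.from_dict(sample)

result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.98, ...}
```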
Safety Evals
These evals are useful for evaluating LLM applications with safety in mind (a usage sketch follows this list):
- PII Detection: Will fail if PII is found in the text
- Prompt Injection: Will fail if any known Prompt Injection attack is found in the text
- OpenAI Content Moderation: Will fail if text is potentially harmful
- Guardrails: A popular library of custom validators for LLM applications:
  - Safe for work: Checks if text has inappropriate/NSFW content
  - Not gibberish: Checks if the response contains gibberish
  - Contains no sensitive topics: Checks for sensitive topics
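As one concrete example, the OpenAI Content Moderation check can be sketched with the openai Python client's moderation endpoint; the wrapper function and pass/fail convention here are assumptions, and the PII, prompt-injection, and Guardrails checks would each rely on their own libraries.

```python
# Sketch of a content-moderation eval using the OpenAI moderation endpoint.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def passes_moderation(text: str) -> bool:
    """Fails (returns False) if the moderation endpoint flags the text as harmful."""
    response = client.moderations.create(input=text)
    return not response.results[0].flagged

print(passes_moderation("Have a wonderful day!"))  # expected: True
```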
Summarization Evals
These evals are useful for evaluating LLM-powered summarization performance.
JSON Evals
These evals are useful for validating JSON outputs.
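A minimal sketch of such a check is shown below: it verifies that the output parses as JSON and, optionally, that it matches an expected schema. The use of the third-party jsonschema package and the example schema are assumptions for illustration, not requirements of any particular eval library.

```python
# Sketch of a JSON validity eval with an optional schema check.
# Assumes: pip install jsonschema
import json

from jsonschema import ValidationError, validate

EXPECTED_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
    "required": ["name", "age"],
}

def is_valid_json(text: str, schema: dict | None = None) -> bool:
    """Fails if the text is not parseable JSON or does not match the schema."""
    try:
        parsed = json.loads(text)
        if schema is not None:
            validate(instance=parsed, schema=schema)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False

print(is_valid_json('{"name": "Ada", "age": 36}', EXPECTED_SCHEMA))  # True
print(is_valid_json('{"name": "Ada"}', EXPECTED_SCHEMA))             # False
```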
Function Evals
Unlike the previous evaluators, which use an LLM for grading, function evals use simple functions (a sketch follows this list) to check whether:
- Text matches a given regular expression
- Text contains a link
- Text contains keywords
- Text contains no invalid links
- Text is missing keywords
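The sketch below implements a few of these checks as plain Python functions. The function names and the URL pattern are illustrative, and the link check only tests URL syntax rather than whether a link actually resolves.

```python
# Sketch of function-based evals: no LLM involved, just deterministic checks.
import re

URL_PATTERN = re.compile(r"https?://[^\s)>\"']+")

def matches_regex(text: str, pattern: str) -> bool:
    """Passes if the text matches the given regular expression."""
    return re.search(pattern, text) is not None

def contains_link(text: str) -> bool:
    """Passes if the text contains at least one http(s) link."""
    return URL_PATTERN.search(text) is not None

def contains_keywords(text: str, keywords: list[str]) -> bool:
    """Passes if every keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)

def missing_keywords(text: str, required: list[str]) -> list[str]:
    """Returns the required keywords absent from the text (empty list means pass)."""
    lowered = text.lower()
    return [kw for kw in required if kw.lower() not in lowered]

print(contains_link("Docs: https://example.com/guide"))                           # True
print(contains_keywords("Paris is the capital of France.", ["paris", "france"]))  # True
print(missing_keywords("Paris is the capital of France.", ["population"]))        # ['population']
```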