Evaluation
ragway includes built-in evaluation modules to benchmark pipeline quality.
Metrics
- Faithfulness
- Answer accuracy
- Context recall
- Context precision
- Hallucination score
- Latency
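As an illustration of what a faithfulness-style metric measures (this is not ragway's implementation, which the source does not show), faithfulness is commonly framed as the fraction of the answer that is supported by the retrieved context. A crude lexical sketch:

```python
def faithfulness_score(answer: str, contexts: list[str]) -> float:
    """Fraction of answer tokens that appear in the retrieved context.

    A simple lexical proxy for illustration only; production evaluators
    typically judge claim-level support with an LLM rather than token
    overlap. Not ragway's actual metric.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(" ".join(contexts).lower().split())
    if not answer_tokens:
        return 0.0
    supported = answer_tokens & context_tokens
    return len(supported) / len(answer_tokens)

score = faithfulness_score(
    "Paris is the capital of France",
    ["France's capital is Paris.", "Paris has 2.1 million residents."],
)
```

The lexical version is cheap but brittle (punctuation and paraphrase defeat it), which is why LLM-judged variants dominate in practice.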
CLI evaluation
```shell
rag evaluate --dataset eval.json --config rag.yaml
```

Programmatic evaluation
```python
from ragway.evaluation.faithfulness import FaithfulnessEval

# Evaluate a generated answer and its retrieved context against a question.
```

Use the same dataset across pipeline variants to compare quality and cost trade-offs.