Principle III: Evaluation-First
Source: .specify/memory/constitution.md
Overview
Enterprise agents must be continuously evaluated against business objectives and risk metrics throughout their lifecycle. In traditional software, code correctness largely predicts success; agent quality instead depends on systematic measurement of behavior and business outcomes.
The paradigm shifts from code-first to evaluation-first: continuous measurement, behavioral validation, and evidence-based deployment ensure agents remain aligned with organizational goals.
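As a rough illustration of evidence-based deployment, the sketch below declares KPIs up front and checks measured values against them before a release. The KPI names, the threshold values, and the `Kpi` and `evaluate_gate` helpers are hypothetical and chosen only for this example; the constitution does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    """A KPI declared before development starts (hypothetical structure)."""
    name: str
    threshold: float
    higher_is_better: bool = True


def evaluate_gate(kpis, measured):
    """Compare measured values to KPI thresholds.

    Returns (passed, failures), where failures lists every KPI that is
    missing a measurement or falls on the wrong side of its threshold.
    """
    failures = []
    for kpi in kpis:
        value = measured.get(kpi.name)
        if value is None:
            # A KPI without evidence counts as a failure, not a pass.
            failures.append(f"{kpi.name}: no measurement")
            continue
        if kpi.higher_is_better:
            ok = value >= kpi.threshold
        else:
            ok = value <= kpi.threshold
        if not ok:
            failures.append(f"{kpi.name}: {value} vs threshold {kpi.threshold}")
    return (not failures, failures)


# Illustrative KPI set; names and thresholds are assumptions.
KPIS = [
    Kpi("task_success_rate", 0.90),
    Kpi("groundedness", 0.95),
    Kpi("escalation_rate", 0.05, higher_is_better=False),
]
```

A CI job could call `evaluate_gate(KPIS, measured)` with the latest evaluation results and block the release when `passed` is false. Treating a missing measurement as a failure keeps the gate from passing silently when an evaluation did not run.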
Non-Negotiable Rules
| Rule | Description |
|---|---|
| KPI-First | Define KPIs before development begins |
| Automated Evaluation | LLM-as-a-Judge combined with human review in CI/CD pipelines |
| Code Coverage | Comprehensive test coverage for all code paths |
| Behavior Quality | High-bar behavioral validation (accuracy, task success, groundedness) |
| Champion-Challenger | Statistical evaluation before production promotion |
| Evidence Required | Eval results, red team reports, and compliance checklists ship with code |
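One way to read the Champion-Challenger rule is as a significance test on task success rates. The sketch below uses a one-sided two-proportion z-test, which is an assumption made for illustration: the constitution does not name a specific test, and the function name, the `alpha` default, and the sample counts are hypothetical.

```python
import math


def challenger_beats_champion(champ_success, champ_total,
                              chall_success, chall_total, alpha=0.05):
    """One-sided two-proportion z-test.

    Tests whether the challenger's success rate is higher than the
    champion's. Returns (promote, p_value).
    """
    if champ_total <= 0 or chall_total <= 0:
        raise ValueError("both agents need at least one evaluated task")

    p_champ = champ_success / champ_total
    p_chall = chall_success / chall_total

    # Pooled success rate under the null hypothesis of equal rates.
    pooled = (champ_success + chall_success) / (champ_total + chall_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / champ_total + 1 / chall_total))
    if se == 0:
        # Both agents succeeded (or failed) on every task: no evidence either way.
        return False, 1.0

    z = (p_chall - p_champ) / se
    # Upper-tail probability of the standard normal: P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_value < alpha, p_value
```

For example, `challenger_beats_champion(850, 1000, 900, 1000)` compares a champion at 85% task success with a challenger at 90% on 1,000 tasks each. The normal approximation is only reasonable with fairly large samples; for small evaluation sets an exact test would be a safer choice.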
Evaluation Flow
Enterprise Feature
Testing tier configurations, evaluation framework details, quality gate automation rules, and checkpoint evidence requirements are available to enterprise consumers. Contact us for access.
Related Agents
- qa-engineer — Primary agent for test orchestration
- observability-engineer — Metrics and drift detection