Principle III: Evaluation-First

Source: .claude/memory/constitution.md

Overview

Enterprise agents must be continuously evaluated against business objectives and risk metrics throughout their lifecycle. Unlike traditional software, where code correctness predicts success, agent quality depends on systematic measurement of behavior and business outcomes.

The paradigm shifts from code-first to evaluation-first: continuous measurement, behavioral validation, and evidence-based deployment ensure agents remain aligned with organizational goals.

Non-Negotiable Rules

Rule	Description
KPI-First	Define KPIs before development begins
Automated Evaluation	LLM-as-a-Judge combined with human review in CI/CD pipelines
Code Coverage	Comprehensive test coverage for all code paths
Behavior Quality	High-bar behavioral validation (accuracy, task success, groundedness)
Champion-Challenger	Statistical evaluation before production promotion
Evidence Required	Eval results, red team reports, and compliance checklists ship with code
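
The Champion-Challenger and KPI-First rules can be illustrated with a small promotion gate. The following is a minimal sketch, not the framework's actual gate: the source does not say which statistical test or thresholds are used, so the pooled two-proportion z-test, the `EvalResult` structure, the `should_promote` function, and the `kpi_floor` and `alpha` values are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Task-success counts from one evaluation run (hypothetical structure)."""
    successes: int
    trials: int

    @property
    def rate(self) -> float:
        return self.successes / self.trials


def one_sided_p_value(champion: EvalResult, challenger: EvalResult) -> float:
    """P-value for the hypothesis that the challenger's success rate exceeds
    the champion's, using a pooled two-proportion z-test (normal approximation)."""
    total_trials = champion.trials + challenger.trials
    pooled = (champion.successes + challenger.successes) / total_trials
    se = math.sqrt(pooled * (1 - pooled) * (1 / champion.trials + 1 / challenger.trials))
    if se == 0:
        # Both runs are all-pass or all-fail: no evidence of improvement.
        return 1.0
    z = (challenger.rate - champion.rate) / se
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


def should_promote(
    champion: EvalResult,
    challenger: EvalResult,
    kpi_floor: float = 0.90,  # assumed KPI target, defined before development
    alpha: float = 0.05,      # assumed significance level
) -> bool:
    """Promote only if the challenger meets the KPI floor and is
    significantly better than the current champion."""
    meets_kpi = challenger.rate >= kpi_floor
    significant = one_sided_p_value(champion, challenger) < alpha
    return meets_kpi and significant


if __name__ == "__main__":
    champion = EvalResult(successes=850, trials=1000)
    challenger = EvalResult(successes=920, trials=1000)
    print("promote:", should_promote(champion, challenger))
```

Requiring both conditions keeps the two rules separate: a challenger that beats the champion but misses the predefined KPI is still rejected, as is one that meets the KPI but shows no statistically meaningful improvement. The normal approximation is only reasonable for fairly large evaluation sets.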

Evaluation Flow

Enterprise Feature

Testing tier configurations, evaluation framework details, quality gate automation rules, and checkpoint evidence requirements are available to enterprise consumers. Contact us for access.

Related Agents

qa-engineer: Primary agent for test orchestration
observability-engineer: Metrics and drift detection
