Principle V: Observability & Resilience
Source: .specify/memory/constitution.md
Overview
Enterprise agents must be continuously observed and managed to maintain reliability, performance, and operational excellence. Agents are adaptive, non-deterministic systems that require fundamentally different observability than traditional applications.
The paradigm shifts from "is it up?" to "is it right?" — where incorrect, biased, or hallucinated outputs pose operational and security risks even when systems are technically performant.
Non-Negotiable Rules
| Rule | Description |
|---|---|
| MELT Telemetry | Complete coverage of Metrics, Events, Logs, and Traces |
| Agent Observability | Reasoning traces, tool calls, token usage, and cost tracking |
| Real-time Monitoring | Quality, safety, and operations metrics dashboards |
| Drift Detection | Anomaly identification with automated alerting |
| SLOs + Error Budgets | Defined service level objectives with incident runbooks |
| Root Cause Analysis | Correlate failures to prompts, tools, and models |
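The SLOs + Error Budgets rule comes down to simple arithmetic, sketched below. The function name, the 99% target, and the request counts are illustrative assumptions, not values from the constitution. What counts as a "failed" invocation (for example an incorrect or hallucinated output, not just an outage) is also left to the team defining the SLO.

```python
def error_budget(slo_target: float, total: int, failed: int) -> dict:
    """Summarise error-budget usage for a window of `total` agent invocations.

    Hypothetical helper: `slo_target` is the fraction of invocations that
    must succeed (e.g. 0.99), `failed` is how many did not.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed = (1.0 - slo_target) * total  # failures the SLO tolerates
    return {
        "allowed_failures": allowed,
        "remaining_failures": allowed - failed,
        "budget_consumed": failed / allowed if allowed else float("inf"),
        "exhausted": failed >= allowed,
    }


# Illustrative numbers only: a 99% SLO over 10,000 invocations, 40 failures.
report = error_budget(slo_target=0.99, total=10_000, failed=40)
print(report)
```

When `exhausted` becomes true, the incident runbook referenced by the rule would typically take over; the sketch only computes the numbers.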
MELT Framework
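As a rough illustration of the agent-level signals listed in the rules (reasoning traces, tool calls, token usage, cost), one invocation could be captured in a record like the one below. Every class name, field name, and price here is a hypothetical placeholder; the source does not define a telemetry schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ToolCall:
    """One tool invocation inside an agent run (hypothetical shape)."""
    name: str
    duration_ms: float
    ok: bool


@dataclass
class AgentTrace:
    """Trace-level record for a single agent invocation (hypothetical shape)."""
    trace_id: str
    agent: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    tool_calls: List[ToolCall] = field(default_factory=list)

    def cost_usd(self, usd_per_1k_prompt: float, usd_per_1k_completion: float) -> float:
        """Derive a cost metric from token usage and caller-supplied prices."""
        return (self.prompt_tokens / 1000 * usd_per_1k_prompt
                + self.completion_tokens / 1000 * usd_per_1k_completion)


trace = AgentTrace(
    trace_id="t-001",
    agent="observability-engineer",
    model="example-model",  # placeholder, not a real model name
    prompt_tokens=2000,
    completion_tokens=500,
    tool_calls=[ToolCall(name="search", duration_ms=120.0, ok=True)],
)
# Prices are made up for the example.
cost = trace.cost_usd(usd_per_1k_prompt=0.5, usd_per_1k_completion=2.0)
record = asdict(trace)  # plain dict, ready to serialise as a log or event
```

Keeping the record as a plain dataclass means the same object can feed a log line, a trace span, and a cost metric without separate bookkeeping.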
PDCA Cycle Tracking
The enforce-pdca-cycle.sh hook tracks Plan-Do-Check-Act (PDCA) iterations per specialist agent per session. This prevents infinite retry loops and ensures escalation to a human in the loop (HITL) when an agent cannot converge.
| Property | Value |
|---|---|
| Default cycle limit | 7 (ADLC_MAX_PDCA_CYCLES) |
| State file | tmp/<project>/pdca-cycles/<agent>-YYYY-MM-DD.json |
| Escalation | HITL warning via stderr at cycle limit |
| Scope | Per specialist, per day (coordination agents excluded) |
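The behaviour in the table can be approximated with the sketch below. It is not the enforce-pdca-cycle.sh implementation: the `record_cycle` function, the `cycles` field, and the warning text are assumptions, and only the default limit, the environment variable name, and the file naming pattern come from the table.

```python
import json
import os
import sys
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional, Tuple

DEFAULT_LIMIT = 7  # documented default for ADLC_MAX_PDCA_CYCLES


def record_cycle(state_dir: Path, agent: str,
                 today: Optional[date] = None) -> Tuple[int, bool]:
    """Increment the agent's daily PDCA count; return (count, limit_reached).

    Callers are expected to skip coordination agents, which are out of scope.
    """
    today = today or date.today()
    limit = int(os.environ.get("ADLC_MAX_PDCA_CYCLES", DEFAULT_LIMIT))
    state_file = state_dir / f"{agent}-{today.isoformat()}.json"

    # The "cycles" key is a hypothetical state layout.
    state = json.loads(state_file.read_text()) if state_file.exists() else {"cycles": 0}
    state["cycles"] += 1
    state_dir.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(state))

    limit_reached = state["cycles"] >= limit
    if limit_reached:
        # HITL warning goes to stderr, as the table describes.
        print(f"HITL escalation: {agent} reached {state['cycles']}/{limit} PDCA cycles",
              file=sys.stderr)
    return state["cycles"], limit_reached


# Demo against a throwaway directory instead of tmp/<project>/pdca-cycles/.
with tempfile.TemporaryDirectory() as tmp:
    for _ in range(3):
        count, escalate = record_cycle(Path(tmp) / "pdca-cycles", "qa-engineer")
```

Because the date is part of the file name, the count naturally resets each day without any cleanup step.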
Evidence Audit Trail
The log-coordination-wrapper.sh hook auto-logs ALL agent completions (both coordination and specialist) to structured JSON files. This provides a complete audit trail of every agent invocation.
| Agent Class | Agreement Score | Log Path |
|---|---|---|
| Coordination (PO, CA) | 97% (design-level) | coordination-logs/<agent>-YYYY-MM-DD.json |
| Specialist | 100% (binary: done or not done) | coordination-logs/<agent>-YYYY-MM-DD.json |
See Hook Enforcement Reference for the complete log schema and PDCA state machine details.
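For orientation, appending one completion to a per-agent daily log might look like the sketch below. The authoritative schema lives in the Hook Enforcement Reference; the entry fields, the JSON-array file layout, and the `log_completion` name are assumptions made for this example. Only the file naming pattern is taken from the table above.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_completion(log_dir: Path, agent: str, agent_class: str, status: str,
                   now: Optional[datetime] = None) -> Path:
    """Append one agent completion to <agent>-YYYY-MM-DD.json and return its path."""
    now = now or datetime.now(timezone.utc)
    log_file = log_dir / f"{agent}-{now:%Y-%m-%d}.json"

    # Assumed layout: the file holds a JSON array of entries.
    entries = json.loads(log_file.read_text()) if log_file.exists() else []
    entries.append({
        "agent": agent,
        "agent_class": agent_class,  # "coordination" or "specialist"
        "status": status,
        "completed_at": now.isoformat(),
    })
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file.write_text(json.dumps(entries, indent=2))
    return log_file


# Demo against a throwaway directory standing in for coordination-logs/.
with tempfile.TemporaryDirectory() as tmp:
    path = log_completion(Path(tmp) / "coordination-logs",
                          "qa-engineer", "specialist", "done")
    logged = json.loads(path.read_text())
```

Rewriting the whole array on each append is simple but not safe under concurrent writers; a real hook would need locking or an append-only format.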
Related Agents
- observability-engineer — Primary agent for MELT implementation
- qa-engineer — Performance test metrics