Observability Engineer

Source: .claude/agents/observability-engineer.md

Constitutional Alignment: Principle V - Observability & Resilience

Role

Design and implement MELT (Metrics, Events, Logs, Traces) telemetry, establish SLOs and error budgets, configure monitoring dashboards, create incident runbooks, detect behavioral drift, and ensure agent reliability.

Key Capabilities

  • MELT Telemetry — Comprehensive instrumentation covering metrics, events, logs, and traces
  • SLO Management — Define, track, and enforce service level objectives with error budgets
  • Dashboard Design — Purpose-built dashboards for executive, operations, quality, and infrastructure audiences
  • Drift Detection — Automated behavioral drift identification and alerting
  • Incident Response — Runbook generation, circuit breaker management, and automated remediation
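To make the SLO Management item concrete, the arithmetic behind an error budget can be sketched as follows. This is a minimal illustration, not this agent's actual configuration: the `SLO` class, the function names, and the 99% target are all hypothetical, and the budget is assumed to be counted in failed requests over a fixed window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical availability SLO over a fixed window of requests."""
    name: str
    target: float  # e.g. 0.99 means 99% of requests must succeed


def error_budget(slo: SLO, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates in the window."""
    return (1.0 - slo.target) * total_requests


def budget_remaining(slo: SLO, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo, total_requests)
    if budget == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


def budget_exhausted(slo: SLO, total_requests: int, failed_requests: int) -> bool:
    """Assumed enforcement rule: act once the budget is fully spent."""
    return budget_remaining(slo, total_requests, failed_requests) <= 0.0


# Example: a 99% target over 10,000 requests tolerates about 100 failures.
availability = SLO(name="availability", target=0.99)
print(round(error_budget(availability, 10_000), 2))
print(round(budget_remaining(availability, 10_000, 25), 2))
print(budget_exhausted(availability, 10_000, 150))
```

What happens once the budget is exhausted (freezing releases, paging, and so on) is a policy decision that this page does not specify.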

Coordination Flow

MELT Framework
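As a rough illustration of the four MELT signal types (metrics, events, logs, traces), the sketch below emits one of each for a single unit of work and links them through a shared trace ID. It uses only the Python standard library. The `Signal` record, the `emit` and `record_task` helpers, and the signal names are hypothetical and do not describe this agent's actual schema.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Signal:
    """Hypothetical envelope shared by all four MELT signal types."""
    kind: str  # one of "metric", "event", "log", "trace"
    name: str
    attributes: dict = field(default_factory=dict)
    value: Optional[float] = None   # used by metrics only
    trace_id: Optional[str] = None  # correlates signals from one task
    timestamp: float = field(default_factory=time.time)


def emit(signal: Signal, sink: List[str]) -> None:
    """Serialize a signal as one JSON line; the sink stands in for an exporter."""
    sink.append(json.dumps(asdict(signal)))


def record_task(sink: List[str], task_name: str) -> str:
    """Emit an event, a log, a metric, and a trace span for one task."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    emit(Signal("event", "task.started", {"task": task_name}, trace_id=trace_id), sink)
    emit(Signal("log", "task.progress", {"level": "info", "message": "working"},
                trace_id=trace_id), sink)
    duration = time.perf_counter() - start
    emit(Signal("metric", "task.duration_seconds", {"task": task_name},
                value=duration, trace_id=trace_id), sink)
    emit(Signal("trace", "task.span", {"task": task_name, "duration_seconds": duration},
                trace_id=trace_id), sink)
    return trace_id


lines: List[str] = []
record_task(lines, "summarize-report")
for line in lines:
    print(line)
```

Sharing one `trace_id` across all four signals is the design choice worth noting: it lets a dashboard or runbook pivot from a metric anomaly to the logs and span for the same task.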

Collaboration

Agent                          Interaction
qa-engineer                    Performance test metrics
cloud-architect                SLO validation
security-compliance-engineer   Security event monitoring
product-owner                  KPI definition

Enterprise Feature

Core metrics definitions, SLO configuration details, error budget policies, dashboard specifications, and quality gate thresholds are available to enterprise consumers. Contact us for access.

Reference