ITSM Lifecycle Golden Path
ITSM operations are distinct from product development. OPS board = customer incidents. SPM board = product stories. The pattern bridge connects them — after PII scrub.
At a Glance
Every OPS ticket flows through an 8-step pipeline: intake → discover → classify → cross-validate → decide → implement → operate → govern. Each step validates the ticket against real infrastructure and business policy. A subset of steps can be run with the --steps flag.
| I have a... | What happens | Command |
|---|---|---|
| New OPS ticket | Full 8-step processing with approval gates | /itsm:lifecycle OPS-NNN |
| Specific steps only | Execute selective subset (e.g., classify + cross-validate) | /itsm:lifecycle OPS-NNN --steps classify,cross-validate |
| Step 1: Intake | Auto-extract service type, environment, accounts, resource IDs | /itsm:lifecycle OPS-NNN --steps intake |
| Step 2: Discover | Inventory infrastructure with multi-account queries | /itsm:lifecycle OPS-NNN --steps discover |
| Step 3: Classify | Assign ticket type, priority, 6-prefix label taxonomy | /itsm:classify OPS-NNN |
| Step 4: Cross-Validate | Verify scope against 4 independent sources | /itsm:cross-validate OPS-NNN |
| Step 5: Decide | Assess change scheduling eligibility and blast radius | /itsm:lifecycle OPS-NNN --steps decide |
| Step 6: Implement | Create change record + CAB change request | /itsm:create-change OPS-NNN |
| Step 7: Operate | Configure monitoring, alarms, escalation paths | /itsm:lifecycle OPS-NNN --steps operate |
| Step 8: Govern | Cost impact summary, compliance evidence trail | /itsm:lifecycle OPS-NNN --steps govern |
| Resolved P0/P1 incident | Blameless post-incident review | /itsm:create-pir OPS-NNN |
All commands default to preview mode. Add --execute to apply changes to JIRA.
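The --steps selection described above can be sketched as follows. This is a minimal illustration, not the command's actual implementation: the function name parse_steps and the choice to always run the selected steps in pipeline order are assumptions.

```python
# Hypothetical sketch: resolve a --steps value into an ordered list of steps.
PIPELINE = [
    "intake", "discover", "classify", "cross-validate",
    "decide", "implement", "operate", "govern",
]


def parse_steps(steps_arg=None):
    """Return the steps to run, in pipeline order.

    steps_arg is the raw value of --steps (e.g. "classify,cross-validate").
    None means the full 8-step pipeline.
    """
    if steps_arg is None:
        return list(PIPELINE)
    requested = {s.strip().lower() for s in steps_arg.split(",") if s.strip()}
    unknown = requested - set(PIPELINE)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    # Assumption: selected steps run in pipeline order, whatever order was typed.
    return [step for step in PIPELINE if step in requested]
```

Rejecting unknown step names up front keeps a typo from silently skipping a gate.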
The Flow
Step 1: Intake
Who: sre-engineer auto-extracts. No manual input required.
What: Parse ticket description to identify service type, environment, affected accounts, and resource IDs.
Why: Consistent extraction before human judgment prevents classification errors downstream. Automation de-risks classification, cross-validation, and change creation (Steps 3, 4, and 6A).
What-if skip: Incomplete service context leads to misdirected queries and an incomplete blast radius assessment in Step 4 (Cross-Validate).
Auto-Extraction Checklist
- Service type identified (compute, storage, network, identity, database)
- Environment tagged (prod, staging, dev, dr)
- Affected account IDs extracted
- Resource IDs isolated (instance IDs, ARNs, names)
CxO Perspective
- CFO: "What's the cost exposure?" — account metadata supports cost allocation
- CTO: "What systems are affected?" — service + environment scope
- CSO: "Any security implications?" — environment tagging flags compliance-scoped resources
Quality Gate
- All 4 checklist items populated
- Service type verified against runbooks CLI service catalog
- Account IDs cross-validated against Organization metadata
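The four checklist items can be approximated with pattern matching. The sketch below is illustrative only: the regular expressions, the keyword map, and the function name extract_intake are assumptions, and real tickets will likely need looser matching (for example, "production" is not recognised as prod here).

```python
import re

# Assumed patterns, not the agent's actual extraction rules.
ACCOUNT_ID = re.compile(r"\b\d{12}\b")
INSTANCE_ID = re.compile(r"\bi-[0-9a-f]{8,17}\b")
ARN = re.compile(r"\barn:aws[\w-]*:[\w-]+:[\w-]*:\d{0,12}:[^\s,;]+")
ENVIRONMENTS = ("prod", "staging", "dev", "dr")
# Hypothetical keyword map from ticket wording to service type.
SERVICE_KEYWORDS = {
    "compute": ("ec2", "instance"),
    "storage": ("s3", "bucket"),
    "network": ("vpc", "subnet"),
    "identity": ("iam", "role"),
    "database": ("rds", "database"),
}


def _first_match(text, candidates):
    """Return the first candidate key whose keyword appears as a whole word."""
    for key, words in candidates.items():
        if any(re.search(rf"\b{re.escape(w)}\b", text) for w in words):
            return key
    return None


def extract_intake(description):
    """Extract service type, environment, account IDs, and resource IDs."""
    text = description.lower()
    resources = set(INSTANCE_ID.findall(description)) | set(ARN.findall(description))
    return {
        "service_type": _first_match(text, SERVICE_KEYWORDS),
        "environment": _first_match(text, {e: (e,) for e in ENVIRONMENTS}),
        "account_ids": sorted(set(ACCOUNT_ID.findall(description))),
        "resource_ids": sorted(resources),
    }
```

A field that comes back as None or empty corresponds to an unpopulated checklist item and would fail the quality gate.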
Step 2: Discover
Who: sre-engineer executes READONLY multi-account queries.
What: Execute inventory discovery commands for the identified service type using Config Aggregator (P1 org-wide path) or per-account profiles.
Why: Real resource state needed before classification (prevents TEMPLATE_NOT_EXECUTION anti-pattern). Decisions based on assumed state are wrong decisions.
What-if skip: Scope decisions made without seeing actual resource count, age, configuration drift, tagging status.
Discovery Scope
| Service | Discovery Tool | What It Returns |
|---|---|---|
| Compute | Config Aggregator + EC2 DescribeInstances | Instance count, age, tags, VPC placement |
| Storage | S3 ListBuckets + Config Aggregator | Bucket count, versioning status, lifecycle |
| Network | Config Aggregator + VPC DescribeFlowLogs | VPC CIDR, subnet count, NACLs, SGs |
| Database | Config Aggregator + RDS DescribeDBInstances | Instance type, engine version, backup status |
| Identity | IAM ListRoles + Organizations | Role count, last-used timestamp |
CxO Perspective
- CFO: "How many resources are in scope?" — cost per resource type informs change decision
- CTO: "Is this multi-account?" — multi-account changes trigger escalated CAB review
- CSO: "Are there compliance-scoped resources?" — audit-trail requirements per resource
Quality Gate
- Org-wide discovery attempted first (Config Aggregator query succeeds)
- If 0 results: cross-validated against per-account fallback
- Resource count documented (prevents CROSS_ACCOUNT_SILENT_ZERO)
- Discovery metadata attached to JIRA ticket
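The org-wide-first rule with a per-account fallback can be sketched like this. The two query callables are hypothetical stand-ins for the Config Aggregator query and the per-account profile queries; the sketch shows only the control flow that guards against CROSS_ACCOUNT_SILENT_ZERO.

```python
def discover(org_query, per_account_query, account_ids):
    """Try the org-wide path first, then fall back to per-account queries.

    org_query() and per_account_query(account_id) are hypothetical callables
    that each return a list of resources.
    """
    resources = list(org_query())
    source = "org-wide"
    if not resources:
        # Zero org-wide results can mean missing access rather than no
        # resources, so re-check every account before trusting the zero.
        resources = [r for acct in account_ids for r in per_account_query(acct)]
        source = "per-account-fallback"
    return {
        "source": source,
        "resource_count": len(resources),
        "resources": resources,
    }
```

Recording which path produced the count gives the JIRA ticket the discovery metadata the quality gate asks for.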
Step 3: Classification (/itsm:classify)
Who: sre-engineer classifies. HITL approves label set before JIRA update.
What: Decision tree assigns ticket type, priority, and 6-prefix label taxonomy based on Step 2 discovery data.
Why: Correct classification drives SLA, routing, and CAB review requirements. Misclassification propagates through downstream stages.
What-if skip: Tickets misrouted, SLAs misapplied, CAB reviews missed for Changes that require them.
Decision Tree
Ticket received
├── Unplanned disruption to service → Incident
│ └── Underlying cause unknown → Problem (linked)
├── Planned modification to production → Change
│ ├── Standard (pre-approved template) → No CAB required
│ ├── Normal (requires CAB review) → Steps 6A + 6B
│ └── Emergency (P0/P1 impact) → Expedited CAB
└── User-requested provisioning → Service Request
└── Fulfillment via service catalog
6-Prefix Label Taxonomy
| Prefix | Values | Purpose |
|---|---|---|
| blast: | single-service, single-account, multi-account, org-wide | Blast radius for impact scoring |
| rca: | code-defect, config-drift, infra-failure, human-error, 3rd-party | Root cause category |
| ke: | known-error-{id} | Known error database reference |
| service: | cloudops, finops, network, identity, compute | Affected service domain |
| env: | prod, staging, dev, dr | Environment scoping |
| tier: | critical, important, best-effort | Business impact tier (not tier-1/2/3) |
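A label set can be checked against the taxonomy before it is presented for HITL approval. The following is a sketch under assumptions: the function name validate_labels is hypothetical, the allowed values are copied from the table above, and labels with other prefixes (such as cr:) are ignored instead of rejected.

```python
import re

ALLOWED = {
    "blast": {"single-service", "single-account", "multi-account", "org-wide"},
    "rca": {"code-defect", "config-drift", "infra-failure", "human-error", "3rd-party"},
    "service": {"cloudops", "finops", "network", "identity", "compute"},
    "env": {"prod", "staging", "dev", "dr"},
    "tier": {"critical", "important", "best-effort"},
}
# ke: values follow the known-error-{id} shape; the id format is not specified.
KNOWN_ERROR = re.compile(r"known-error-\S+")


def validate_labels(labels):
    """Return a list of problems; an empty list means the label set passes."""
    problems, seen = [], set()
    for label in labels:
        prefix, _, value = label.partition(":")
        if prefix == "ke":
            valid = KNOWN_ERROR.fullmatch(value) is not None
        elif prefix in ALLOWED:
            valid = value in ALLOWED[prefix]
        else:
            continue  # other prefixes are outside this check
        if valid:
            seen.add(prefix)
        else:
            problems.append(f"invalid label: {label}")
    required = set(ALLOWED) | {"ke"}
    problems += [f"missing prefix: {p}" for p in sorted(required - seen)]
    return problems
```

For example, a label such as tier:tier-1 is reported as invalid and the tier prefix is reported as missing, which matches the table's note that tier-1/2/3 values are not used.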
Priority Matrix
| | Low Urgency | High Urgency |
|---|---|---|
| High Impact | P2 (4h SLA) | P0 (15min SLA) |
| Low Impact | P4 (5d SLA) | P3 (8h SLA) |
P1 = High Impact and High Urgency, but not platform-wide. Platform-wide impact is the P0 threshold.
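The matrix and the P1 note can be read as a small lookup. This is one interpretation, sketched under assumptions: the function name priority and the platform_wide flag are hypothetical, and the P1 SLA is left out because this guide does not state it.

```python
# SLA targets from the matrix; the P1 SLA is not stated in this guide.
SLA = {"P0": "15min", "P2": "4h", "P3": "8h", "P4": "5d"}


def priority(high_impact, high_urgency, platform_wide=False):
    """Map Impact x Urgency to a priority, splitting P0/P1 on platform scope."""
    if high_impact and high_urgency:
        return "P0" if platform_wide else "P1"
    if high_impact:
        return "P2"
    if high_urgency:
        return "P3"
    return "P4"
```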
Quality Gate
- Ticket type assigned (Incident / Problem / Change / Service Request)
- All 6 label prefixes populated
- Priority P0-P4 set based on Impact × Urgency matrix
- HITL approves label set before the jira_update_issue call
Step 4: Cross-Validation (/itsm:cross-validate)
Who: sre-engineer gathers evidence. qa-engineer validates accuracy.
What: 4 independent sources corroborate the incident report before any change record is created.
Why: Enforces the 99.5% accuracy gate and prevents the SELF_COMPARISON_VALIDATION anti-pattern: sources must be independent, not exports from the same process or the same API call.
What-if skip: Change record based on incorrect scope, wrong systems in blast radius, PIR missing affected components.
4 Independent Sources
| Source | Tool | What It Provides |
|---|---|---|
| JIRA OPS | atlassian-tools MCP | Ticket description, comments, affected systems |
| runbooks CLI | /inventory:discover | Live resource inventory, cross-account |
| AWS API | READONLY profiles ($AWS_OPERATIONS_PROFILE) | CloudWatch alarms, Config events, flow logs |
| Visual | CloudWatch Console / dashboards | Timeline reconstruction, anomaly visualization |
5W1H Evidence Template
**What**: [Specific failure observed — service, endpoint, error message]
**Who**: [Affected users/systems — count, persona, account]
**When**: [Start time, detection time, escalation time — UTC ISO-8601]
**Where**: [Region, VPC, account ID reference, service domain]
**Why**: [Root cause hypothesis — rca: label maps here]
**How**: [Reproduce steps for engineering team]
This block is appended to the JIRA ticket description as an ADF panel node (not a blockquote; MCP ADF fidelity is preserved via REST API v3 for structured nodes).
Quality Gate
- All 4 sources queried (no CROSS_ACCOUNT_SILENT_ZERO: 0 results verified against account scope)
- 5W1H evidence appended to JIRA ticket
- Reproduce prompt attached for engineering team
- Cross-validation delta ≤0.5% across independent sources
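The delta check in the last gate item can be sketched as below. The guide does not define how the delta is computed, so this sketch assumes it is the spread between the highest and lowest count reported by the sources, relative to the highest; the function names are hypothetical.

```python
def cross_validation_delta(counts):
    """Relative spread between the largest and smallest source count.

    counts maps a source name to the resource count it reported.
    Assumption: delta = (max - min) / max.
    """
    values = list(counts.values())
    if len(values) < 2:
        raise ValueError("need at least two independent sources")
    highest, lowest = max(values), min(values)
    return 0.0 if highest == 0 else (highest - lowest) / highest


def passes_gate(counts, tolerance=0.005):
    """True when the sources agree within the 0.5% tolerance."""
    return cross_validation_delta(counts) <= tolerance
```

Under this definition, two sources reporting 110 and 117 resources differ by about 6% and fail the gate, so the discrepancy has to be explained before a change record is created.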
Step 5: Decide
Who: sre-engineer evaluates. The 3-Party Golden Guideline gate is applied.
What: Assess change scheduling eligibility, maintenance window recommendation, and blast radius impact.
Why: Formal go/no-go gate before implementation. Business, Security, and Engineering must align before resources are modified.
What-if skip: Changes proceed without approval; maintenance windows clash; blast radius underestimated; regulatory compliance unchecked.
3-Party Golden Guideline Gate
Change proceeds ONLY when ALL three parties approve:
| Party | Decision | Evidence |
|---|---|---|
| Business (CFO) | Cost/benefit acceptable | Cost impact analysis from Step 8 |
| Security (CSO) | Compliance trail complete | Audit evidence, encryption status |
| Engineering (CTO) | Rollback plan tested | Pre-staging validation completed |
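The all-three-must-approve rule is a simple conjunction. A minimal sketch, assuming approvals are recorded as a mapping from party to a boolean (the names gate_passes and blocking_parties are hypothetical):

```python
PARTIES = ("business", "security", "engineering")


def gate_passes(approvals):
    """True only when all three parties have explicitly approved.

    A party that is missing from approvals counts as not approved.
    """
    return all(approvals.get(party) is True for party in PARTIES)


def blocking_parties(approvals):
    """List the parties whose approval is still outstanding."""
    return [party for party in PARTIES if approvals.get(party) is not True]
```

Treating a missing answer as a block keeps the gate from passing by omission.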
CxO Perspective
- CFO: "What's the cost of this change?" — ongoing monitoring/alarms cost vs. maintenance cost
- CTO: "What's the rollback plan?" — confidence in revert timings, downtime risk
- CSO: "Does this pass security review?" — encryption, access control, audit logging
Quality Gate
- 3-Party gate PASS (all three approvers sign off)
- Maintenance window confirmed with HITL
- Blast radius quantified (instance count, service impact, user count)
- Rollback plan documented and tested
Step 6A: Implement — Change Management (/itsm:create-change)
Who: sre-engineer drafts. HITL approves before creation.
What: 7-section change record created as JIRA sub-task or linked issue under the incident.
Why: MSP cannot modify production environments without a formal change order. Audit trail required for APRA CPS 234.
What-if skip: Production changes executed without approval, audit gaps, regulatory non-compliance.
Step 6A executes only for Change-type tickets (as classified in Step 3). Incident and Service Request tickets skip to PIR.
7-Section Change Template
| Section | Content | Required |
|---|---|---|
| 1. Summary | One-line description of the change | Yes |
| 2. Impact Assessment | Systems affected, blast radius, user count | Yes |
| 3. Risk Rating | Low/Medium/High with justification | Yes |
| 4. Rollback Plan | Step-by-step revert procedure with time estimate | Yes |
| 5. Test Plan | Pre/post validation commands with expected outputs | Yes |
| 6. Implementation Steps | Numbered runbook with commands | Yes |
| 7. Approval | CAB reviewer list, approval date/time | Yes (CAB changes) |
HITL Gate
sre-engineer → drafts all 7 sections
→ attaches evidence from Step 4
→ presents draft to HITL
HITL → reviews rollback plan
→ approves or requests changes
→ executes: jira_create_issue (NOT agent-initiated)
Quality Gate
- All 7 sections populated (no placeholders)
- Rollback plan tested in non-prod before change window
- Implementation window confirmed with CAB (for Normal changes)
- Change ticket linked to originating Incident in JIRA
Step 6B: Implement — Change Request (/itsm:create-cr)
Who: sre-engineer creates CAB subtask. HITL schedules review.
What: CAB review subtask with implementation window, rollback plan, and test plan.
Why: Normal changes require explicit CAB approval with documented review. Emergency changes require expedited CAB with post-implementation review.
What-if skip: Changes executed without CAB oversight, audit trail incomplete, regulatory breach.
Step 6B executes only when Step 3 assigned the label cr:normal or cr:emergency. Standard pre-approved changes skip Step 6B.
CAB Subtask Fields
| Field | Normal Change | Emergency Change |
|---|---|---|
| Implementation Window | Scheduled (≥48h notice) | Expedited (HITL-approved window) |
| Rollback Plan | Full 7-section plan | Abbreviated (minimum 3 steps) |
| Test Plan | Pre+post validation | Post-implementation only |
| CAB Reviewers | 2 approvers minimum | 1 approver + post-review |
| Evidence Required | Step 4 + Step 6A | Step 4 minimum |
Quality Gate
- CAB subtask linked to Change ticket (not standalone)
- Implementation window confirmed in JIRA
- At least 1 CAB approver assigned before window opens
- Emergency changes trigger post-implementation review within 24h
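The notice and approver rules from the CAB Subtask Fields table can be checked mechanically before the subtask is submitted. This is a sketch with hypothetical function names; it follows the table's approver counts (2 for Normal, 1 for Emergency) and the 24-hour post-implementation review from the quality gate.

```python
from datetime import datetime, timedelta


def cab_problems(change_type, now, window_start, approvers):
    """Return rule violations for a CAB subtask; an empty list means it is ready."""
    problems = []
    if change_type == "normal":
        if window_start - now < timedelta(hours=48):
            problems.append("normal change needs at least 48h notice")
        if len(approvers) < 2:
            problems.append("normal change needs 2 approvers")
    elif change_type == "emergency":
        if len(approvers) < 1:
            problems.append("emergency change needs 1 approver")
    else:
        raise ValueError(f"unsupported change type: {change_type}")
    return problems


def post_review_deadline(implemented_at):
    """Emergency changes are reviewed within 24h of implementation."""
    return implemented_at + timedelta(hours=24)
```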
Step 7: Operate
Who: sre-engineer configures. HITL owns alarm thresholds.
What: Define CloudWatch alarms, exception escalation path, rollback triggers, and 30/60/90 day review schedule.
Why: Post-implementation monitoring catches regression early. Absence of alarms means failures go undetected.
What-if skip: Changes deployed without monitoring; production issues undetected for hours/days; MTTR inflates.
Monitoring Checklist
- CloudWatch alarms defined (threshold, SNS topic)
- Exception escalation path documented (PagerDuty routing, on-call owner)
- Rollback triggers identified (error rate > X%, latency > Y ms)
- Review schedule set (Day 1, Week 1, Month 1)
CxO Perspective
- CFO: "What's the monitoring cost?" — CloudWatch metrics, log ingestion, alarm evaluation
- CTO: "How will we detect regression?" — alarm coverage across all modified resources
- CSO: "Are audit logs enabled?" — CloudTrail for all Implement changes, Config rules for compliance
Quality Gate
- Alarms configured in CloudWatch (not placeholders)
- Escalation path integrated with PagerDuty/Slack
- Rollback trigger thresholds set based on SLA
- First review date added to JIRA ticket
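A rollback trigger of the form "more than N failures within a time window" can be evaluated as shown below. The function name and parameters are hypothetical, and no threshold values are implied: the thresholds themselves are owned by HITL and derived from the SLA.

```python
from datetime import datetime, timedelta


def rollback_triggered(failure_times, now, window, max_failures):
    """True when more than max_failures failures fall inside the window.

    failure_times is an iterable of datetimes; window is a timedelta that
    ends at now. Both thresholds are supplied by the caller.
    """
    recent = [t for t in failure_times if now - window <= t <= now]
    return len(recent) > max_failures
```

Only failures inside the window count, so older failures from a previous change window do not trip the trigger.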
Step 8: Govern
Who: sre-engineer drafts, HITL reviews, enterprise archives.
What: Cost impact summary (before/after), security posture delta, compliance evidence trail.
Why: Audit readiness and continuous improvement data. Without governance records, compliance audits fail.
What-if skip: No evidence trail, compliance gaps, inability to measure change impact, regulatory breach risk.
Governance Records
| Record Type | Source | Requirement |
|---|---|---|
| Cost Impact | Cost Explorer before/after | $ actual delta vs. estimated |
| Security Posture | Config rules + IAM changes | Policy changes, encryption status |
| Compliance Trail | CloudTrail, Config Conformance | Evidence of audit logging |
| Performance Delta | CloudWatch metrics | Latency, error rate change |
CxO Perspective
- CFO: "What was the actual cost impact?" — monthly spend comparison, ROI validation
- CTO: "Did we hit SLA?" — MTTR improvement, error rate reduction
- CSO: "Is the compliance trail complete?" — Config Conformance pack status, evidence linkage
Quality Gate
- Cost delta documented (actual vs. estimated)
- Security policy changes summarized
- Compliance evidence linked to JIRA ticket (CloudTrail, Config)
- HITL approves governance record before archival
Post-Resolution: PIR (/itsm:create-pir)
Who: sre-engineer drafts. HITL reviews and publishes to Confluence OPS.
What: Post-Incident Review with 5-Why RCA, timeline, and action items.
Why: Without PIR, incidents recur. Action items with owners and due dates prevent recurrence.
What-if skip: Incidents repeat, team learns nothing, MTTR stagnates, pattern bridge never fires.
PIR Structure
## Post-Incident Review: {Ticket ID}
**Incident**: {title}
**Duration**: {start} → {resolved} ({total minutes})
**Severity**: P{n} | Impact: {blast: label}
## Timeline
| Time (UTC) | Event | Source |
|------------|-------|--------|
| HH:MM | Detection | CloudWatch alarm |
| HH:MM | Escalation | PagerDuty |
| HH:MM | Mitigation applied | Change record |
| HH:MM | Service restored | Monitoring |
## 5-Why Root Cause Analysis
1. Why did the service fail? → {observation}
2. Why did {observation} occur? → {deeper cause}
3. Why did {deeper cause} exist? → {system cause}
4. Why wasn't {system cause} detected? → {monitoring gap}
5. Why wasn't the monitoring gap closed? → {process gap}
**Root Cause**: {concise statement}
**rca: label**: {rca:code-defect | config-drift | infra-failure | human-error | 3rd-party}
## Action Items
| Action | Owner | Due Date | JIRA Ticket |
|--------|-------|---------|-------------|
| Fix monitoring gap | CloudOps Engineer | {date} | OPS-NNN |
| Update runbook | SRE | {date} | OPS-NNN |
Quality Gate
- 5-Why analysis reaches process/system root cause (not stopping at symptom)
- All action items have owners and due dates
- PIR published to Confluence OPS space (not just JIRA)
- JIRA ticket transitioned to Resolved only after PIR approved by HITL
Pattern Bridge: OPS to SPM
When 3 or more incidents share the same service: and rca: labels within a 90-day window, the pattern bridge triggers an auto-draft product story in JIRA SPM.
Why: Recurring incidents that exceed the MTTR budget have a root cause that belongs in the product backlog, not the operations queue.
Bridge Conditions
| Condition | Threshold | Action |
|---|---|---|
| Same service: + rca: | 3+ incidents in 90 days | Auto-draft SPM story |
| Same service: + P0 | 1 P0 incident | Immediate SPM story draft |
| MTTR > SLA × 3 | Any severity | Draft SRE capacity story |
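The three bridge conditions can be evaluated over a list of resolved incidents. The sketch below is an assumption-laden illustration: the Incident record, its field names, and the action strings are hypothetical, and the 90-day window is applied to all three conditions although the table states it only for the first.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    service: str          # value of the service: label
    rca: str              # value of the rca: label
    priority: str         # "P0" .. "P4"
    resolved_at: datetime
    mttr_minutes: float
    sla_minutes: float


def bridge_actions(incidents, now, window_days=90, min_repeats=3):
    """Return the sorted (action, service) pairs the bridge would raise."""
    window = timedelta(days=window_days)
    recent = [i for i in incidents if timedelta(0) <= now - i.resolved_at <= window]
    actions = set()
    repeats = Counter((i.service, i.rca) for i in recent)
    for (service, _rca), count in repeats.items():
        if count >= min_repeats:
            actions.add(("auto-draft-spm-story", service))
    for i in recent:
        if i.priority == "P0":
            actions.add(("immediate-spm-story-draft", i.service))
        if i.mttr_minutes > 3 * i.sla_minutes:
            actions.add(("draft-sre-capacity-story", i.service))
    return sorted(actions)
```

The function only decides that a draft is due; the draft itself is still subject to the PII scrub before anything reaches SPM.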
PII Scrubbing
OPS tickets contain customer names, account IDs, and configuration data. Before creating an SPM story, replace:
- Account IDs with {account_type} placeholders
- Customer names with {customer_type} or persona labels
- IP/CIDR with network tier labels (prod-vpc, dr-vpc)
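The three replacements can be sketched with regular expressions. This is a minimal illustration, not the bridge's actual scrubber: the function name scrub, the caller-supplied customer name list, the CIDR-to-tier mapping, and the {network_tier} fallback placeholder are all assumptions.

```python
import re

ACCOUNT_ID = re.compile(r"\b\d{12}\b")
IP_OR_CIDR = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b")


def scrub(text, customer_names=(), network_tiers=None):
    """Replace account IDs, customer names, and IP/CIDR values with labels.

    customer_names: names to mask (supplied by the caller).
    network_tiers: maps a literal IP/CIDR string to a tier label such as
    "prod-vpc"; unmapped addresses fall back to a generic placeholder.
    """
    tiers = network_tiers or {}
    text = IP_OR_CIDR.sub(lambda m: tiers.get(m.group(0), "{network_tier}"), text)
    text = ACCOUNT_ID.sub("{account_type}", text)
    for name in customer_names:
        text = re.sub(re.escape(name), "{customer_type}", text, flags=re.IGNORECASE)
    return text
```

Pattern matching only catches values with a recognisable shape, so a human review of the drafted story is still a sensible last step before it is created.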
Key: The bridge is intentionally one-directional. Operational incidents drive product improvements. Planned product changes follow the SDLC lifecycle via /sync:jira-push.
Regulatory Compliance
ITSM lifecycle steps meet ANZ FSI/Energy/Telecom regulatory requirements:
| Step | Compliance Area |
|---|---|
| Intake | Incident logging and timeliness |
| Discover | Asset inventory and cross-account scope |
| Classify | Asset identification and criticality assessment |
| Cross-Validate | Control testing and evidence gathering (99.5% accuracy gate) |
| Decide | Material change approval (3-Party gate) |
| Implement | Change management proportional to risk + CAB review |
| Operate | Monitoring controls and alert responsiveness |
| Govern | Audit trail completeness and evidence preservation |
| PIR | Systematic review and continuous improvement |
| Pattern Bridge | Recurring issue escalation to product backlog |
Detailed mapping (APRA CPS 234, CPS 230, NIST CSF 2.0) is documented in ADLC Governance Rules.
Component Map
| Component | Type | Purpose |
|---|---|---|
| /itsm:lifecycle | Command | Full 8-step pipeline orchestrator |
| /itsm:classify | Command | Steps 1–3: intake, discover, classify |
| /itsm:cross-validate | Command | Step 4: 4-source evidence collection |
| /itsm:create-change | Command | Step 6A: change record creation |
| /itsm:create-cr | Command | Step 6B: CAB change request |
| /itsm:create-pir | Command | Post-resolution: post-incident review |
| sre-engineer | Agent | Executes all ITSM commands (haiku tier) |
| itsm-lifecycle-steps | Skill | 8-step definitions: intake, discover, decide, operate, govern |
| itsm-ticket-classification | Skill | Step 3: classification decision tree + 6-prefix taxonomy |
| itsm-change-management | Skill | Steps 5–6: change workflow and risk assessment |
| 3-party-golden-guideline | Skill | Step 5: Business/Security/Engineering approval gate |
| itsm-ticket-template | Skill | Standardized OPS + SPM ticket templates |
| jira-jsm-service-desk | Skill | JSM lifecycle and SLA management |
| atlassian-tools | MCP | 72 tools for JIRA + Confluence |
Common Mistakes (What NOT to Do)
| Mistake | Why It Fails | Fix |
|---|---|---|
| TECHNICAL_WITHOUT_PROCESS | Changes without change order and approver | All infrastructure actions must include approval process |
| AUTONOMOUS_SPRINT_ASSIGNMENT | Agent moves OPS tickets to sprint without HITL | Sprint assignment is HITL-exclusive action |
| INCOMPLETE_EVIDENCE | Change records built on incomplete facts | All 4 sources verified before creating changes |
Worked Example: Instance Scheduler Expansion (OPS-180)
A customer emailed asking to extend automated overnight EC2 scheduling across more accounts. The ops team processed this through all 8 steps in a single sprint cycle.
In three sentences: The scheduler already runs in 8 accounts, stopping instances at 8pm and restarting at 8am Monday–Friday. Discovery revealed 117 instances across 17 accounts; the team validated scope against four independent sources before moving to CAB approval. Phase 1 will pilot one additional account after passing the 3-Party Golden Guideline gate.
How Each Step Played Out
Step 1 — Intake: Customer email arrived; a JIRA OPS ticket was created as type Change with minimal labels. Service type (compute) and environment (prod) extracted automatically.
Step 2 — Discover: Config Aggregator inventory query returned 117 instances across 17 accounts. Excel spreadsheet (customer's prior version) showed 110 instances across 16 accounts — discrepancy flagged for Step 4.
Step 3 — Classify: The ticket was classified as Change (not Task) because scheduler modification requires a change order. Labels added: service:ec2, tier:important, itsm-type:change, blast:multi-service, cr:normal.
Step 4 — Cross-Validate: Four independent queries were executed before any change planning began:
| Data Source | Result |
|---|---|
| Customer spreadsheet (Excel) | 110 instances across 16 accounts |
| AWS org-wide resource search | 117 instances across 17 accounts |
| AWS Config (env-tagged instances only) | 94 instances across 13 accounts |
| Billing data (EC2 Compute share) | ~1.88% of total org spend |
The gap between 110 and 117 reflects instances launched after the spreadsheet was last updated. The lower count from the env-tag query flags a tagging gap — two account environments use non-standard tag values and are invisible to the current scheduler rules.
Step 5 — Decide: The 3-Party Golden Guideline gate assessed:
- Business (CFO): Estimated savings $2,400/month (16 instances × $150/month = $2,400 avoided spend during off-hours). Cost of monitoring alarms: $180/month. ROI positive — approve.
- Security (CSO): No encryption changes, no policy modifications. Existing CloudTrail logging applies. Approve.
- Engineering (CTO): Scheduler is managed code; rollback is disable-cron + force-start. Tested in sandbox. Approve.
All three parties approved the change. Maintenance window: Tuesday 22:00–00:00 NZT (2-hour window, 50% of active instances in batch).
Step 6A — Create Change: The change record documented the AS-IS state (8 confirmed accounts, dev/preprod/sit tags, 8pm–8am NZT window) and three expansion candidates: sandbox, shared-services, datalake-dev. The tag gap on two non-standard environments was flagged as a prerequisite — those accounts cannot be added until tags are normalised.
Step 6B — CAB Review: The change was routed as a Normal change (cr:normal) with a 48-hour advance notice requirement. CAB conditions: the customer must submit an always-on exception list (batch jobs and compliance processes that must not stop) before the pilot window opens. Phase 1 covers one account only; Phase 2 requires a separate CAB update.
Step 7 — Operate (during pilot window): CloudWatch alarms configured for scheduler errors. Exception escalation routed to on-call SRE. Rollback trigger: >5 failed start operations in 1 hour. Review schedule: Day 1 (spot-check), Week 1 (full reconciliation), Month 1 (savings report).
Step 8 — Govern (after pilot week): Cost Explorer showed $287 actual savings (vs. $2,400 estimated for full month; pilot was 1 account). Discrepancy investigated: only 14 of 16 instances stopped successfully (2 had user connections that forced stop failure). Security audit: CloudTrail logged all scheduler actions. Compliance trail linked to JIRA ticket.
Customer Decision Gate
Before Phase 2 proceeds, the customer must answer:
- Which account should be the Phase 1 pilot?
- What workloads must never be stopped (always-on exception list)?
- Should the tag gap in the two non-standard environments be fixed, or should those accounts be excluded?
The CloudOps Manager owns the AWSO backlog ticket (due before the Phase 2 CAB update). No expansion beyond the pilot account occurs without customer confirmation at this gate.
Agent Team
| Agent | Role in This Path | Phase/Stage | Talent Bench |
|---|---|---|---|
| sre-engineer | 8-step ITSM execution + runbook orchestration | All phases (Intake→Govern) | Profile |
| security-compliance-engineer | Security posture assessment + compliance validation | Classify/Cross-validate | Profile |
| cloud-architect | Architecture review + risk assessment for mutations | Decide/Implement | Profile |
| product-owner | Ticket prioritization + business impact scoring | Intake/Target | Profile |
| qa-engineer | 4-source cross-validation scoring + evidence validation | Cross-validate | Profile |
7 Skills Coverage
| Skill | Coverage in This Path | Implementation |
|---|---|---|
| S1 System Design | 8-step pipeline architecture (Intake→Discover→Classify→Cross-validate→Decide→Implement→Operate→Govern) | ITSM lifecycle model, decision gates, escalation protocols |
| S2 Tool Design | JIRA MCP atlassian-tools + runbooks CLI strict schemas + AWS API contracts | Tool integration, schema validation, error message specificity |
| S3 Retrieval | 4-source cross-validation: Excel inventory, AWS API, runbooks CLI, JIRA OPS | Independent data sources, query diversity, accuracy targets ≥99.5% |
| S4 Reliability | API retry logic (3 attempts, exponential backoff) + fallback paths for circuit breaker scenarios | Pagination, timeout enforcement, error recovery per operational-efficiency.md Rule 6 |
| S5 Security | HITL gates before FIX/Implement mutations, READONLY profile restriction, change management process validation | Access control, approval workflow, audit trail (CloudTrail) |
| S6 Evaluation | 4-way cross-validation (Excel vs AWS vs CLI vs JIRA) + CxO-ready risk scoring | Evidence aggregation, persona-based filtering (CFO/CTO/CloudOps), accuracy measurement |
| S7 Product Thinking | Persona-specific reports (CFO: cost/risk, CTO: architecture/dependencies, CloudOps: runbook/automation), email templates for stakeholder updates | Report generation, audience segmentation, business impact framing |
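The retry behaviour named under S4 Reliability (3 attempts, exponential backoff) can be sketched as follows. The function name, the 1-second base delay, and the injectable sleep parameter are assumptions for illustration; the referenced operational-efficiency.md rule may specify different values.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last failure is re-raised so the caller can take a fallback path.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing sleep as a parameter lets the backoff schedule be tested without waiting in real time.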
Last Updated: April 2026 (8-step model) | Status: Active | Maintenance: sre-engineer