Engineering Incident Response
Run an incident response workflow — triage, communicate, and write postmortem. Trigger with "we have an incident", "production is down", an alert that needs severity assessment, a status update mid-incident, or when writing a blameless postmortem after resolution.
What this skill does
Manage any production crisis from initial alert to final resolution with a structured workflow that handles triage, communication, and reporting. Instantly assess severity, draft clear status updates for customers and the team, and generate a blameless postmortem document once things are fixed. Reach for this whenever systems go down or you need to coordinate a response without panic.
name: incident-response
description: Run an incident response workflow — triage, communicate, and write postmortem. Trigger with “we have an incident”, “production is down”, an alert that needs severity assessment, a status update mid-incident, or when writing a blameless postmortem after resolution.
argument-hint: ""
/incident-response
If you see unfamiliar placeholders or need to check which tools are connected, see CONNECTORS.md.
Manage an incident from detection through postmortem.
Usage
/incident-response $ARGUMENTS
Modes
/incident-response new [description] # Start a new incident
/incident-response update [status] # Post a status update
/incident-response postmortem # Generate postmortem from incident data
If no mode is specified, ask what phase the incident is in.
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ INCIDENT RESPONSE │
├─────────────────────────────────────────────────────────────────┤
│ Phase 1: TRIAGE │
│ ✓ Assess severity (SEV1-4) │
│ ✓ Identify affected systems and users │
│ ✓ Assign roles (IC, comms, responders) │
│ │
│ Phase 2: COMMUNICATE │
│ ✓ Draft internal status update │
│ ✓ Draft customer communication (if needed) │
│ ✓ Set up war room and cadence │
│ │
│ Phase 3: MITIGATE │
│ ✓ Document mitigation steps taken │
│ ✓ Track timeline of events │
│ ✓ Confirm resolution │
│ │
│ Phase 4: POSTMORTEM │
│ ✓ Blameless postmortem document │
│ ✓ Timeline reconstruction │
│ ✓ Root cause analysis (5 whys) │
│ ✓ Action items with owners │
└─────────────────────────────────────────────────────────────────┘
Severity Classification
| Level | Criteria | Response Time |
|---|---|---|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |
Communication Guidance
Provide clear, factual updates at regular cadence. Include: what’s happening, who’s affected, what we’re doing, when the next update is.
Output — Status Update
## Incident Update: [Title]
**Severity:** SEV[1-4] | **Status:** Investigating | Identified | Monitoring | Resolved
**Impact:** [Who/what is affected]
**Last Updated:** [Timestamp]
### Current Status
[What we know now]
### Actions Taken
- [Action 1]
- [Action 2]
### Next Steps
- [What's happening next and ETA]
### Timeline
| Time | Event |
|------|-------|
| [HH:MM] | [Event] |
Output — Postmortem
## Postmortem: [Incident Title]
**Date:** [Date] | **Duration:** [X hours] | **Severity:** SEV[X]
**Authors:** [Names] | **Status:** Draft
### Summary
[2-3 sentence plain-language summary]
### Impact
- [Users affected]
- [Duration of impact]
- [Business impact if quantifiable]
### Timeline
| Time (UTC) | Event |
|------------|-------|
| [HH:MM] | [Event] |
### Root Cause
[Detailed explanation of what caused the incident]
### 5 Whys
1. Why did [symptom]? → [Because...]
2. Why did [cause 1]? → [Because...]
3. Why did [cause 2]? → [Because...]
4. Why did [cause 3]? → [Because...]
5. Why did [cause 4]? → [Root cause]
### What Went Well
- [Things that worked]
### What Went Poorly
- [Things that didn't work]
### Action Items
| Action | Owner | Priority | Due Date |
|--------|-------|----------|----------|
| [Action] | [Person] | P0/P1/P2 | [Date] |
### Lessons Learned
[Key takeaways for the team]
If Connectors Available
If ~~monitoring is connected:
- Pull alert details and metrics
- Show graphs of affected metrics
If ~~incident management is connected:
- Create or update incident in PagerDuty/Opsgenie
- Page on-call responders
If ~~chat is connected:
- Post status updates to incident channel
- Create war room channel
Tips
- Start writing immediately — Don’t wait for complete information. Update as you learn more.
- Keep updates factual — What we know, what we’ve done, what’s next. No speculation.
- Postmortems are blameless — Focus on systems and processes, not individuals.
Install this Skill
Skills give your AI agent a consistent, structured approach to this task — better output than a one-off prompt.
npx skills add anthropics/knowledge-work-plugins --skill engineering Official Anthropic skill. Need a walkthrough? See the install guide →
Works with
No terminal needed — Claude.ai works by pasting the skill into custom instructions.
Details
- Category
- Development
- License
- Apache 2.0
- Author
- @anthropics
- Source
- GitHub →
- Source file
-
show path
engineering/skills/incident-response/SKILL.md