Runbooks
Automate operational responses with safety guardrails and approval gates for high-impact actions.
Triggers
| Trigger | Fires when |
|---|---|
| Alert | A specific alert fires |
| Anomaly | AIOps detects an anomaly |
| Incident | An incident is created |
| Schedule | A cron expression matches |
| Webhook | An external HTTP POST request is received |
| Manual | A user runs the runbook from the UI |
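
As a rough illustration of the Webhook trigger, the sketch below builds a JSON POST request with Python's standard library. The URL and payload are hypothetical placeholders; the real endpoint and body schema come from the runbook's trigger configuration, which this page does not specify.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the webhook URL shown for your runbook.
WEBHOOK_URL = "https://example.com/hooks/runbooks/restart-api"


def build_webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a webhook trigger."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The payload fields are assumptions made for this example.
request = build_webhook_request(WEBHOOK_URL, {"reason": "deploy finished"})
print(request.get_method(), request.full_url)
```

Sending the request is a single call, `urllib.request.urlopen(request, timeout=10)`; it is left out here so the sketch runs without network access.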
Actions
| Action | What it does |
|---|---|
| Scale | Scale deployment replicas |
| Restart | Perform a rolling restart of pods |
| Notify | Send a notification to a channel |
| Exec | Run a command in a pod |
| Patch | Apply a patch to a Kubernetes resource |
| Drain | Drain a node |
| Cordon | Mark a node unschedulable |
| Custom | Run a custom script |
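
To make the Scale and Restart actions more concrete, here is a minimal sketch of the Kubernetes merge-patch bodies such actions could send to a Deployment. This is an assumption about how the actions might map to the Kubernetes API, not a description of how the product implements them. The function names `scale_patch` and `restart_patch` are hypothetical; the `kubectl.kubernetes.io/restartedAt` annotation is the one `kubectl rollout restart` sets.

```python
from datetime import datetime, timezone


def scale_patch(replicas: int) -> dict:
    """Merge-patch body that sets a Deployment's replica count."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return {"spec": {"replicas": replicas}}


def restart_patch(now: datetime) -> dict:
    """Merge-patch body that triggers a rolling restart.

    Changing a pod-template annotation makes the Deployment roll out
    new pods, which is how `kubectl rollout restart` works.
    """
    stamp = now.astimezone(timezone.utc).isoformat()
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {"kubectl.kubernetes.io/restartedAt": stamp}
                }
            }
        }
    }


print(scale_patch(5))
print(restart_patch(datetime.now(timezone.utc)))
```

Either body could be applied with a Kubernetes client's patch call against the target Deployment.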
Safety guardrails & approval gates
- A maximum number of executions per hour (default: 5) and a cooldown period between runs.
- Scope limited to specific namespaces or clusters.
- Dry-run mode to verify without applying changes.
- High-impact actions (drain, scaling by more than 3x, exec in production) pause until an admin or owner approves them.
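
The guardrails above can be sketched as a small pre-flight check. Everything here is illustrative: the class and function names, the 300-second cooldown, the reading of "scale >3x" as a target above three times the current replica count, and the use of a namespace named `production` are assumptions. Only the default of 5 executions per hour comes from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Guardrails:
    max_per_hour: int = 5            # default stated in the list above
    cooldown_seconds: float = 300.0  # assumed value; no default is documented
    allowed_namespaces: Set[str] = field(default_factory=set)  # empty = unrestricted (assumption)
    _runs: List[float] = field(default_factory=list)  # timestamps of recorded runs

    def check(self, namespace: str, now: float) -> Optional[str]:
        """Return a reason to block the run, or None if it may proceed."""
        if self.allowed_namespaces and namespace not in self.allowed_namespaces:
            return "namespace out of scope"
        recent = [t for t in self._runs if now - t < 3600]
        if len(recent) >= self.max_per_hour:
            return "hourly execution limit reached"
        if self._runs and now - self._runs[-1] < self.cooldown_seconds:
            return "cooldown still active"
        return None

    def record(self, now: float) -> None:
        """Remember that a run started at time `now` (seconds)."""
        self._runs.append(now)


def needs_approval(action: str, namespace: str = "",
                   current_replicas: int = 0, target_replicas: int = 0) -> bool:
    """Approximate the high-impact rules: drain, scale >3x, exec in production."""
    if action == "drain":
        return True
    if action == "scale":
        return current_replicas > 0 and target_replicas > 3 * current_replicas
    if action == "exec":
        return namespace == "production"  # assumes a namespace with this name
    return False


rails = Guardrails(allowed_namespaces={"staging"})
print(rails.check("staging", now=0.0))        # no blocking reason yet
print(needs_approval("scale", current_replicas=2, target_replicas=7))
```

Keeping the rate and scope checks separate from the approval check mirrors the list: the first group blocks a run outright, while the approval rule only pauses it.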
Execution history
The execution history is a timeline of every run, showing its status (success, failed, pending approval, or cancelled), detailed step logs, duration, and affected resources.
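
One way to picture a single history entry is as a small record type. The sketch below is a hypothetical model, not the product's actual schema: the class and field names are assumptions, and only the four status values and the listed attributes come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional


class Status(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    PENDING_APPROVAL = "pending approval"
    CANCELLED = "cancelled"


@dataclass
class Execution:
    runbook: str
    status: Status
    started_at: datetime
    finished_at: Optional[datetime] = None  # None while the run has not ended
    step_logs: List[str] = field(default_factory=list)
    affected_resources: List[str] = field(default_factory=list)

    @property
    def duration(self) -> Optional[timedelta]:
        """Elapsed time, or None if the run has not finished."""
        if self.finished_at is None:
            return None
        return self.finished_at - self.started_at


# Example entry with made-up values.
entry = Execution(
    runbook="restart-api",
    status=Status.SUCCESS,
    started_at=datetime(2024, 1, 1, 12, 0, 0),
    finished_at=datetime(2024, 1, 1, 12, 0, 42),
    step_logs=["restart requested", "rollout complete"],
    affected_resources=["deployment/api"],
)
print(entry.status.value, entry.duration)
```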