2024-11-07

Plan and PRACTICE for better incident response with insights from Tim Armandpour, CTO of PagerDuty. Learn the secrets to resilience from the team that mitigated the impact of a major outage, handling a 250% traffic surge while still delivering on their SLA. Listen to find out:

  • 🛠️ Why planning AND practice are both critical for incident response.
  • 🚧 How to practice for incident response (e.g., Failure Fridays with Chaos Engineering).
  • 🧑‍🤝‍🧑 Ownership: Why tech AND business teams must join post-mortems.
  • ☁️ How to mitigate the impact of your cloud provider’s lower SLA.
  • ⚓ Which architecture patterns are more resilient?
  • ⚖️ WARNING: “bend” the CAP theorem at your own risk