by Jacob Plicque
Being On-Call Doesn’t Have to Suck. How Can We Do Better?
Companies rely on key services being available nearly 100 percent of the time in order to generate revenue. As a consequence, it has become normal for engineers to be woken up late at night or early in the morning to resolve incidents. Whether or not you rise to the occasion, it eventually becomes a very taxing experience. Unfortunately, our industry has accepted this as the norm. There is a better way: Chaos Engineering. In this session, we will explore how we got to this point and how we can adopt Chaos Engineering to help us wake up less and sleep better.

What attendees will learn:
How did we get here? The state of on-call in the tech industry, drawing on my experience in both Cloud Ops and SRE.
Operational maturity: why we need to revisit this definition, and the different stages companies find themselves in.
Why being an on-call hero seems exhilarating at first but then becomes pretty terrible.
Why are we so reactive? How do we move the needle toward being proactive?
Chaos Engineering is not just about reproducing incidents or validating playbooks. How can it help us get more sleep?
The forcing function that Chaos Engineering creates, which pushes us to ask questions about our systems and, just as important if not more so, our people.
Where do we go from here? What to do if you’re on-call this sprint, and what’s next.