Fast Track to Chaos Engineering
Are you ready to proactively improve the reliability of your systems? In this workshop, Russ Miles will dive into Chaos Engineering through system verification so that you can build confidence in your system's behaviour and identify weaknesses before they affect your users!
Chaos Engineering is a relatively new term for a practice that has been successfully applied to some of the largest and most complex production systems for some time. If you’re working with large-scale, complex systems (such as microservices and distributed systems in general) then you will likely benefit from building confidence in your systems using the Chaos Engineering approach.
Chaos Engineering is an empirical discipline for defining experiments where the weaknesses of a complex, or even chaotic, system can be explored, discovered and eventually rectified. Most frequently practised in Production, Chaos Engineering helps you learn about your system so that it can be continuously improved in the face of current and future conditions.
What the attendee will learn
Attendees of this workshop will learn:
- The value and limitations of the chaos engineering discipline.
- How to establish a culture, practice, architecture, and design that is ready for chaos engineering.
- How to brainstorm and build a hypothesis backlog, and how this leads to Service Level Objectives and Service Level Indicators that are helpful to chaos engineering.
- How to plan Game Days as a simple first step into chaos engineering.
- How to design, build, and execute careful and controlled system verifications based on chaos engineering experiments to surface weaknesses that may impact your system's SLOs.
- How to apply different levels of experiments to learn about different types of weaknesses.
- How to customize your own chaos engineering toolbox to build advanced chaos engineering experiments.
- How to set up your chaos engineering experiments to support Continuous Verification of your system.
- How to share the results of chaos engineering to maximum effect with all system stakeholders.
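As a rough illustration of how a Service Level Indicator (SLI) relates to a Service Level Objective (SLO), the sketch below computes an availability SLI from request counts and compares it with a target. The function names, the request counts, and the 99.5% target are assumptions made for this example and are not taken from the workshop material.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that succeeded (the SLI)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """Return True when the measured SLI is at or above the SLO target."""
    return sli >= slo_target


# Hypothetical numbers: 9,990 successful requests out of 10,000,
# checked against an assumed availability objective of 99.5%.
measured = availability_sli(9_990, 10_000)
within_objective = meets_slo(measured, 0.995)
```

A hypothesis from the backlog can then be phrased against such a measurement, for example "the availability SLI stays within its SLO while one instance is unavailable".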
Day 1 (Preparing for Chaos Engineering):
- What is chaos engineering and why do I need it?
- Chaos Engineering as System Verification.
- Building confidence in complicated and complex systems
- Understanding and working with the emergent and novel with Cynefin, chaos, and microservices
- Establishing prerequisites to chaos engineering
- Architecting and designing for chaos
- Chaos, Testing and Verification
- Building a hypothesis backlog
- Planning and Executing a Successful Game Day
Day 2 (Practicing Chaos Engineering):
- Building your first system verification.
- Running your first system verification.
- Understanding how chaos experiments implement system verification.
- Understanding the Chaos Toolkit and its customizability.
- Customizing your experiment's steady-state hypothesis, method and rollbacks.
- Adding logging and monitoring to your experiments.
- Creating a custom extension that your experiments can use.
- Creating a custom control for your experiments.
- Enabling continuous verification.
- Building and sharing the results from your system verifications.
- Turning your system verification results into system improvement actions.
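To give a feel for the steady-state hypothesis, method, and rollbacks mentioned in the Day 2 topics, here is a minimal sketch that assembles a Chaos Toolkit style experiment as a Python dictionary and serializes it to JSON. The service URL, pod name, titles, and pause length are hypothetical placeholders chosen for this example; the experiments used in the workshop itself may look quite different.

```python
import json


def build_experiment(service_url: str, pod_name: str) -> dict:
    """Return an experiment description as a plain dictionary."""
    # Probe used by the steady-state hypothesis: the service should
    # answer with HTTP 200 both before and after the method runs.
    health_probe = {
        "type": "probe",
        "name": "service-responds-ok",
        "tolerance": 200,
        "provider": {"type": "http", "url": service_url, "timeout": 3},
    }
    return {
        "version": "1.0.0",
        "title": "Service stays available when one pod is deleted",
        "description": "Illustrative experiment with placeholder values.",
        "steady-state-hypothesis": {
            "title": "Service is healthy",
            "probes": [health_probe],
        },
        # The method introduces the turbulent condition under study.
        "method": [
            {
                "type": "action",
                "name": "delete-one-pod",
                "provider": {
                    "type": "process",
                    "path": "kubectl",
                    "arguments": f"delete pod {pod_name}",
                },
                "pauses": {"after": 10},
            }
        ],
        # Nothing to undo in this sketch; real experiments often
        # restore whatever the method changed.
        "rollbacks": [],
    }


# Hypothetical values, assuming an application running in Minikube.
experiment = build_experiment("http://localhost:8080/health", "demo-pod")
experiment_json = json.dumps(experiment, indent=2)
```

Saved to a file such as `experiment.json`, a description like this could then be executed with `chaos run experiment.json`, assuming the Chaos Toolkit and `kubectl` are installed and the placeholder values are replaced with real ones.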
This workshop is for you if you are
- A hands-on engineer: a software developer, site reliability engineer, or operations or DevOps engineer.
- Anyone who cares about verifying the reliability of their systems!
Audience Requirements (Prerequisites)
Preferably, attendees of this workshop will have:
- A working installation of the free and open source [Chaos Toolkit](https://docs.chaostoolkit.org/reference/usage/install/) (1.3.0+) on their own machine. This requires a working installation of Python 3.5+.
- A working installation of Minikube to run example applications on.
- A working internet connection (provided by the conference).
- A working knowledge of Python 3.5+ is optional but recommended as it will make the more advanced customization subjects easier to understand.
Russ Miles is CEO of ChaosIQ.io, where he and his team build commercial and open source products and provide services to companies applying chaos and resilience engineering to build confidence in the reliability of their production systems. Russ is an international consultant, trainer, speaker, and author. His most recent book, "Learning Chaos Engineering", published by O'Reilly Media, explores how to build trust and confidence in modern, complex systems by applying chaos engineering to surface weaknesses before they affect your users.