Resilient Distributed Systems
Distributed systems come in all shapes and sizes, and chances are you're working on one right now. But the fundamental nature of distributed systems exposes us to all sorts of problems that can cause our systems to become slow or unresponsive, or even to fail altogether.
In this two-day class, you'll learn about the challenges of distributed systems, and what you can do to create software that is much more likely to work when users need it to.
Whether you're building a fine-grained microservice architecture, or a more monolithic distributed system, you'll learn concrete practices to help make your software more robust.
This is not a coding masterclass, but it is interactive. The class focuses on concrete practice, presented in a technology-agnostic way.
What We Will Cover
Each class is different, with attendees shaping the content, but expect to cover:
- What is resiliency?
- The Golden Rules of distributed systems
- AI and resiliency
- Timeouts
- Retries & Idempotency
- Load shedding & back pressure
- Circuit breakers
- Bulkheads
- SLAs, SLOs and error budgets
- Human factors
- CAP & PACELC Theorems

Sam Newman is interested in technology at the intersection of things, from development, to ops, to security, usability and organisations. After over a decade at ThoughtWorks he is now an independent consultant. Sam is the author of "Building Microservices" from O'Reilly. He has worked with a variety of companies in multiple domains around the world, often with one foot in the developer world, and another in the IT operations space. If you asked him what he does, he’d say ‘I work with people to build better software systems’. He has written articles, presented at conferences, and sporadically commits to open source projects. While Java used to be his bread and butter, he also spends time with Ruby, Python, JavaScript, and Clojure, as well as infrastructure automation and cloud systems.
