Fastware: The Art of Optimizing C++ Code


This class introduces attendees to a thorough approach to optimization techniques for contemporary computing architectures. It is based on material from Andrei's upcoming book Fastware, which in turn draws on Andrei's career-long experience tuning the performance of a variety of software systems, from machine learning research to high-performance libraries to Facebook-scale computing backends.

Software engineering folklore is rife with tales of optimizations. Programmers commonly discuss and argue over whether one piece of code should be faster than another, or what to do to improve the performance of a system, small or large. Yet reliable information on the subject is scant and difficult to find.

Optimization is big. Arguably it is bigger today than ever: serial execution speed has stalled, and once everything that can be parallelized has been, single-thread speed remains the bottleneck. A large category of applications has no upper bound on desired speed, meaning there is no point of diminishing returns in making code faster. Better speed means less power consumed for the same work, more work done for the same data center expense, better features for the end user, more features for machine learning, better analytics, and more.

Optimizing has always been an art, and optimizing C++ on contemporary hardware in particular has become a task of formidable complexity, because modern hardware has a number of peculiarities that are not yet well understood or explored. This class offers a thorough dive into this fascinating world.

Intended audience

This class is aimed at C++ programmers for whom the efficiency of generated code is a primary concern.


The format is a highly interactive lecture. Questions during the lecture are encouraged. Use of laptops for trying out examples is allowed.


Schedule: 09:00 - 16:15


Outline

The Art of Benchmarking
Conducting Time Measurements
Strength Reduction
Minimizing Indirections
Eager Computation: Tables vs. Computation
Lazy Computation
Computation vs. Tables
Lazy Structuring
Instruction-Level Parallelism
Inlining
Smart Resource Optimizations
Copy Elision
Scalable Use of the STL
Building Structure on Top of Arrays
Large Set Operations and Derivatives
Contention Minimization