NEON Vectorization Workshop
Unleash the performance of your embedded ARM chip with NEON!
We introduce vectorization with NEON from the ground up, starting with the most basic concepts and working up to very advanced vectorization topics.
Workshop agenda:
- A short introduction to vectorization
- Introduction to NEON intrinsics
- Advanced NEON intrinsics
- Basic vectorization patterns – vectorizing for loops, for loops with early exit, while loops and convergence loops
- Common vectorization patterns – vectorizing loops with conditions, conditional counting, loops with structs and matrix transposition.
- Vectorization inhibitors – learn to detect and remove obstacles that hinder efficient vectorization
- Vectorization types according to data access pattern – there are several ways to do vectorization; here we investigate inner-loop vectorization and outer-loop vectorization.
- Advanced vectorization patterns – we talk about how to vectorize copy_if, trees and lookup tables.
- Memory performance – improve the performance of your vectorized code by better using the memory subsystem.
- Peak performance – reach peak software performance by breaking instruction dependencies, avoiding register spills and cleverly using everything the hardware has to offer.

Ivica Bogosavljevic
Application Performance Engineer at Johnny's Software Lab
Ivica is a Senior Software Engineer with 15 years of experience in Linux and bare-metal embedded systems. His professional focus is application performance improvement: techniques that make C/C++ programs run faster by using better algorithms, better exploiting the underlying hardware, and making better use of the standard library, the programming language, and the operating system. He writes a performance-related tech blog: https://johnysswlab.com