Understanding the Clang and LLVM Optimization Pipeline


Although both C and C++ can be considered slim and efficient abstractions over assembly language, a direct translation from these languages to machine code generally underperforms. Optimizing compilers analyze the source code and apply transformations based on the analysis results. These transformations replace certain suboptimal patterns with optimized equivalents, finally generating efficient machine code with a better benefit-cost ratio than that of direct translation. Modern compilers are state-of-the-art programs that embody decades of developments in computer science. We will walk through the Clang and LLVM compilation pipeline, from parsing the source code to emitting optimized machine code. We will build both source-level and IR-level tools during this one-day workshop.
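As a small illustration of "replacing a suboptimal pattern with an optimized equivalent", consider the sketch below. The function names are hypothetical and not part of the workshop material. At higher optimization levels, Clang can typically recognize a summation loop like the first function and emit code equivalent to the closed form in the second one, although the exact output depends on the compiler version and flags.

```cpp
#include <cstdint>

// Straightforward loop: a direct translation performs n additions.
// (constexpr is used only so the equivalence can also be checked at
// compile time; it is not required for the optimization itself.)
constexpr std::uint64_t sum_loop(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Hand-written closed form of the same computation (Gauss's formula).
// It matches sum_loop for inputs where n * (n + 1) does not overflow.
constexpr std::uint64_t sum_closed_form(std::uint64_t n) {
    return n * (n + 1) / 2;
}
```

Comparing the assembly generated for `sum_loop` at `-O0` and `-O2` is a quick way to see whether the loop was replaced on a given toolchain.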

The following topics will be covered in this workshop. How and why they work will be explained with concrete examples in the form of C++ source code:

Clang front-end:
  • Journey of source code to Abstract Syntax Tree (AST)
  • Clang AST Actions, tooling and libclang

Clang CodeGen:
  • Language-specific optimizations
  • Undefined-behavior sanitizer

LLVM middle-end:
  • Introduction to LLVM IR
  • Analysis passes
  • Transformation passes (some highlights):
    • Combine redundant instructions (instcombine)
    • Scalar Replacement of Aggregates (SROA)
    • Loop Invariant Code Motion (licm)
    • Function Inlining (inline)
    • and more
  • Handling undefined-behavior cases
  • Sanitizers and instrumentation
  • Just-In-Time compilation (JIT)
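
To make one of the transformation passes listed above more concrete, here is a hand-written before/after pair showing the idea behind loop-invariant code motion. This is only a source-level sketch with hypothetical function names. The real pass operates on LLVM IR, and whether it fires depends on what the analyses can prove about the loop.

```cpp
#include <cstddef>

// Before: `scale + offset` does not change inside the loop, yet a
// direct translation would recompute it on every iteration.
constexpr int weighted_sum_naive(const int* data, std::size_t n,
                                 int scale, int offset) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i] * (scale + offset);
    }
    return total;
}

// After: the invariant expression is computed once, before the loop.
// This is the shape loop-invariant code motion aims to produce.
constexpr int weighted_sum_hoisted(const int* data, std::size_t n,
                                   int scale, int offset) {
    const int factor = scale + offset;  // hoisted out of the loop
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i] * factor;
    }
    return total;
}
```

Both functions return the same value for the same inputs; only the amount of work done per iteration differs.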

Target audience:
C and/or C++ programmers who want to create or understand source-level or IR-level tools, implement a runtime with JIT support, and care about how their code and style affect the outcome. We will build simple tools around libclang and LibTooling that operate on C or C++ source code. After covering JIT, we will build a runtime that can just-in-time compile a simple language.

Computer Setup:
A computer (or access to a remote one) that can build Clang from source or run a prebuilt Clang is required. FreeBSD, Linux, or macOS is recommended.