Compiler Design (1)
Why Compilers Are Inherently Cool
Last Edited: Jan 19, 2025
At first glance, compilers may not seem very complicated. After all, they are just programs that translate your C code (or code in another language, although some languages use an interpreter instead of a compiler) into an assembly language file, which is then assembled into machine code (0s and 1s). A compiler serves as a translator, but not in the way you would translate English into Mandarin: C is already a programming language, just not one the CPU can execute directly. Unfortunately, compilers are much more complicated than this!

In my computer systems class, we spent the first few months learning how the x86-64 instruction set works[1]. Assembly language is considered low-level because assembly code translates almost directly into machine code: each instruction maps more or less one-to-one onto a machine instruction. All the cool stuff in C (and most other high-level programming languages), like data types, variable names, multidimensional arrays, and for loops, disappears in assembly. For instance, values in assembly don't have predefined data types; registers and memory just hold bytes. Consequently, those bytes are interpreted quite flexibly, and the instruction decides what they mean: you read four bytes when the value is an int, eight when it is a double, and so on. All the for loops get wiped out, and instead you see a bunch of jump instructions[2].

Half the questions on my final exam involved translating C code into assembly, and for a compiler to do this from nothing but a piece of C code, with no AI assistance, is a daunting task. It might sound like a big pile of if-else statements, but on top of the translation itself, compilers employ a crazy level of optimization. If there is enough interest, I can give a more thorough walkthrough of how C gets translated into assembly. Regardless, you should have the overall feel by now.
Wikipedia tells me that the overall compilation process goes through three phases: the front end, the middle end, and the back end. Over the next few days/weeks, I will explore each phase in detail. More specifically:
- Front end: lexical analysis, syntax analysis, and semantic analysis
- Middle end: analysis and optimization
- Back end: code generation
[1] x86-64 is an instruction set architecture (ISA), which defines how software controls the CPU: the available instructions, the registers, how memory is accessed, and so on.
[2] Jump instructions are the assembly equivalent of goto statements in C. Even in C, goto is generally discouraged because it makes control flow hard to follow, especially with nested loops.