What is Compilation?
I keep hearing about “compiling” things. When I wrote code in Python, I never had to do that. What's the deal?
Actually, there are both compiled languages and interpreted ones. Python is an interpreted language.
Let's take a look at what compilation is!
Compilation (i.e., the process of compiling a program) refers to the process of translating a program from a high-level human-readable format (like C or C++ source code) to a low-level machine-executable format (like microprocessor instructions). Because processors are implemented using actual physical hardware, and need to run as quickly as possible, machine processor instructions are necessarily much simpler than many of the statements we write in C++ programs.
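To make the gap between the two levels a bit more concrete, here is a minimal sketch of one C++ statement, with comments describing the kind of simpler steps a processor might carry out for it. The name `sumOfSquares` is made up for illustration, and the numbered steps are a rough picture rather than the actual output of Clang or any other compiler.

```cpp
// One high-level C++ statement can stand in for several machine-level steps.
int sumOfSquares(int a, int b) {
    // At the machine level, the single line below is roughly:
    //   1. multiply a by a and keep the result in a register
    //   2. multiply b by b and keep the result in another register
    //   3. add the two registers together
    //   4. hand the sum back to the caller
    return a * a + b * b;
}
```

Calling `sumOfSquares(3, 4)` gives 25. The exact instructions used for each step depend on the processor and on how the compiler chooses to optimize.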
A compiler is a program that performs compilation. In CS70, the compiler we will use is a system called Clang (which is part of the open-source LLVM project), and the C++ compiler is called clang++.
Is there just one kind of machine code?
No, different processor families speak different machine languages, so the program needs to be translated specifically for the processor family you want to run it on.
Some popular processor families are:
- AMD 64 (a.k.a. Intel or x86_64) — Used in most modern PCs.
- ARM 64 (a.k.a. Apple Silicon) — Used in newer Macs, smart phones, Raspberry Pi computers.
- Power PC 64 — Used in some mainframes.
- RISC V
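Because a program is translated for one specific processor family, the code itself can find out at compile time which family that is. The following is a small sketch, not an exhaustive check: `targetFamily` is a hypothetical name, and the macros it tests are ones that Clang and GCC predefine for the matching target (other compilers may use different names).

```cpp
#include <string>

// Report which processor family this program was compiled for.
// Only one branch survives preprocessing, chosen by the target
// the compiler is translating for.
std::string targetFamily() {
#if defined(__x86_64__)
    return "AMD 64 (x86_64)";
#elif defined(__aarch64__)
    return "ARM 64";
#elif defined(__powerpc64__)
    return "Power PC 64";
#elif defined(__riscv)
    return "RISC V";
#else
    return "unknown";
#endif
}
```

The same source file compiled on a PC and on a newer Mac would most likely return different strings, because each build is a separate translation into that machine's language.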
Compiling down to the bare-metal machine instructions has to mean programs are super fast as a result!
Actually, today's computers are so fast that it often doesn't matter whether a program runs as fast as possible. If a program takes only a second in Python and 0.01 seconds in C++, will anyone care? The fact that (interpreted) Python is often 10–100× slower than compiled languages like C++ frequently doesn't matter, especially if it takes you five minutes to write the Python code and 45 minutes to write the C++!
But compiling has some other advantages. The translation process means that a lot of things about your program need to be figured out before it's run, whereas Python's interpreter figures out most things as it's running your program. So with a compiled language, you find some kinds of errors sooner.
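As one example of an error that gets caught before the program ever runs, consider passing the wrong type of value to a function. This is a minimal sketch with made-up names (`doubleIt` and `checkedCall`); the commented-out line is the mistake.

```cpp
#include <string>

// The compiler checks every call against the declared parameter types.
int doubleIt(int n) {
    return n * 2;
}

int checkedCall() {
    std::string text = "hello";
    // Uncommenting the next line makes the compiler reject the whole
    // program, because a std::string cannot be passed where an int
    // is expected:
    // int bad = doubleIt(text);
    return doubleIt(static_cast<int>(text.size()));  // 5 * 2
}
```

With the bad line commented out, `checkedCall()` returns 10. With it restored, no executable is produced at all, so the mistake can't hide in a rarely used corner of the program until someone happens to run it.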
Video: Examining the Stages of Compilation
While we sometimes talk about compiling like it's a single process, there are actually several steps that our code goes through on its way to becoming an executable program that can be run on a computer.
This video provides background on what's going on in these stages. The goal is not for you to learn all the details it shows, but to get a sense of the background so that these concepts don't seem so alien.
Summary
Input(s) | Specifically | Output | Transformed by |
---|---|---|---|
Source File | (C++ Source, .cpp and .hpp) | Edited C++ Source | Preprocessor |
Preprocessor Output | (Edited C++ Source, .ii) | Assembly Language | Compiler |
Assembly Language | (AMD64 or ARM64 code, .s) | Object File | Assembler |
Object File(s) & Libraries | (.o, .a and .so files) | Executable | Linker |
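If you'd like to watch the stages in the table happen one at a time, here is a sketch of how that might look. The file name `hello.cpp` and the function `greeting` are hypothetical; the `-E`, `-S`, and `-c` options are standard clang++ flags that stop after the preprocessor, compiler, and assembler stages, respectively.

```cpp
// hello.cpp (hypothetical file name)
#include <string>

#define GREETING "Hello, CS70!"   // the preprocessor pastes this text in below

// Running each stage separately, assuming this file is saved as hello.cpp:
//   clang++ -E hello.cpp -o hello.ii   preprocessor -> edited C++ source
//   clang++ -S hello.ii  -o hello.s    compiler     -> assembly language
//   clang++ -c hello.s   -o hello.o    assembler    -> object file
//   clang++ hello.o      -o hello      linker       -> executable
//
// The linker step only succeeds if one of the object files defines main,
// for example a short main that prints greeting().
std::string greeting() {
    return GREETING;
}
```

Opening `hello.ii` in an editor is a good way to see what "edited C++ source" means: the `#include` line has been replaced by the contents of the header, and `GREETING` has been replaced by the quoted text.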
It seems complicated to have so many stages.
In some ways yes, but it also means that individual stages are simpler and focus on just one thing.
It's also the case that back in the 1970s, when C was invented, the computers that ran Unix didn't have very much memory, so breaking the task up into smaller chunks made it manageable for machines at the time.
The way C++ works today on most systems comes from the long history of C and C++ (and the history of Unix, too, which dates back to the early 1970s as well).