Many new programmers have little to no exposure to assembly code. Over the years, colleges have started glossing over the details of how processors execute instructions and how data is stored in memory. On top of that, new high-level programming languages are created every year. In fact, high-level languages like Python and Java have surpassed “low-level” languages like C and C++ in popularity.
As a result, many young programmers cower in fear at the mere mention of assembly: “Silence! We do not speak the ‘A’ word here!”
When they step into disassembly, this is what they see:
I, too, was afraid of assembly for a long time. I took an assembly programming course in college, but it was taught while I was still trying to understand C++, so I barely learned enough to pass the class. After graduating, I was overwhelmed by the transition from college programming to professional programming (where are all the stub functions I’m supposed to fill in?). Given everything I had to learn, I was more than happy to continue coding without the slightest understanding of assembly.
However, in retrospect, I was doing myself a disservice. Assembly is the lowest level of human-readable code: each instruction corresponds almost directly to the machine code the CPU executes, and there are plenty of benefits to understanding how your high-level code is translated to assembly. But I think the most important reason to learn assembly is that it’s fun!
“Fun!?” you say. “How could it possibly be fun?” Well, let me explain…
There Is No Spoon
I don’t think programming in assembly is fun. If it were, we wouldn’t keep creating new high-level languages that abstract away low-level details. High-level languages are much more expressive and powerful than assembly, and they allow programmers to create more functionality in less time.
However, understanding assembly makes coding in your high-level language more fun. There’s something deeply satisfying about peeking under the hood and realizing that all of your complex code translates to a simple set of instructions. Variables, loops, functions, classes – these are all just abstractions that the CPU doesn’t know about. Understanding assembly helps you look past the complexity of high-level languages and understand what’s really going on.
So, now that we agree that understanding assembly is awesome, let’s cover some of the basics of assembly.
CPUs read values from and store values to registers, which are just small pieces of memory on the CPU. CPUs can also read and write values in main memory (i.e., RAM), but that’s much slower, so they prefer to work with values in registers. Registers are referred to by name in assembly. In 32-bit x86 assembly (the assembly language I’m covering here), they have names like eax, ebx, and esp.
There are 8 registers that your code can directly manipulate: eax, ebx, ecx, edx, esi, edi, ebp, and esp. Some of these registers, like eax, ebx, and ecx, are general-purpose and are used to hold variables and intermediate results. Others are used to hold very specific values.
In addition to the registers your code directly updates, there’s one very important register that the CPU manages itself: eip.
eip is known as the “instruction pointer”, and it holds the address of the instruction that the CPU is going to execute next. As the CPU executes each instruction, it automatically advances eip past that instruction to the next one; jump and call instructions change eip directly.
So assembly works with data in registers, but how do you tell the CPU what to do with that data? That’s expressed using mnemonics. A mnemonic is just a human-readable name for an underlying CPU instruction. Some examples are add for the add instruction, sub for the subtract instruction, and mul for the multiply instruction.
Most x86 instructions take zero, one, or two parameters (usually called operands). These parameters are either a register, a memory address, or an “immediate” (i.e., hard-coded) value. The parameters to an assembly instruction are just listed after the mnemonic, similar to how the arguments to a C++ function are listed in parentheses after the function name.
A basic example of this is the add instruction:
add eax, 1
We can see that the add instruction takes two parameters. In this case, the first parameter is the register eax, and the second parameter is the immediate value 1. The two parameters are added together and the result is stored back in the first parameter, so the above assembly could be written in C++ as:
eax = eax + 1;
When you learned C++, you probably heard that local variables are stored on the “stack”, as opposed to objects that you create with the new operator, which are stored on the “heap”. Storing variables on the stack is a fundamental part of assembly programming, so let’s take a look at how it works.
Remember how I said that some registers are general-purpose, while others are used for a specific purpose? Two special-purpose registers are esp and ebp. The esp register holds the address of the top of the stack, which is why it’s known as the “stack pointer”. Every time a value is pushed onto the stack, the value in esp is first decremented by the size of the value, and the value is then stored at the new address in esp. (Compilers often reserve space for all of a function’s local variables at once by subtracting from esp directly.)
The value in esp is decremented rather than incremented because the stack grows from high memory addresses to low memory addresses. Even so, the most recently pushed value is still called the “top” of the stack, which matches how we think of the stack data structure: new values get pushed on top.
You push values onto and pop values off of the stack using the push and pop instructions.
push takes one parameter: the value to be pushed onto the stack.
pop also takes one parameter: the register or memory address to store the value that’s popped off of the stack.
So, for example, if our stack looked like this:
And then we executed this instruction:
Our stack would come out looking like this:
Notice that we don’t explicitly tell the processor to decrement esp; it’s done implicitly for us as part of the push instruction.
The ebp register is called the “base pointer”, and it points to the “base” of the current stack frame, which is just where the stack started for the current function. Unlike esp, the value of ebp needs to be set explicitly by our program.
That Wasn’t So Bad, Was It?
This concludes our whirlwind tour of the basics of assembly. Next time, I’ll analyze how a simple C++ program is represented in assembly. In the meantime, try looking at some assembly yourself. The next time you’re debugging in Visual Studio, right-click and select “Go To Disassembly”. With a little time and effort, the black magic of assembly code will start to make sense, revealing the secrets of programming to you.