Assembly is Fun! (Sort Of)

Many new programmers have little to no exposure to assembly code. Over the years, colleges have started glossing over the details of how processors execute instructions and how data is stored in memory. On top of that, new high-level programming languages are created every year. In fact, high-level languages like Python and Java have surpassed “low-level” languages like C and C++ in popularity.

As a result, many young programmers cower in fear at the mere mention of assembly: “Silence! We do not speak the ‘A’ word here!”

When they step into disassembly, this is what they see:

I, too, was afraid of assembly for a long time. I took an assembly programming course in college, but it was taught while I was still trying to understand C++, so I barely learned enough to pass the class. After graduating, I was overwhelmed by the transition from college programming to professional programming (where are all the stub functions I’m supposed to fill in?). Given everything I had to learn, I was more than happy to continue coding without the slightest understanding of assembly.

In retrospect, though, I was doing myself a disservice. Assembly is the lowest level of human-readable instruction before the CPU starts sending bits over the wire, and there are plenty of benefits to understanding how your high-level code is translated to assembly. Above all, I think the most important reason to learn assembly is that it’s fun!

“Fun!?” you say. “How could it possibly be fun?” Well, let me explain…

There Is No Spoon

I don’t think programming in assembly is fun. If it were, we wouldn’t keep creating new high-level languages that abstract away low-level details. High-level languages are much more expressive and powerful than assembly, and allow programmers to create more functionality in less time.

However, understanding assembly makes coding in your high-level language more fun. There’s something deeply satisfying about peeking under the hood and realizing that all of your complex code translates to a simple set of instructions. Variables, loops, functions, classes – these are all just abstractions that the CPU doesn’t know about. Understanding assembly helps you look past the complexity of high-level languages and understand what’s really going on.
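To make the "loops are just abstractions" point concrete, here is a small sketch in C++. The function names are made up for illustration, and the second version only approximates the shape a compiler typically gives a loop; real compiler output varies with the compiler and optimization settings.

```cpp
// Sum the integers 1..n with an ordinary for loop.
int sum_with_loop(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// The same computation using only a comparison and jumps, which is
// roughly the form a loop takes once it has been turned into assembly.
int sum_with_jumps(int n) {
    int total = 0;
    int i = 1;
loop_start:
    if (i > n) goto loop_end;  // compare, then conditional jump out of the loop
    total += i;                // loop body
    ++i;                       // advance the counter
    goto loop_start;           // unconditional jump back to the top
loop_end:
    return total;
}
```

Both functions return the same result for any n. The CPU only ever sees something like the second one: a compare, a conditional jump, and a jump back.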

So, now that we agree that understanding assembly is awesome, let’s cover some of the basics of assembly.

Registers

CPUs read values from and store values to registers, which are just small pieces of memory on the CPU. CPUs can also read and write values from main memory (i.e. RAM), but that’s much slower, so they prefer to work with values in registers. Registers are referred to by name in assembly. In x86 assembly (the assembly language I’m covering here), they have names like eax, ebx, ebp and esp.

There are 8 registers that your code can directly manipulate. Some of these registers, like eax, ebx, and ecx, are general-purpose and are used to hold variables and intermediary results. Others are used to hold very specific values.

In addition to the registers your code directly updates, there’s one very important register that the CPU controls directly: the eip register. eip is known as the “instruction pointer”, and it holds the address of the instruction that the CPU is going to execute next. After it executes an instruction, the CPU automatically advances eip to the following instruction, unless a jump, call, or return sends it somewhere else.
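x86 instructions are not all the same length, so eip moves forward by the size of whichever instruction just ran. The sketch below models that for straight-line code only (no jumps or calls). The Instruction struct and trace_eip function are hypothetical names, and the sizes you pass in are whatever you choose, not real encodings.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified instruction: only its encoded size matters here.
struct Instruction {
    std::uint32_t size_in_bytes;
};

// Returns the value eip holds after each instruction in a straight-line
// sequence has executed, starting from start_address.
std::vector<std::uint32_t> trace_eip(std::uint32_t start_address,
                                     const std::vector<Instruction>& program) {
    std::vector<std::uint32_t> trace;
    std::uint32_t eip = start_address;
    for (const Instruction& instr : program) {
        eip += instr.size_in_bytes;  // step past the instruction just executed
        trace.push_back(eip);
    }
    return trace;
}
```

For example, starting at address 0x1000 with instructions of 1, 3, and 5 bytes, the trace is 0x1001, 0x1004, 0x1009.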

Instructions

So assembly works with data in registers, but how do you tell it what to do with that data? Well, that’s expressed using mnemonics. A mnemonic is just a human-readable name for an underlying CPU instruction. Some examples of this are add for the add instruction, sub for the subtract instruction, and mul for the multiply instruction.

Most assembly instructions take zero, one, or two parameters. These parameters are either a register, a memory address, or an “immediate” (i.e. hard-coded) value. The parameters to an assembly instruction are just listed after the mnemonic, similar to how the parameters to a C++ function are listed in parentheses after the function name.

A basic example of this is the add instruction:

add eax, 1

We can see that the add instruction takes two parameters. In this case, the first parameter is the register eax, and the second parameter is the immediate value “1”. The two parameters are added together and the result is stored back in the first parameter, so the above assembly could be written in C++ as:

eax = eax + 1;

Easy peasy!
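If it helps, you can push the C++ analogy a little further and treat registers as plain variables. This is only a sketch: the Registers struct and the add, sub, and run_example functions are hypothetical, and the model ignores details like the status flags that the real instructions also update.

```cpp
#include <cstdint>

// Hypothetical model of a few 32-bit general-purpose registers.
struct Registers {
    std::uint32_t eax = 0;
    std::uint32_t ebx = 0;
    std::uint32_t ecx = 0;
};

// add dest, src  ->  dest = dest + src
void add(std::uint32_t& dest, std::uint32_t src) { dest += src; }

// sub dest, src  ->  dest = dest - src
void sub(std::uint32_t& dest, std::uint32_t src) { dest -= src; }

// Mirrors this short sequence:
//   add eax, 1
//   add eax, ebx
//   sub eax, 2
std::uint32_t run_example(Registers regs) {
    add(regs.eax, 1);         // second parameter is an immediate value
    add(regs.eax, regs.ebx);  // second parameter is another register
    sub(regs.eax, 2);
    return regs.eax;
}
```

Starting with eax = 5 and ebx = 10, run_example returns 14. In every case the first parameter is both an input and the place the result ends up.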

The Stack

When you learned C++, you probably heard that local variables are stored on the “stack”, as opposed to objects that you create with the new operator, which are stored on the “heap”. Storing variables on the stack is a fundamental part of assembly programming, so let’s take a look at how it works.

Remember how I said that some registers are general-purpose, while others are used for a specific purpose? Two special-purpose registers are esp and ebp.

The esp register holds the address of the top of the stack, which is why it’s known as the “stack pointer”. Every time a value (a local variable, for example) is pushed onto the stack, the value in esp is first decremented by the size of that value, and then the value is stored at the new address in esp. The value in esp is decremented rather than incremented because the stack grows from high memory addresses to low memory addresses. We still think of it like the stack data structure, where new values get pushed on top; the “top” just happens to be the lowest address in use.

You push and pop values from the stack using the push and pop instructions. push takes one parameter: the value to be pushed onto the stack. pop also takes one parameter: the register or memory address to store the value that’s popped off of the stack.

So, for example, if our stack looked like this:

And then we executed this instruction:

push ebx

Our stack would come out looking like this:

Notice that we don’t explicitly tell the processor to decrement esp; it’s just implicitly done for us as part of the push instruction.
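Here is a minimal sketch of what push and pop do to esp, written as a toy C++ model. ToyStack is a hypothetical name, addresses are just offsets into a small byte buffer, and every value is assumed to be 4 bytes wide, as it is for a 32-bit register like ebx.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy machine: a small block of memory used as a stack, plus esp.
struct ToyStack {
    std::vector<std::uint8_t> memory;
    std::uint32_t esp;  // offset of the current top of the stack within memory

    // The stack starts empty, with esp just past the highest address.
    explicit ToyStack(std::uint32_t size) : memory(size), esp(size) {}

    // push: decrement esp by 4, then store the value at the new top.
    void push(std::uint32_t value) {
        assert(esp >= 4 && "stack overflow in toy model");
        esp -= 4;
        std::memcpy(&memory[esp], &value, sizeof value);
    }

    // pop: load the value at the top, then increment esp by 4.
    std::uint32_t pop() {
        assert(esp + 4 <= memory.size() && "stack underflow in toy model");
        std::uint32_t value;
        std::memcpy(&value, &memory[esp], sizeof value);
        esp += 4;
        return value;
    }
};
```

With a 16-byte ToyStack, esp starts at 16, drops to 12 after one push and to 8 after a second, and climbs back up as values are popped in reverse order.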

Similarly, the ebp register is called the “base pointer”, and it points to the “base” of the current stack frame, which is just where the stack started for the current function. Unlike esp, the value of ebp needs to be set explicitly by our program.
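Setting ebp explicitly usually happens at the very start of a function, and the old value is restored at the end. The sketch below models that common convention in C++; the post doesn't show it, so treat it as an assumption about typical compiler output. ToyCpu, enter_frame, and leave_frame are hypothetical names, and memory is indexed in 4-byte slots rather than bytes to keep the arithmetic short.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy machine: the stack is indexed in 4-byte slots for brevity.
struct ToyCpu {
    std::vector<std::uint32_t> stack = std::vector<std::uint32_t>(64);
    std::uint32_t esp = 64;  // one past the highest slot: the stack is empty
    std::uint32_t ebp = 64;

    void push(std::uint32_t v) { stack[--esp] = v; }
    std::uint32_t pop() { return stack[esp++]; }

    // Conventional function entry:
    //   push ebp        ; save the caller's base pointer
    //   mov  ebp, esp   ; this frame starts at the current top of the stack
    //   sub  esp, N     ; reserve room for the function's locals
    void enter_frame(std::uint32_t local_slots) {
        push(ebp);
        ebp = esp;
        esp -= local_slots;
    }

    // Conventional function exit:
    //   mov esp, ebp    ; discard the locals
    //   pop ebp         ; restore the caller's base pointer
    void leave_frame() {
        esp = ebp;
        ebp = pop();
    }
};
```

While a function runs, esp keeps moving as values are pushed and popped, but ebp stays put, which is what makes it a convenient fixed reference point for the function's local variables.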

That Wasn’t So Bad, Was It?

This concludes our whirlwind tour of the basics of assembly. Next time, I’ll analyze how a simple C++ program is represented in assembly. In the meantime, try looking at some assembly yourself. The next time you’re debugging in Visual Studio, right-click and select “Go To Disassembly”. With a little time and effort, the black magic of assembly code will start to make sense, revealing the secrets of programming to you.

6 thoughts on “Assembly is Fun! (Sort Of)”

  1. Finally someone who agrees with me. I’m taking a computer fundamentals class. All of the material is pretty low-level. We start at binary and move from there up to high-level programming. Circuits and logic gates were fun but actually coding in assembly blew my mind. It is awesome to see how really complex instructions can be represented in really simple processes.
    I just wish coding assembly was less tedious. Don’t get me wrong, I enjoy the process but sometimes keeping track of registers and branching functions and deciding whether or not to implement a sub-routine can be a little overwhelming.

    That type of programming feels awesome once you’re finished writing your code. I don’t know if it happens to you too but once I’ve implemented a new algorithm or managed to solve a tricky design puzzle I can’t stop thinking about how stupid I was to not see such a simple solution. Assembling does that to your head.

    Great read by the way. Sorry for the long comment.
    (I love talking about this stuff but don’t really get the chance to do it as often.)

    In case you’re curious my class is using an awful textbook called: Introduction to Computing Systems: from bits and gates to c and beyond, 2/e.

    Have you tried assembly in amd64?

    • Yeah, I definitely enjoy learning how to read assembly, but actually writing in it isn’t satisfying to me. I just think it’s cool to be able to understand what the CPU is doing at such a low level. Going lower to the hardware and electrons moving over the wires doesn’t seem useful; I’d consider assembly to be the lowest level that’s useful for programmers to understand.

      I haven’t read much x64 assembly, but I’ve read a few articles about the basics. It’s definitely useful to know, since more and more software is becoming 64-bit. However, I’m more familiar with x86, and I think it’s easier to understand since it has fewer registers, which is why I covered it in the post.

      Also, if you’re interested in another textbook that covers logic gates and builds up to higher-level programming, I’ve heard great things about the Nand To Tetris book (http://www.nand2tetris.org/course.php). The first 6 chapters are free!

      • Awesome. You just robbed me of sleep tonight.
        I am definitely reading that book.
        I never dreamed of such beautiful writing in a technical book.

        I’m already on the second reading.
        Thanks, man.
        😀

        I was thinking of doing a post about my experience with assembly, I think you might like it.

  2. I learned to write in Assembly when I was still in college. It was difficult but yeah, it was awesome. Aside from the fact that you can learn what your CPU is doing for you, it will also give you a very good reason to thank all the higher level programming languages that abstracted this complexity. A simple print instruction on those higher level programming languages is at most one line; while in Assembly, it would take around N number of lines depending on what you’re printing.

    Thanks for the good read and the refresher course! 😀

    • Exactly – most of the time it’s fine to code in a high-level programming language. However, once in a while you’ll run across a crazy bug or weird language behavior that’s much easier to understand if you have a basic understanding of how your code is represented in assembly.
