# Practical Programmer Math: Bit Arithmetic

## The “Why This Should Matter to You” Story

Imagine that you’re on a phone interview and things are going pretty well. You’ve fielded some gotcha trivia about whether such and such code would compile and what default parameters are and whatnot. After explaining the ins and outs of the strategy pattern and recursive methods, you’re starting to feel pretty good about yourself when, suddenly, the record skips. “Uh, excuse me?” you stammer.

“Why do integers roll over to negative when you loop all the way up to and past their max values?”

You had always just assumed that this was like an old Nintendo game where, naturally, when you completely leave the right side of the screen you simply appear on the left side. “Uh… they just do…? Pass?”

“No problem–let’s move on. Why do unsigned and signed integers occupy the same number of bits even though signed integers have more information?”

“Oh, they have more information? I know that unsigned integers can’t be negative!”

“Yes, well, okay. Let’s try this: how would you multiply a number by two without using the * operator?”

“Well,” you begin, “I’d–”

“And without adding it to itself.”

“Oh. Huh. I dunno.”

The interviewer sighs a little and says, “okay, so what can you tell me about the way numbers are represented as bits and bytes?”

Remembering the last practical math post, you talk about how you know that bits are 1 or 0 and that 8 of these bits makes up a byte. You talk about how bits can be and-ed and or-ed like booleans, and you offer hopefully that you even know an interesting bit of trivia in that 1K is not actually 1,000 bytes. The interviewer nods absently, and looking back later, you have no trouble pinpointing where things went off the rails. In spite of you knowing a bit about how information is represented at the lowest level, you realize you’re lacking some additional basics as to how this bit and bite stuff works.

You wrap up the interview and when you ask what the next steps are, the interviewer says, “don’t call us–we’ll call you.”

## Math Background

First of all, you may want to go back and brush up on some previous posts in this series.

Let’s start with the simplest thing first: bit shifting. To help you understand bit-shifting, I’m going to introduce the concept of “bit significance.” Bit significance is a way of categorizing the “bits” in a binary number from most significant (the “most significant bit” or MSB) to least significant (the “least significant bit” or “LSB”). This is really just a fancy of way of describing the bits that correspond to the largest power of two from greatest to smallest. For example, the number six is written as 0x110, and we would say that the first 1 is the most significant; the middle one the next most; and 0 the least since they correspond to the 4, 2, and 1 values, respectively.

This may seem like a definition so basic as to be pointless, but it’s important to remember that not all architectures necessarily interpret bits from left to right. After all, reading “110” as 6 is rather arbitrary–we could just as easily read it as 3 if we read it the other way. This is reminiscent of how some Middle Eastern written languages are interpreted from right to left. So just to be clear, for the remainder of this post, the MSB will be the leftmost bit and the LSB will be the rightmost.

Bit shifting, often denoted in programming languages by the “>>” and “<<" operators, is the simple procedure of adding a zero to one end or the other of a binary sequence. For example, consider the following code:

1 2 |
int x = 0x0011; //Set x = 3 int y = x << 1; //Left shift by 1 place |

What you would wind up with for y is 0x0110 or 6. Notice how there's now an extra 0 on the right and this results in the value being multiplied by two. If we had bit-shifted by 2 instead of 1, we'd have 0x1100 or 12, which is 3 x 4. If you bit-shift the other way, you get back to where you started:

1 2 |
int x = 0x1100; //Set x = 12 int y = x >> 2; // Right shift by 2 places |

Here, y winds up being 0x0011 or 3 again. Performing a subsequent shift would result in 0x0001 or 1. When you divide unevenly by two, the algorithm 'rounds' down. Another and perhaps easier way to understand this shifting concept is to realize it's actually pretty familiar with the decimal numbers that you're accustomed to. For example, if you "decimal shifted" the number 14 left, you'd have 140 and if you did it to the right, you'd wind up with 1 (rounding down from 1.4). Decimal shifting, as anyone who has "misplaced a zero" knows, is really just a question of multiplying and dividing by powers of 10. Bit-shifting is the same thing, but with powers of two. Now that you understand a bit about bit significance and the concept of shifting, let's tackle the problem of wrap-around and integer overflow. In dealing with binary numbers so far, we've dealt exclusively with positive integers and zero, meaning that we haven't dealt with negative numbers or non-integers like decimals. The latter will be fodder for a future post on floating point arithmetic, but let's consider how to represent a negative number in binary. If it were simply a matter of writing it out, that’d be easy. We could simply write -0x11 for -3, the way we do on paper. But when dealing with actual bits on a computer, we don’t have that luxury. We have to signify negativity somehow using the same ones and zeroes that we use for representing the number. Given that negative/positive is an either/or proposition, the common way to do this is with a bit–specifically, the MSB. This lets you represent negativity, but it necessarily cuts in half the count of numbers that you can represent.

Think of a byte, which is 8 bits. Using all 8 bits, you can represent 2^8 = 256 numbers (0 through 255). But using 7, you can only represent 2^7 = 128 (0 through 127). Think of the language construct “unsigned” in most programming languages (with “signed” being the default). When you’re making this distinction, you’re telling the compiler whether or not you want to sacrifice one of the bits in your number type to use as the +/- sign.

Great, but how do we get to the rollover issue? Well, the reason for that is a little more complicated and it involves a concept called “two’s complement.” Let’s say that we represented binary numbers by taking the MSB and making it 0 for positive and 1 for negative. That’s probably what you were picturing in the last paragraph, and it’s called “sign magnitude” representation. It’s workable, but it has the awkward side effect of giving us two ways to represent 0. For instance, consider a signed 3 bit scheme where the first bit was the sign and the next two the number. 101 and 001 would represent negative and positive 1, respectively, but what about 0x100 and 0x000. Positive and negative zero? It also requires different mechanics for arithmetic operations depending on whether the number is positive or negative, which is computationally inefficient. And finally, the fact that there are two versions of zero means that a potential number that could be represented is wasted.

Instead, processors use a scheme called “two’s complement.” In this scheme, the MSB still does wind up representing the sign, but only by side effect. The positive numbers are represented the way they would be in the binary numbers you’re used to seeing. So, in a 4 bit number, the largest normal number would be 7: 0x0111. But when you flip that last bit and go to what would be 8 as a signed number, 0x1000, you’re suddenly representing -8. If you keep counting, you get pairs of (0x1001 or -7), (0x1010 or -6) … (0x1111 or -1). So you walk the scheme up to the most positive number, which you arrive at halfway through, and then you suddenly hit the most negative number and start working your way back to zero. Adding 1 to 0x0111, which corresponds to Integer.Max for a 4 bit integer, would result in 0x1000, which corresponds to Integer.Min.

Negative and positive that you’re used to operate as a line stretching out either way to infinity with the two concepts colliding at zero. But in a two’s complement scheme, they act as a wheel, and you “roll over.” The reason that two’s complement behaves this way is a bit more involved, and I may cover it in a smaller future post, but suffice it to say that it makes bit arithmetic operations easier to reason about algorithmically and more elegant. It also gets rid of that pesky double zero problem.

## How it Helps You

In the short term all of this helps you answer the phone interview questions, but I don’t think of that as too terribly important since I consider trivia questions during the interview process to be a hiring anti-pattern, often indicative of a department rife with petty oneupsmanship and knowledge hoarding. You’re better off without that job, and, now that you know the answers to the questions, you have the best of all worlds.

On a longer timeline, you’re learning the fundamentals of how data is represented and works in its most core form in computer science, and that opens some nice doors, in general. Superficially, knowing to bit-shift instead of multiply or divide may allow you to optimize in some environments (though I believe that with most modern compilers they can do pretty well optimizing this on their own). But at a deeper level, you’re gaining an understanding of the internal workings of the data types that you’re using.

When data is sent “over the wire” and you’re examining it with some kind of dump tool, it’s coming in streams of bits. Sure there’s an endpoint somewhere re-assembling it into things like unsigned ints and floats and even higher level abstractions. But if those are the lowest levels at which you understand data, it’s going to make programs hard to debug when you’re using a tool that shows you raw data. It may be tedious to interpret things at that level, but in a tight spot it can be a life saver.

## Add a Comment

2 Comments on "Practical Programmer Math: Bit Arithmetic"