P, NP And Decision Problems (Really, It’s Not that Bad)

After a relatively lengthy hiatus from this series, I'm going to bring back the Practical Math series, but with two changes. First, in spite of my rather intense predilection for symmetry and order, I'm going to dispense with the stock set of headers to fill out and get a little more free form with these. And secondly, I'm going to expand it to be "Practical Math for Programmers, but also Practical Computer Science and all of the discrete-mathy things in between." With the former concession, my aim is to remove my own blockage from writing these posts — the template is good for consistency, but it sort of sucked the fun out of it, and I'd stare at the headers and think, "ugh, okay, that's a lot to do." With the latter concession, I mean to acknowledge that computer science and a lot of the algorithms involved are fundamentally "mathy," and that's okay. We'll work together to make this thing approachable.

One of the coolest concepts that I ever encountered as a student or in life was NP Complete problems. It’s a rather mind blowing subject and it’s simultaneously philosophically complex and simple to approach.  I’d also like to offer the caveat here that I’m a long, long way removed from formal study of this topic, so what I’m posting here is my recollection of the topic combined with a hodgepodge of visits to various .edu sites to refresh my memory on some of the finer points.  The intention here is to give an approachable introduction to those programmers who’ve never really had formal CS backgrounds and not to defend a PhD thesis.  I’ve made a good faith effort to keep things accurate, but let no one confuse this with a textbook.

Now, lest I scare anyone off too quickly with that disclaimer, let’s do some grade school math. What are the prime factors of 12,317? You’ve got 20 seconds. Ready, set, go!!!

(In my best Alex Trebek voice) Oh, no, sorry — you're out of time. Let's try a different one. How about 109 x 113? Bet you can hammer that one out in 20 seconds, either by firing up Calc.exe or just doing it longhand. Well, look at that, 12,317. Whaddya know? And both of those are prime factors, so you now have the answer to the last question too. But… why is it so much harder to reduce a composite number to its prime factors than to do multiplication? I mean, 5 x 5 isn't a whole lot harder than 25/5. Why is factoring so much harder than multiplying? Kind of interesting to think about, huh? And just like that, you've dipped your toe into the waters of pretty heavy discrete math and computer science theory.

See, it's not that bad. Let's consider some concepts now before we give them names. I think the names get pretty weird pretty fast if you don't first understand the concepts. Thinking about the idea of factoring versus multiplication and contemplating the nature of the uneven difficulty level, you might reason as follows. Multiplication is a problem that you can solve pretty easily. Prime factoring is a problem that you can solve, but not as easily. However, given a number and a candidate prime factor, you can check that candidate really easily. In other words, when you were trying to find the prime factors of 12,317, you were doing something like, "let's see, is it divisible by 2… no… 3… nope… 5… etc." But if someone came along and said, "is 109 a factor of 12,317?" you would say, "wow, it sure is, and now I've also found the only other prime factor to boot. Thanks, man!" So we can differentiate between whether a problem can be solved easily (e.g. multiplication but not factoring) and whether a prospective solution for a problem can be checked easily (e.g. both factoring and multiplication). And we can also phrase our approaches to these problems in yes or no forms, requiring a decision to be rendered: "Is 109 a factor of 12,317 — yes or no?" or perhaps, "do any smaller prime factors of 12,317 exist?"
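To make that asymmetry concrete, here's a minimal C# sketch (nothing a cryptographer would actually use) contrasting the two tasks: checking a proposed factor is a single modulus operation, while finding the factors from scratch means grinding through trial division.

using System.Collections.Generic;

public static class Factoring
{
    // Checking a candidate factor: one modulus operation. Easy.
    public static bool IsFactor(long candidate, long number)
    {
        return number % candidate == 0;
    }

    // Finding the prime factors from scratch: trial division. Fine for 12,317,
    // but the work balloons as the numbers get bigger.
    public static List<long> PrimeFactors(long number)
    {
        var factors = new List<long>();
        for (long divisor = 2; divisor * divisor <= number; divisor++)
        {
            while (number % divisor == 0)
            {
                factors.Add(divisor);
                number /= divisor;
            }
        }
        if (number > 1)
            factors.Add(number); // whatever is left over is itself prime
        return factors;
    }
}

Trial division grows with the square root of the number's value, and that value grows exponentially with the number of digits, which is exactly the sort of lopsidedness we're about to talk about.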

With me so far? Good. Let's not confuse things any more quite yet, opting instead to give voice to the reasonable question, "alright, whatever, but who cares?"  Well, for starters, the NSA cares, and so should you if you're concerned about the degree to which they trample on your various rights.  The reason for this is that the imbalance between the time required for multiplication and the time required for factoring plays a crucial role in a number of public key encryption schemes.  If you want to go beyond the realm of encryption and civil liberties, consider the implications of the discovery of a fast, polynomial time solution to NP Complete problems: you'd basically solve all of mankind's problems.  Aside from personal glory and every conceivable medal/prize under the sun, you'd be able to answer the question, "how do I build the fastest possible fighter jet?" in relatively short order, especially when compared to the way we'd currently need to approach the problem, which would be to build every possible fighter jet and see which one is best (perhaps applying some limited heuristics).

If this sounds comically grandiose, it sort of is, but also, it's sort of not.  It's unknown whether this unicorn solution (a fast, polynomial time algorithm for NP Complete problems) is even theoretically possible, but if it is and someone found it, it would probably be the most groundbreaking discovery in the history of mathematics.  But if you're the sort who likes practical answers to "why should I care?" then consider the ability to recognize NP Complete problems as a sort of "road closed" sign when looking for algorithmic efficiency.  If you realize that a problem is NP Complete, you can give up on finding a slick way to do it and realize that it's either time to go with brute force or time to seriously reconsider what you're doing.

So what is this P and NP and NP Complete and all that?  Well before we go back to some of the discoverable concepts from the factoring example and before things get all formal, let’s consider an example of an NP Complete problem.  Things are always easier by example than by mathematical definition, so let’s talk about a salesman and his travels.

Let’s say that there are a bunch of cities that are on a salesman’s route, and that the salesman has to visit each city exactly once.  Obviously, there is a mileage distance associated with each trip between cities, and let’s say that our salesman can only fly on one particular airline and that it’s not a given that there are flights between any two cities.  So, to review, one airline, all cities visited exactly once (which theoretically may or may not actually be possible, depending on the cities involved), mileage tabulated along the way.  The question that is then NP Complete is, “is there an itinerary for the salesman that results in him traveling less than X miles?”

I like this because it’s pretty simple to understand what’s being asked, as far as examples of heady discrete math concepts go.  And, it’s also possible to understand where the subtle complexity creeps in.  Can you think of a way to solve this other than brute force through all possible routes and then checking to see if a given one is less than X?  (If you can, email me your solution and tell no one about this!  I’ll see that it winds up in good hands.)  Can you also see that, while answering the question is a matter of brute force, checking the answer is a simple, polynomial time proposition (walk the given path adding up the miles and then see if the total is less than X)?
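To make the "easy to check" half concrete, here's a small C# sketch of a verifier for a proposed itinerary. The flight table, city names, and method name are invented for illustration; the point is just that the check is one walk down the route.

using System.Collections.Generic;
using System.Linq;

public static class SalesmanChecker
{
    // Verifies a proposed itinerary in polynomial time: each city appears only once,
    // every leg has a flight on our one airline, and the total mileage is under the limit.
    public static bool IsItineraryUnderLimit(
        IList<string> itinerary,
        IDictionary<(string From, string To), int> flightMiles,
        int mileageLimit)
    {
        if (itinerary.Distinct().Count() != itinerary.Count)
            return false; // a city was visited more than once

        int totalMiles = 0;
        for (int i = 0; i < itinerary.Count - 1; i++)
        {
            if (!flightMiles.TryGetValue((itinerary[i], itinerary[i + 1]), out int miles))
                return false; // no flight between these two cities
            totalMiles += miles;
        }
        return totalMiles < mileageLimit;
    }
}

Answering the decision problem itself, on the other hand, means trying itineraries until one passes this check (or none does), and the number of possible itineraries grows factorially with the number of cities.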

Okay, so let's finally get to some definitions, but I'm going to keep them colloquial rather than mathematically rigorous, given the theme of this series.  First off, as touched on earlier, there is the concept of a decision problem.  A decision problem is simply a problem in which an algorithm is applied to some set of inputs and will produce an answer of "yes" or "no."  Examples include "do any smaller prime factors of 12,317 exist?" and "is there any sales route to be traveled where the salesman can fly less than 120,000 miles?"

Now to spice things up a bit.  P problems are a class of decision problems where the decision can be rendered in a time that is polynomial in relation to the input.  This means that in O notation terms, you’d be talking about things like O(n), O(n^3), O(n^20), etc.  The time might be awful, but it wouldn’t be exponential like O(2^n) or combinatorial like O(n!).  NP problems are a class of decision problems where a proposed solution can be verified in time that is polynomial in relation to the input.  Conceptually, P problems are a subset of NP problems, which makes sense.  After all, if you can answer whether or not there’s any path that the salesman can travel in short order, you can certainly check one particular path in pretty short order too (and, conceptually, if you can enumerate all of them in polynomial time, you could simply solve the verification problem by just paying attention when you happened to get to the proposed solution).

Now, at this point you might wonder whether P is not just a subset of NP but actually the same as NP.  This puts you in good company with the entirety of those who study theoretical computer science: this is the famous P versus NP question, one of the most important open problems in the field.  Okay, so what's NP Complete, exactly, since we've been talking about it?  To answer that, I'm going to reach back to an old post where I talked about this concept obliquely.

I don't have a fast (polynomial time) solution for the traveling salesman problem, but let's say that I did.  Let's say that I had some magical oracle that solved this in short order.  Since the traveling salesman problem is NP Complete and I have an oracle that can solve it quickly, it means that I can use my oracle to solve any other NP problem quickly (in polynomial time) as well.  To be a little more rigorous, I could, in polynomial time, transform any other NP problem into one that my oracle solves.  So NP Complete problems are those where, if you find such an oracle, you win life.  If a problem is in NP but not NP Complete, an oracle for it is merely a bronze medal.

There is a lot more that could be covered, but I think this is probably a good stopping point on the subject.  For more reading, here are some interesting links:


Practical Math for Programmers: O Notation

It’s been a while since I’ve done one of these, and I didn’t want to let the series die, so here goes.

The “Why This Should Matter to You” Story

You've probably heard people you work with say things like, "this is going to run in O of N squared, but mine is going to run in O of N, so we should use mine." You understand that they're talking about performance and, more specifically, time, and that whatever N is, squaring it means more, and more is bad when it comes to time. From a realpolitik perspective, you're aware that this O of N thing is something that lends programmers some cred if they know how to talk about it, and from a career perspective, you understand that the Googles and Microsofts of the world seem interested in it.

But when your coworker talks about using his “O of N” solution, you don’t actually know what that means. And because of this, you tend to lose arguments. But beyond that, you’re also losing out on the ability to understand and reason about your code in a way that will let you see more pieces of the puzzle when weighing design tradeoffs. You could have a more intuitive understanding of how your code will perform in real world circumstances than you currently do.

Math Background

public void LinearTime(int n)
{
    for (int index = 0; index < n; index++)
        DoSomethingThatTakesOneSecond();
}

If you look at this method, do you know how long it takes to run? My guess is that you'd correctly and astutely say, "it depends on what you pass in for n." In this case, assuming that the method called inside the loop does what's advertised, the method will take n seconds: pass in 10 and it takes 10 seconds, but pass in 100 and it takes 100 seconds. The runtime varies linearly with n. This is said to run in O(n).

Let's look at another example.

public void QuadraticTime(int n)
{
    for (int index = 0; index < n; index++)
        for (int innerIndex = 0; innerIndex < n; innerIndex++)
            DoSomethingThatTakesOneSecond();
}

This one may be a little less intuitive, but you'll definitely notice it's going to be slower. How much? Well, it varies. If you pass in 2 for n, the outer loop will execute twice, and each time it does, the inner loop will execute twice, for a total of 2*2 = 4 executions. If you pass in 10, the outer loop will execute 10 times, with the inner loop executing ten times for each of those, so the total becomes 10*10 = 100 executions of the method. For each n, the value is n^2, so this is said to be O(n^2) runtime.

Let's take a look at yet another example.

public void ConstantTime(int n)
{
    DoSomethingThatTakesOneSecond();
    DoSomethingThatTakesOneSecond();
}

In this (poorly written because of the unused parameter) method, it doesn't matter what you pass in for n. The algorithm will always execute in two seconds. So, this means that it's O(2), right? Well, no, actually. As it turns out, a constant time algorithm is denoted in O notation by O(1). The reason for this is mainly convention, but it underscores an important point. A "constant time" algorithm may not literally take a constant amount of time, and that constant may not be one or any other specific number, but a "constant time" operation is one that executes in an amount of time bounded by something independent of the size of the input.

To understand what I mean here, consider the first two examples. Each of those runtimes depended on the input "n". The third example's runtime depends on something (the method called, for instance), but not n. Simplifying the runtime isn't unique to constant time. Consider the following:

public void QuadraticTime(int n)
{
    for (int index = 0; index < n; index++)
        for (int innerIndex = 0; innerIndex < index; innerIndex++)
            DoSomethingThatTakesOneSecond();
}

This code is identical to the second example, except that instead of innerIndex varying up to n, it varies only up to index. This is going to run in roughly half the time that example two ran in, but we don't say that it's O(n^2/2). We simply say this is also O(n^2). Generally speaking, we're only interested in the order of n and not in any constant coefficients. So we don't talk about O(2n), O(n/12) or anything else like that, just O(n). The same goes for when n is squared or cubed, and the same thing goes for constant time -- there's no O(1/2) or O(26), just O(1).

One last thing to note about O notation (or Big O Notation, more formally) is that it constitutes a complexity upper bound. So, for example, if I stated that all of the examples were O(n^3), I'd technically be accurate, since all of them will run in O(n^3) or better. However, if you cite a higher order of complexity when discussing with others and then pedantically point out that you're technically right, they'll most likely not grasp what you're talking about (and, if they do, they'll probably want to stop talking to you quickly).

How It Helps You

In the same way that knowing about design patterns in software helps you quickly understand a complex technique, O notation helps you understand and discuss how complex and resource-intensive a solution is as a function of its inputs. You can use this form of notation to talk about both algorithm runtime as well as memory consumed, and it gives you concrete methods of comparison, even in cases where you don't know the size of the input.

For instance, if you're doing some test runs on a small set of data before pushing live to a web server and you have no way of simulating the high volume of users that it will have in the wild, this comes in very handy. Simple time trials of what you're doing aren't going to cut it because they're going to be on way too small a scale. You're going to have to reason about how your algorithm will behave, and O notation helps you do just that. On a small sample size, O(n^3) may be fine, but in production, it may grind your site to a halt. Better to know that going in.

O notation can also help you avoid writing terrible algorithms or solutions that perform wildly worse than others. For instance, consider the case of Fibonacci numbers. These are a sequence of numbers where the Nth number is the sum of the two numbers before it: Fn = Fn-1 + Fn-2. The sequence itself is thus: 1, 1, 2, 3, 5, 8, 13, 21, etc.

Here is an elegant-seeming and crystal clear implementation of Fibonacci:

public int CalculateNthFibonacciNumber(int n)
{
    if (n == 0)
        return 0;
    if (n == 1)
        return 1;
    return CalculateNthFibonacciNumber(n - 1) + CalculateNthFibonacciNumber(n - 2);
}

It certainly demonstrates recursion and it's very understandable. But it also turns out to run in O(1.6^n), which is exponential runtime. We haven't covered that one yet, but exponential runtime is catastrophic in your algorithms. Ask this thing for the 1000th Fibonacci number and come back at the end of time when it's done completing your request. As it turns out, there is an iterative solution that runs in linear, O(n) time. You probably want that one.
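Since I mentioned it, here's a sketch of that iterative version: linear in n and constant in memory. (For genuinely large n, you'd want BigInteger rather than long, since the values themselves overflow quickly.)

public static long CalculateNthFibonacciNumberIteratively(int n)
{
    long previous = 0, current = 1;
    for (int i = 0; i < n; i++)
    {
        long next = previous + current; // each step is O(1), and there are n of them
        previous = current;
        current = next;
    }
    return previous; // after n steps, this holds the nth Fibonacci number
}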

This is the value of understanding O notation. Sure, without it you get that not all implementations are equally well performing and you may understand what makes some better than others. But understanding O notation gives you an easy way to remember and keep track of different approaches and how efficient they are and to communicate this information with others.

Further Reading

  1. Wikipedia
  2. A nice article for beginners
  3. Thorough explanation from stack overflow 'wiki'
  4. Academic but approachable explanation from MIT


Practical Programmer Math: Bit Arithmetic

The “Why This Should Matter to You” Story

Imagine that you’re on a phone interview and things are going pretty well. You’ve fielded some gotcha trivia about whether such and such code would compile and what default parameters are and whatnot. After explaining the ins and outs of the strategy pattern and recursive methods, you’re starting to feel pretty good about yourself when, suddenly, the record skips. “Uh, excuse me?” you stammer.

“Why do integers roll over to negative when you loop all the way up to and past their max values?”

You had always just assumed that this was like an old Nintendo game where, naturally, when you completely leave the right side of the screen you simply appear on the left side. “Uh… they just do…? Pass?”

“No problem–let’s move on. Why do unsigned and signed integers occupy the same number of bits even though signed integers have more information?”

“Oh, they have more information? I know that unsigned integers can’t be negative!”

“Yes, well, okay. Let’s try this: how would you multiply a number by two without using the * operator?”

“Well,” you begin, “I’d–”

“And without adding it to itself.”

“Oh. Huh. I dunno.”

The interviewer sighs a little and says, “okay, so what can you tell me about the way numbers are represented as bits and bytes?”

Remembering the last practical math post, you talk about how you know that bits are 1 or 0 and that 8 of these bits makes up a byte. You talk about how bits can be and-ed and or-ed like booleans, and you offer hopefully that you even know an interesting bit of trivia in that 1K is not actually 1,000 bytes. The interviewer nods absently, and looking back later, you have no trouble pinpointing where things went off the rails. In spite of knowing a bit about how information is represented at the lowest level, you realize you're lacking some additional basics as to how this bit and byte stuff works.

You wrap up the interview and when you ask what the next steps are, the interviewer says, “don’t call us–we’ll call you.”

Math Background

First of all, you may want to go back and brush up on some previous posts in this series.

Let's start with the simplest thing first: bit shifting. To help you understand bit-shifting, I'm going to introduce the concept of "bit significance." Bit significance is a way of categorizing the "bits" in a binary number from most significant (the "most significant bit" or MSB) to least significant (the "least significant bit" or "LSB"). This is really just a fancy way of describing the bits that correspond to the largest power of two from greatest to smallest. For example, the number six is written in binary as 110, and we would say that the first 1 is the most significant, the second 1 the next most, and the 0 the least, since they correspond to the 4, 2, and 1 places, respectively.

This may seem like a definition so basic as to be pointless, but it’s important to remember that not all architectures necessarily interpret bits from left to right. After all, reading “110” as 6 is rather arbitrary–we could just as easily read it as 3 if we read it the other way. This is reminiscent of how some Middle Eastern written languages are interpreted from right to left. So just to be clear, for the remainder of this post, the MSB will be the leftmost bit and the LSB will be the rightmost.

Bit shifting, often denoted in programming languages by the “>>” and “<<” operators, is the simple procedure of adding a zero to one end or the other of a binary sequence. For example, consider the following code:

int x = 0b0011;  // Set x = 3 (binary literal)
int y = x << 1;  // Left shift by 1 place

What you would wind up with for y is 0110, or 6. Notice how there's now an extra 0 on the right, and this results in the value being multiplied by two. If we had bit-shifted by 2 instead of 1, we'd have 1100, or 12, which is 3 x 4. If you bit-shift the other way, you get back to where you started:

int x = 0b1100;  // Set x = 12
int y = x >> 2;  // Right shift by 2 places

Here, y winds up being 0011, or 3, again. Performing a subsequent shift would result in 0001, or 1. When you divide unevenly by two, the algorithm 'rounds' down. Another and perhaps easier way to understand this shifting concept is to realize that it's actually quite similar to something familiar from the decimal numbers you're accustomed to. For example, if you "decimal shifted" the number 14 left, you'd have 140, and if you did it to the right, you'd wind up with 1 (rounding down from 1.4). Decimal shifting, as anyone who has "misplaced a zero" knows, is really just a question of multiplying and dividing by powers of 10. Bit-shifting is the same thing, but with powers of two.
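Incidentally, that's the answer to the interviewer's question from the story above about doubling a number without * or +. A quick sketch:

public static int DoubleIt(int x)
{
    return x << 1;   // 3 becomes 6, 12 becomes 24
}

public static int HalveIt(int x)
{
    return x >> 1;   // 6 becomes 3, and 7 becomes 3 (rounds down, for non-negative values)
}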

Now that you understand a bit about bit significance and the concept of shifting, let’s tackle the problem of wrap-around and integer overflow. In dealing with binary numbers so far, we’ve dealt exclusively with positive integers and zero, meaning that we haven’t dealt with negative numbers or non-integers like decimals. The latter will be fodder for a future post on floating point arithmetic, but let’s consider how to represent a negative number in binary.

If it were simply a matter of writing it out, that'd be easy. We could simply write -11 for -3, the way we do on paper. But when dealing with actual bits on a computer, we don't have that luxury. We have to signify negativity somehow using the same ones and zeroes that we use for representing the number. Given that negative/positive is an either/or proposition, the common way to do this is with a bit: specifically, the MSB. This lets you represent negativity, but it necessarily cuts in half the count of numbers that you can represent.

Think of a byte, which is 8 bits. Using all 8 bits, you can represent 2^8 = 256 numbers (0 through 255). But using 7, you can only represent 2^7 = 128 (0 through 127). Think of the language construct “unsigned” in most programming languages (with “signed” being the default). When you’re making this distinction, you’re telling the compiler whether or not you want to sacrifice one of the bits in your number type to use as the +/- sign.

Great, but how do we get to the rollover issue? Well, the reason for that is a little more complicated, and it involves a concept called "two's complement." Let's say that we represented binary numbers by taking the MSB and making it 0 for positive and 1 for negative. That's probably what you were picturing in the last paragraph, and it's called "sign magnitude" representation. It's workable, but it has the awkward side effect of giving us two ways to represent 0. For instance, consider a signed 3 bit scheme where the first bit is the sign and the next two are the number. 101 and 001 would represent negative and positive 1, respectively, but what about 100 and 000? Positive and negative zero? It also requires different mechanics for arithmetic operations depending on whether the number is positive or negative, which is computationally inefficient. And finally, the fact that there are two versions of zero means that a potential number that could be represented is wasted.

Instead, processors use a scheme called "two's complement." In this scheme, the MSB still does wind up representing the sign, but only by side effect. The positive numbers are represented the way they would be in the binary numbers you're used to seeing. So, in a 4 bit number, the largest normal number would be 7: 0111. But when you add one more and get to what would be 8 as an unsigned number, 1000, you're suddenly representing -8. If you keep counting, you get (1001 or -7), (1010 or -6) … (1111 or -1). So you walk the scheme up to the most positive number, which you arrive at halfway through, and then you suddenly hit the most negative number and start working your way back toward zero. Adding 1 to 0111, which corresponds to Integer.Max for a 4 bit integer, results in 1000, which corresponds to Integer.Min.

The negative and positive numbers that you're used to operate as a line stretching out to infinity in either direction, with the two halves meeting at zero. But in a two's complement scheme, they act as a wheel, and you "roll over." The reason that two's complement behaves this way is a bit more involved, and I may cover it in a smaller future post, but suffice it to say that it makes bit arithmetic operations easier to reason about algorithmically and more elegant. It also gets rid of that pesky double zero problem.
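You can watch the wheel turn in a couple of lines of C#. This is just a sketch using unchecked arithmetic (which is the default for integer math in C# anyway):

using System;

class TwosComplementRollover
{
    static void Main()
    {
        int max = int.MaxValue;                      // binary 0111...1
        int wrapped = unchecked(max + 1);            // rolls over to binary 1000...0
        Console.WriteLine(wrapped == int.MinValue);  // True

        // The same wheel in miniature with a signed byte (8 bits):
        sbyte smallMax = sbyte.MaxValue;                        // 127, binary 01111111
        sbyte smallWrapped = unchecked((sbyte)(smallMax + 1));  // -128, binary 10000000
        Console.WriteLine(smallWrapped);                        // -128
    }
}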

How it Helps You

In the short term, all of this helps you answer the phone interview questions, but I don't think of that as too terribly important, since I consider trivia questions during the interview process to be a hiring anti-pattern, often indicative of a department rife with petty one-upsmanship and knowledge hoarding. You're better off without that job, and, now that you know the answers to the questions, you have the best of all worlds.

On a longer timeline, you’re learning the fundamentals of how data is represented and works in its most core form in computer science, and that opens some nice doors, in general. Superficially, knowing to bit-shift instead of multiply or divide may allow you to optimize in some environments (though I believe that with most modern compilers they can do pretty well optimizing this on their own). But at a deeper level, you’re gaining an understanding of the internal workings of the data types that you’re using.

When data is sent “over the wire” and you’re examining it with some kind of dump tool, it’s coming in streams of bits. Sure there’s an endpoint somewhere re-assembling it into things like unsigned ints and floats and even higher level abstractions. But if those are the lowest levels at which you understand data, it’s going to make programs hard to debug when you’re using a tool that shows you raw data. It may be tedious to interpret things at that level, but in a tight spot it can be a life saver.

Further Reading

  1. A pretty nice guide to bit shifting
  2. Detailed explanation of Two’s Complement
  3. A nice video on Two’s Complement
  4. A comparison of different schemes for representing negative in bits


Practical Programmer Math: From Booleans to Bits to Bytes

The “Why This Should Matter to You” Story

A few scenarios today:

  1. You don’t understand why the answer to “how many bytes are in a kilobyte” isn’t 1000 (at least, not exactly).
  2. You see if((x & mask) > 7) and wonder why it compiles despite the author forgetting an ‘&’
  3. You feel a little ashamed because in spite of the fact that your non-programmer friends say that you “think in ones and zeroes”, you don’t actually understand how your code becomes ones and zeroes.

Everything here is symptomatic of a lack of understanding of the way data is represented at the most fine-grained level in computer science and in programming.
It's possible to understand some programming concepts without understanding the mechanics of how they work internally. After all, you can understand what a list.sort() method does without understanding the sort algorithm being used.

So the general story is that your understanding of the data and data structures that go into programming breaks down at some level. You may understand that a string is a series of characters, but perhaps not that characters are actually integers or that integers are actually bytes or that bytes are actually series of bits or that bits are simply Booleans. And that breakdown in understanding somewhere in the chain leads to situations where you don’t fully understand what your peers are talking about.

Math Background

In the last two posts in this series, I’ve gone through some basic but important points about Boolean algebra, which one might describe as the basic mathematics of truth values. In this system there are two fundamental values to which constants, variables and expressions might evaluate: true and false. Boolean algebra is, in a very real sense, the absolute fundamental building block for everything that we as programmers do, but it may not be entirely obvious as to how.

To get there, let’s leave behind the syntax of Boolean algebra and pick up the syntax of machine language. So forget about “true” and “false” and consider instead 1 and 0, respectively. 1 is true, 0 is false. And further let’s get rid of the conjunction, disjunction and negation operators and consider, in their stead, the more familiar & | ~ operators. (You’ll notice that these are single operators and not the double operators with which you are likely familiar and this is not a typo — there is a method to my madness.) All of the rules from the previous posts still apply so, for instance, the identity law of A | 0 = A applies.

So far we’ve just swapped syntactical operators and constants. But let’s now consider an interesting property of basic arithmetic (not Boolean arithmetic) which is that written numbers are really a sequence of answers to a pre-arranged sequence of questions.

Wat

Okay, let me clarify with an example. Let’s pick a three digit number like 481. There’s nothing particularly special about this number, but let’s think of it in terms of a Q&A session. It goes like this “how many hundreds are in this number?” followed by the answer “four.” “How many tens are in this number?” “Eight.” “And how many ones?” “One.” This is really all there is to numerical representation of natural numbers (discounting the concept of zero, negative numbers, decimals, etc).

In all numbers that we're comfortable with in normal society, the question always involves powers of 10: 1, 10, 100, etc. And so the answer is always a number 0 through 9. But what if we asked the question in a different way? What if we did it in terms of 7? Or 34? Or… 2? The answer, respectively, would always be a number 0 through 6, 0 through 33… and 0 through 1. That last one is most interesting to us here. The answer will always be between 0 and 1 or, put another way, the answer will always be 0 or 1. Interesting.

Let’s look at an example. The number 10110, using powers of 2, is asking the question, “how many 16’s, 8’s, 4’s, 2’s and 1’s are there” to which the answers come as “1, 0, 1, 1, 0” respectively. We have 1 sixteen, no eights, 1 four, 1 two and no ones for a total of 22. 10110 is the binary representation of 22. That’s probably not new to you regardless of your CS/math/etc background if you’re a programmer. But what might be new to you is just how fundamental this concept is because it allows translation between physical and “soft”-ware concepts. How so? Because you know what’s actually pretty hard to represent and measure physically? 0 through 9. But you know what isn’t? Booleans. On or off. There or not. Current flowing or nothing. Chad punched or not. Magnetic surface raised or flat. Ion charged or neutral. You get the idea.
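That question-and-answer scheme is easy to express in code. Here's a tiny C# sketch (the method name is mine, purely for illustration) that reads a string of bits as the answers to those power-of-two questions:

// "10110" -> 22: one 16, no 8, one 4, one 2, no 1.
public static int FromBits(string bits)
{
    int value = 0;
    foreach (char bit in bits)
        value = (value * 2) + (bit == '1' ? 1 : 0); // each new answer doubles everything before it
    return value;
}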

We can build computers where information is stored as a series of Booleans representing not "true or false," but "yes or no," and we can still apply all of the same principles, rules, equivalences and concepts, albeit with slightly different semantics. We can take these "yes or no" values and assemble them into integers using our binary question scheme. We can then assemble those integers into ASCII characters, strings, basic programs, complex programs, multi-media files, and you know the rest.

The math behind all of programming is simple. The basis for the whole thing is simply gigantic sequences of Boolean values assembled into ever-larger pieces. The marriage of Boolean logic and counting that is unique to binary encoding means that you know how to assemble the Boolean sequences into meaningful information and also how to manipulate that information and store it back, performing operations using identity, domination, negation, and other equivalences (we will get to some interesting optimizations using bit-shifting and other techniques in future posts).

The only missing piece here is how we get from these individual “1 or 0” values, called “bits” to larger values. Well, most of this is arbitrary. For instance, the definition of “byte” is simply “8 consecutive bits,” meaning that a byte can represent one of 256 possible integer values. These different values can also be assigned (again arbitrarily) to non-integral values such as characters. This is the basis for ASCII, a near-universal standard for representing text using integers (and thus binary/bits). This is the mechanism by which the building blocks are assembled.
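You can see that chain of arbitrary-but-agreed-upon assignments directly in C#: a char converts to its integer code without complaint, and an ASCII string is just a series of bytes. A small sketch:

using System;
using System.Text;

class CharactersAreNumbers
{
    static void Main()
    {
        char letter = 'A';
        int code = letter;                             // 65 -- the conversion is implicit
        char next = (char)(code + 1);                  // 'B'
        byte[] bytes = Encoding.ASCII.GetBytes("Hi");  // { 72, 105 }

        Console.WriteLine($"{code} {next} {bytes[0]} {bytes[1]}"); // 65 B 72 105
    }
}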

How it Helps You

In a (hyphenated) word: background-knowledge. For instance, with all of this in mind, it's relatively easy to get to solutions for the mysteries at the beginning of the post. First up is the weirdness surrounding byte counting using KB, MB, GB, etc. Is a kilobyte a thousand bytes the way a kilogram is a thousand grams? Well, it could be, if you define it the way we're used to seeing those terms defined. But that isn't actually what winds up happening. What happens instead is the result of an interesting quirk of powers of two. As it turns out, 2 to the 10th is 1,024, which is roughly a 'kilo' of bytes. 2 to the 20th is 1,048,576, which is roughly a 'mega' (1 million) of bytes. The same kind of "fuzzy" counting happens with Giga, Tera, etc., all the way up the chain with 2 to the 30th, 40th, etc. So why reuse a prefix that means something else — something otherwise very precise? Personally, I think the adopters of this convention were being cute, and I don't care for it at all (*dons flame retardant suit), but them's the breaks. Apparently, I'm not the first to feel this way, and others have tried and are trying to do something about it.

What about all that bit mask stuff? Well, that's now pretty simple to sort out as well. One common convention you need to know for a bit of additional perspective is that "base-16" is often adopted for representing bytes. The reason is that a byte is 8 bits, and each group of 4 bits represents one of 16 possible values. So if you have a base-16 number, bytes can be written with two values. These take the form of something like 4E, because you get the 0-9 of base-10 that you're used to and then another 6 values representing 10 through 15, which are written as the characters A through F. It's much easier to remember bytes as two values than it is to remember 8 bits. I mention all of this because "FF" is identical in value to "11111111" and "F0" is identical to "11110000".

You don’t see bit masking as much in managed languages, though you might if you get your hands dirty down in the transport layer doing things like socket management. But those coming from a C background are quite familiar with it. If you get a message over the wire, it may come in as a stream of bytes. But when things like network latency and accuracy are issues (or perhaps in embedded environments when space is at a premium) there is a tendency to squeeze every last modicum of value out of the space available. In a web application, representing a number 1 through 10 with data type “int” (which in managed languages will be a 32-bit/4-byte integer) is not really an issue, but in the environments I’ve mentioned, it’s digital blasphemy to waste that space. You only need 4 bits to represent that — the other 28 are wasted.

But the smallest unit typically dealt with in programming convention is the byte. So what to do? Well, enter bit masking. Let's say I have two speakers on some embedded device that can be set to volume level 1-10, and I want to write a protocol for querying the volume level of both. What I would do is define a message that is 1 byte long and have the first 4 bits of the byte represent the volume of one speaker and the second 4 represent the volume of the other. If I wanted just the value of speaker 2 (the second 4 bits), I would want a way to preserve those second 4 bits while clobbering the first 4. Well, enter the bit mask. If I perform the operation int myValue = 0x0F & incomingSpeakerByte, I get what I'm looking for. Why? Well, let's consider this.

The value 0x0F can be rewritten in binary as 00001111, and the value of the speakers is whatever is coming back. Let's say that speaker 2 is set to volume level 9, which can be written as 1001. But let's say that the other speaker is set to volume level 4, which is 0100. That means that both speakers together are represented with the byte 01001001. If I set a local variable (where space is not at a premium) to the byte I'm getting over the wire, it will be equal to 73, and that is nonsense in our 1-10 volume domain. But the bit mask operation performs 00001111 & 01001001 to get the final value of 9 (00001001) correctly. The reason this works is that the "&" operation aligns the two sequences of bits and performs AND on them one by one, recording the value. This is called "bitwise and." If you think back to the domination and identity laws from the last post, we're using domination (x & 0 = 0) to clobber the first 4 bits and identity (x & 1 = x) to preserve the second 4, which are the only four that interest us. (Storing the other speaker's value in a local int is a bit more involved, taking advantage of the aforementioned "bit-shifting," and beyond the scope of this post). The long and short of it is that when you see bit masking operations performed, you now know what's going on — the developer is using the same byte to store multiple values and using this technique to extract the ones he or she needs at the moment.
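Here's the speaker scenario from that paragraph as a short C# sketch. The protocol and names are the hypothetical ones above, and the shift used for the other speaker is the "bit more involved" part I just waved away:

// Low 4 bits hold speaker 2's volume; high 4 bits hold speaker 1's volume.
public static (int Speaker1, int Speaker2) UnpackVolumes(byte incomingSpeakerByte)
{
    int speaker2 = incomingSpeakerByte & 0x0F;         // domination clobbers the high bits, identity keeps the low ones
    int speaker1 = (incomingSpeakerByte >> 4) & 0x0F;  // shift first, then mask the same way
    return (speaker1, speaker2);
}

// UnpackVolumes(0b01001001) returns (4, 9): speaker 1 at volume 4, speaker 2 at volume 9.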

As to thinking in ones and zeroes, you might not be quite there yet, but this is a step in the right direction. It helps you to understand the fundamental building blocks of software based data and how we get from the physical world to the “soft” representation that we deal with. It also hopefully makes some of the weird-symbol bit manipulation tricks seem a little less like black magic. At the very least, I hope that instead of viewing some of this stuff with suspicion or hopelessness, you now have some basic concepts, terms to google and questions to ask.

Further Reading

  1. Wikipedia on bitwise operations
  2. HowStuffWorks on bits and bytes
  3. How many x in a y calculator
  4. Wiki on basic bit manipulation


Practical Math for Programmers: DeMorgan’s and Other Logical Equivalences

The “Why This Should Matter to You” Story

TL;DR: You read my post about conditional bloopers and thought, “what’s wrong with that?!?”

Let’s say that you recall the lesson from last time and you make sure that you also avoid statements like “if(A && (A || B))”. You draw out truth tables whenever you feel you have time (and when no one is looking because, if you’re honest with yourself, you feel a little awkward doing it when people are watching). And yet things like this still happen to code you’ve written:

public void DoNightThings()
{
    if (!(!_itsLate || !_iAmTired))
        GoToBed();
}

gets refactored to:

public void DoNightThings()
{
    if (_itsLate && _iAmTired)
        GoToBed();
}

When you notice this in the source history, you check with your truth tables, and sure enough, it’s right. You kind of get it, in the end, but you always feel like you’re a step behind.

Math Background

In Boolean algebra, truth tables are sort of the equivalent of rote arithmetic memorization such as learning multiplication tables and using them to perform long-hand multiplication of larger numbers. It’s a rote, brainless, and time-consuming algorithm that you follow when there isn’t something better to use to get you to the answer more quickly. But the good news is that if you memorize a different set of information — some axiomatic laws of equivalence — you can get a lot of answers more quickly and easily. Here is the series of equivalences paired with explanations that will really help you understand the finer points of simplifying your conditional logic.

Identity Laws:  (A ^ TRUE) = A and also (A v FALSE) = A.  These identity laws simply point out the identity of variables compared with special constants.  This is like talking about additive identity (x + 0 = x) or multiplicative identity (x*1 = x) in elementary algebra (i.e. what you learn in junior high).

Domination Laws:  (A ^ FALSE) = FALSE and also (A v TRUE) = TRUE.  These are the opposites of the identity laws and have an elementary algebra equivalent in the multiplicative annihilator (x*0 = 0).  They operate simply in that any false in a conjunction or any true in a disjunction renders the entire expression false or true, respectively.

Double Negative:  ~~A = A.  (Probably more formally known as "double negation" or negation of negation, but I think this is easier to remember.)  This quite simply says that if something is not not there, then it is there, or, in Boolean logic, that two negations cancel each other out.

Commutativity:  (A v B) = (B v A) and (A ^ B) = (B ^ A).  This simple but important law shows that the order of arguments to the conjunction and disjunction operators does not matter.  Think back to your arithmetic days when you learned that 4 + 3 = 3 + 4.  Same principle.

Associativity:  (A v B) v C = A v (B v C) and (A ^ B) ^ C = A ^ (B ^ C).  You can think of this as demonstrating that the location of parentheses is irrelevant when all operations are conjunction or disjunction operations.  If you reach even further back in your brain to arithmetic, you may remember this guy as 3 + (4 + 5) = (3 + 4) + 5.

Tautology:  (A v ~A) = True and (A ^ ~A) = False.  We went over this in more detail last time, but suffice it to say that conjunction and disjunction with a variable’s negation leads to vacuously true or false results.

Idempotence:  (A v A) = A and also (A ^ A) = A.  Idempotence in general is the idea that an operation can be applied multiple times but only have a single effect.  For a rather practical example, consider a light switch: the first time you flip it up, the light turns on, but subsequent flips don't make the light somehow "more on."  Idempotence in Boolean expressions means that redundant conjunctions and disjunctions simply have no effect.

Distribution:  A ^ (B v C) = (A ^ B) v (A ^ C) and A v (B ^ C) = (A v B) ^ (A v C).  This is a powerful equivalence because it provides a means to "move" parentheses around in Boolean expressions and retain their exact truth values.  Consider this in English and there is sense to it: "A is true, and either B or C is true" is the same as "A and B are both true, or A and C are both true."  Should that be a little abstract for your taste, think of it conversationally: "It's raining, and either I'm inside or I'm wet" is the same as "It's raining and I'm wet, or it's raining and I'm inside."

De Morgan's Laws:  ~(A ^ B) = (~A v ~B) and ~(A v B) = (~A ^ ~B).  This is another heavy hitter like distribution, except that this one allows us to "move" negations when simplifying Boolean expressions.  This one also makes sense conversationally: "If it's not true that person X is both male and female, then either person X is not male or person X is not female."  (A quick brute-force check of these laws appears in the sketch after this list.)  De Morgan's Laws also have an interesting incarnation in higher order logics like first order logic, but that's probably going to be beyond the scope of this series for quite some time.

Absorption: A v (A ^ B) = A and A ^ (A v B) = A.  This was the main example we covered last time, so you can work out the truth table if you like.  As it turns out, this is a formally defined equivalence as well.
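If you'd rather not draw the tables by hand, a brute-force check is only a few lines of C#. This sketch runs De Morgan's laws through every combination of truth values; the same trick works for any of the equivalences above:

using System;

class DeMorganCheck
{
    static void Main()
    {
        // A truth table written as a loop: both laws hold for every combination.
        foreach (bool a in new[] { false, true })
            foreach (bool b in new[] { false, true })
            {
                bool law1 = !(a && b) == (!a || !b);
                bool law2 = !(a || b) == (!a && !b);
                Console.WriteLine($"a={a}, b={b}: {law1 && law2}"); // prints True every time
            }
    }
}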

How It Helps You

In the last post in this series, I covered truth tables, which are sort of like the Boolean algebra equivalent of long division.  They’re a mechanical, inelegant process by which you can brute force a correct answer.  Logical equivalences are algebraic axioms that provide you with the building blocks for deductive reasoning — the building blocks for really thinking things through and critically solving problems.

You don’t necessarily need to memorize these, but I would study them enough at least to have something tickle the back of your mind the next time that you’re staring at an incredibly complex looking conditional.  That something will be “I think we can do better than this” and then, even if you don’t have the equivalences memorized, at least you’ll know enough to go look them up.  Armed only with truth tables, this can feel like flailing around in the dark while looking for a pattern.  But armed with logical equivalences you’ll say “oh, that’s right — I can eliminate the not here and I can turn these ors into ands over here, and… yeah, much better.”

Memorizing these equivalence laws would be even better, but not for the sake of pedantry or winning obscure trivia games.  I say that memorizing them is better because if you memorize them, you will practice them and conditionals, particularly complex ones, will start to work themselves into simpler forms automatically in your mind.  You will see things like Neo in the Matrix when he’s about to defeat Agent Smith, if the movie were a lot more boring.  Point is, though, you’ll develop an extremely good sense for when conditional logic could be simplified, corrected, or clarified and that has powerful ramifications for the readability, correctness and maintainability of the code you work with.

To circle back to the small example of a real world situation, understanding these Boolean laws of inference is likely to put you ahead of the curve, rather than behind it, when it comes to understanding conditional simplification.  Rather than the person whose code is always getting refactored, you’ll be the one doing the refactoring and doing it confidently.

Further Reading

  1. Wikipedia
  2. A bit more circuit flavored, but with some applied examples at the bottom.
  3. An introduction to the idea of proofs in Boolean algebra
  4. Very theoretical, with formal proof for most of the inference laws.