TDD: Simplest is not Stupidest

Where the Message Gets Lost In Teaching TDD

I recently answered a Programmers Stack Exchange post about test-driven development (TDD). (As an aside, it will be cool when my linking to an SE question drives more traffic their way than their linking to me drives my way :) ). As I’m wont to do, I said a lot in the answer there, but I’d like to expand one facet of it into a blog post that hopefully clarifies an aspect of TDD for people, or at least for people who see the practice the way that I do.

One of the mainstays of showcasing test-driven development is to show some extremely bone-headed ways to get tests to pass. I do this myself when I’m showing people how to follow TDD, and the reason is to drive home the point “do the simplest thing.” For instance, I was recently putting together such a demo and started out with the following code:

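using Microsoft.VisualStudio.TestTools.UnitTesting;

// The test class name is hypothetical; the demo only showed the method itself.
[TestClass]
public class NumberInspectorTest
{
    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void IsEven_Returns_False_For_1()
    {
        var inspector = new NumberInspector();

        Assert.IsFalse(inspector.IsEven(1));
    }
}

public class NumberInspector
{
    // Simplest thing that passes the only test so far: hard-code the answer.
    public bool IsEven(int target)
    {
        return false;
    }
}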

This is how the code looked after going from the “red” to the “green” portion of the cycle. When I used CodeRush to define the IsEven method, it defaulted to throwing NotImplementedException, which constituted a failure. To make it pass, I just changed that to “return false.”

The reason that this is such a common way to explain TDD is that the practice is generally being introduced to people who are used to approaching problems monolithically, as described in this post I wrote a while back. For people used to solving problems this way, the question isn’t, “how do I get the right value for one,” but rather, “how do I solve it for all integers and how do I ensure that it runs in constant time and is the modulo operator as efficient as bit shifting and what do I do if the user wants to do it for decimal types should I truncate or round or throw an exception and whoa, I’m freaking out man!” There’s a tendency, often fired in the forge of undergrad CS programs, to believe that the whole algorithm has to be conceived, envisioned, and drawn up in its entirety before the first character of code is written.

So TDD is taught the way it is to provide contrast. I show people an example like this to say, “forget all that other stuff–all you need to do is get this one test passing for this one input and just assume that this will be the only input ever, go, now!” TDD is supposed to be fast, and it’s supposed to help you solve just one problem at a time. The fact that returning false won’t work for two isn’t your problem–it’s the problem of you forty-five seconds from now, so there’s no reason for you to bother with it. Live a little–procrastinate!

You refine your algorithm only as the inputs mandate it, and you pick your inputs so as to get the code doing what you want. For instance, after putting in the “return false” and getting the first test passing, it’s pretty apparent that this won’t work for the input “2”. So now you’ve got your next problem: you write the test for 2 and then you set about getting it to pass, say with “return target == 2”. That’s still not great. But it’s better, it was fast, and now your code solves two cases instead of just the one.
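As a rough sketch of where that second cycle leaves things, here is the same NumberInspector class with the two checks written so far. To keep the snippet standalone it uses plain exceptions instead of MSTest assertions, and the SecondCycleChecks helper is a hypothetical stand-in for the unit tests, not part of the original demo.

```csharp
using System;

public class NumberInspector
{
    // Second green step: still hard-coded, but it now tells 1 and 2 apart.
    public bool IsEven(int target)
    {
        return target == 2;
    }
}

// Hypothetical stand-in for the two unit tests written so far.
public static class SecondCycleChecks
{
    public static void Run()
    {
        var inspector = new NumberInspector();

        if (inspector.IsEven(1)) throw new Exception("IsEven(1) should be false");
        if (!inspector.IsEven(2)) throw new Exception("IsEven(2) should be true");
    }
}
```

Calling SecondCycleChecks.Run() completes without throwing, which is all "green" means at this point. Nothing here claims the method is right for any input beyond those two.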

Running off the Rails

But there is a tendency, I think, as demonstrated by Kristof’s question, for TDD teachers to give the wrong impression. If you were to write a test for 3, “return target == 2” would pass and you might move on to 4. What do you do at 4? How about “return target == 2 || target == 4;”

So far we’ve been moving in a good direction, but if you take off your “simplest thing” hat for a moment and think about basic programming and math, you can see that we’re starting down a pretty ominous path. After throwing in a 6 and an 8 to the or clause, you might simply decide to use a loop to iterate through all even numbers up to int.MaxValue, or-ing a return value with itself to see if target is any of them.

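public bool IsEven(int target)
{
    // Deliberately obtuse: compares target against every even number from 0 upward.
    // The loop stops before int.MaxValue - 1 and never visits negative numbers,
    // so those even inputs still come back false.
    bool isEven = false;
    for (int index = 0; index < int.MaxValue - 1; index += 2)
        isEven |= target == index;
    return isEven;
}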

Yikes! What went wrong? How did we wind up doing something so obtuse following the red-green-refactor principles of TDD? Two considerations, one reason: "simplest" isn't "stupidest."

Simplicity Reconsidered

The first consideration is that simple-complex is not measured on the same scale as stupid-clever. The two have a case-by-case, often interconnected relationship, but simple and stupid aren't the same just as complex and clever aren't the same. So the fact that something is the first thing you think of or the most brainless thing that you think of doesn't mean that it's the simplest thing you could think of. What's the simplest way to get an empty boolean method to return false? "return false;" has no branches and one hardcoded piece of logic. What's the simplest way that you could get a boolean method to return false for 1 and true for 2? "return target == 2" accomplishes the task with a single conditional of incredibly simple math. How about false for 1 and true for 2 and 4? "return target % 2 == 0" accomplishes the task with a single conditional of slightly more involved math. "return target == 2 || target == 4" accomplishes the task with a single conditional containing two clauses (could also be two conditionals). Modulo arithmetic is more elegant/sophisticated, but it is also simpler.
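To make that comparison concrete, here is a minimal sketch that puts the two candidates side by side. The IsEvenCandidates class and its method names are made up for illustration; only the two return expressions come from the discussion above.

```csharp
using System;

// Hypothetical container for the two candidate implementations.
public static class IsEvenCandidates
{
    // Or-clause version: passes for 1, 2, and 4, but each new even
    // input requires another clause.
    public static bool WithOrClauses(int target)
    {
        return target == 2 || target == 4;
    }

    // Modulo version: one comparison, no per-input clauses.
    public static bool WithModulo(int target)
    {
        return target % 2 == 0;
    }
}
```

Both methods return false for 1 and true for 2 and 4, so either one gets the current tests to green. They part ways at the very next even input: WithOrClauses(6) is false, while WithModulo(6) is true.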

Now, I fully understand the importance in TDD of proceeding methodically and solving problems in cadence. If you can't think of the modulo solution, it's perfectly valid to use the or condition and put in another data point such as testing for IsEven(6). Or perhaps you get all tests passing with the more obtuse solution and then spend the refactor phase refining the algorithm. Certainly nothing wrong with either approach, but at some point you have to make the jump from obtuse to simple, and the real "aha!" moment with TDD comes when you start to recognize the fundamental difference between the two, which is what I'll call the second consideration.

The second consideration is that "simplest" advances an algorithm where "stupidest" does not. To understand what I mean, consider this table:

[Table not reproduced: ConditionalClauseChart]

In every case that you add a test, you're adding complexity to the method. This is ultimately not sustainable. You'll never wind up sticking code in production if you need to modify the algorithm every time a new input is sent your way. Well, I shouldn't say never--the Brute Forces are busily cranking these out for you to enjoy on the Daily WTF. But you aren't Brute Force--TDD isn't his style. And because you're not, you need to use either the green or refactor phase to do the simplest possible thing to advance your algorithm.

A great way to do this is to take stock after each cycle, before you write your next failing test, and clarify to yourself how you've gotten closer to being done. After the green-refactor, you should be able to note a game-changing refinement. For instance:

[Table not reproduced: MilestoneTddChart]

Notice the difference here. In the first two entries, we make real progress. We go from no method to a method and then from a method with one hard-coded value to one that can make the distinction we want for a limited set of values. On the next line, our gains are purely superficial--we grow our limited set from distinguishing between 2 values to 3. That's not good enough, so we can use the refactor phase to go from our limited set to the complete set.

It might not always be possible to go from limited to complete like that, but you should get somewhere. Maybe you somehow handle all values under 100 or all positive or all negative values. Whatever the case may be, it should cover more ground and be more general. Because really, TDD at its core is a mechanism to help you start with concrete test cases and tease out an algorithm that becomes increasingly generalized.

So please remember that the discipline isn't to do the stupidest or most obtuse thing that works. The discipline is to break a large problem into a series of comparatively simple problems that you can solve in sequence without getting ahead of yourself. And this is achieved by simplicity and generalization--not by brute force.

  • http://twitter.com/DMartin_3 Dan Martin

    Great post! I think people tend to forget about the refactoring phase of TDD sometimes. That’s the perfect opportunity to simplify the code and make it smarter, as we now have tests to verify the functionality. But one has to be careful to keep the refactoring simple. Refactoring doesn’t mean deliberately solving the rest of the test cases either.

  • http://www.daedtech.com/blog Erik Dietrich

    That’s a good point. One of the things that it took me a while to learn with TDD was not to get carried away with either getting it to go green or tacking onto the solution during ‘refactoring’.