DaedTech

Stories about Software

By

Easy Deployment: the Alpha and the Omega

A bit of housekeeping…you may have noticed that the social media buttons look a bit different if you’re not accessing through RSS. The old plugin that I was using seems not to be supported anymore, and the Facebook button vanished for a bit. I tried out a replacement and liked it, so I kept it. My thanks to Active Bits for the Social Sharing Toolkit.

Wrong But Fast

There’s a pretty good chance that your deployment process is both too painful and not painful enough. But before I return to that cryptic statement, let me talk a bit about something I’ve observed in developers — especially ones that are newer to the industry. Here’s an example of a series of exchanges that has become pretty familiar to me:

User: It would be nice if the profile screen had a way I could change my password.

Young Buck: That’ll take like, literally, two seconds. I’ll be right back!

(Fifteen minutes later)

Young Buck: Okay, I pushed it out to the server, so you can change your password now.

User: Wow! It’s live already? That’s really cool! Thank you! Let me try it out. Let’s see… oh, hmmm. When I try to log in it looks like it crashes.

Young Buck: That doesn’t seem right. I mean, the only thing I changed was…. oh! I know exactly what happened. Give me three minutes, and I’ll be back.

(30 minutes later)

Young Buck: Alright, should be good.

User: Well, I can get in again, and there’s the change password button, but when I click, nothing happens.

Young Buck: That’s not possible. You must have forgotten to clear your cache. Unless…wait, I think I know what’s going on here. Give me 10 minutes.

User: Uh, tell ya what — just don’t worry about it. Maybe I’ll try it again next week.

Young Buck: (Crestfallen) But it’ll only take 10 minutes and I know exactly what the problem is.

User: (With pity) I’m sure you do, but I’ve got a lot of things to get done today.

This is not professional behavior. Imagine if you took your car in for repairs somewhere and things went this way. You’d probably have the Better Business Bureau on the phone shortly or at least be headed to another mechanic. The young buck is sloppy because he’s brash and arrogant. Right?

Or does it just come off that way a little because of how sure he seems when really he’s just eager to please the user and prove himself? Personally, I find that in the overwhelming majority of cases this is really what’s going on. People often get into programming because they like solving problems. And many programmers were some of the smartest kids in their classes growing up — the ones waving their hands frantically, demanding that the teacher give them a chance to show that they know the answer.

FranticStudent

As entry level programmers, the school game has been all that they know. It’s a balance between rushing to get the right answer (teacher calling on students, timed exams, cramming in homework, etc.) and getting answers right, with the former often winning. In college, most programming assignments are evaluated by programs that allow you to submit your code as often as you want and get immediate feedback. There are also office hours, so students who visit professors and TAs the most frequently and submit the most work tend to get the best grades and the most feedback. Computer Science students are used to a paradigm where ideas are valued over execution.

Welcome to the Real World, Grasshopper

In the professional world, however, execution reigns supreme. Ideas are cheap. You may be able to rattle off the quick-sort algorithm in pseudo-code faster than anyone around you, but that’s not going to win you any startup capital. With software, even the intellectual property system (USPTO, anyway) is a joke seemingly designed to let Apple, Microsoft and Google endlessly sue each other and occasionally to swat little guys like bugs. Having the idea first and/or quickly is not as important as getting the idea right in the end.

It takes people some time to learn this upon entering the work force, and exchanges like the one I mention above are common. Users more or less say, “Go away and come back to me when you have something that makes my life better and not a second before. I don’t care if you thought of it in five seconds or if it took you five minutes or if it would have taken you five minutes except that the database was actually not normalized to BCNF and blah, blah, blah. Whatever.” Some people figure this out quickly, and some never figure it out. But whatever the speed may be and however much your group may or may not have come to terms with this, there needs to be structure in place to stop the madness.

In other words, a lot of the developers in your group are going to be eager to please. This is especially true if they regularly interact with their users. There is going to be pressure on them to say things like, “sure, I’ll have that for you in 10 minutes.” But they need not to say things like that. If they can say things like that and they can successfully (attempt to) make them happen, your deployment process is not painful enough.

Hurts so Good

Deployment is not to be taken lightly, especially if there is a release and the users are going to be seeing the result of the work. If you can deploy effortlessly in minutes, there’s a very good chance your process is not painful enough. The situation I’ve been describing above suffers from this very problem — it’s too easy for eager crowd-pleasers to deploy and thus it’s too easy for them to depend on users to be their fast feedback mechanism.

Developers are smart and often opportunistic. Getting fast feedback is extremely important to them, and they’ll naturally seek out ways to procure it. If you let them, they’ll use end users as their feedback mechanism (and, in a tone-deaf sense that ignores end-user perception, this is actually optimal), but you can’t permit this. Rather than following that path of least resistance, or at least familiarity from school/hobbyist days, you need to choke off that path and force them to carve a new riverbed. You need to make deployment more painful.

Now there’s “antiseptic on a cut” painful and “shark gnawing on your leg” painful. You want to gun for the former. A lot of deployment processes that enable developers to SPAM end users with non-functional updates are the product of amateur hour: xcopying files to the server, putting an executable on a share drive, zipping things up and emailing them, etc. These things tend to be both easy to do and easy to botch, so simply setting policies in place that prevent developers from doing them is antiseptic on the cut.

Deployments and especially releases to end users need to follow some process, and coding is simply one stop along the way — not the entirety of it. Ideally, there should be the coding and then developer testing, but from there, automated unit and integration tests, code reviews, static analysis, exploratory/manual testing by QA and observation by a UX group can all be part of the mix. These good practices serve to improve quality in and of themselves, but they also serve to prevent spurious, sloppy releases. If you know you can make a change in five minutes and have it in front of a user in six minutes, you’re going to do that. But if you know that you can make the change in five minutes and it’ll be days of going through the entire release process before you hear back from the end user, you’ll start finding other ways to get fast feedback (such as running unit tests, asking other developers to take a look and working closely with QA). By making deployment more painful you ensure that a lot more care goes into it.

But Hurts Too Much

If you’ve been reading along and grinding your teeth in objection to my premise that your deployment process needs to hurt more, I understand. Things shouldn’t be painful, and deliberately hurting yourself is a form of madness. If your deployment process hurts, it’s too painful, even though I just told you it’s not painful enough. But you have to prevent pain in the right way. Xcopy deployment is like being a boxer addicted to morphine — your process is horrible but pain free. Now imagine that you realize that being addicted to drugs is a problem, so you cut back and start feeling the pain each time a professional puncher unloads on you. In one sense, you’re feeling too much pain because it hurts to get punched in the face, but at the same time you’re not feeling enough pain because the amount you’re feeling isn’t causing you to consider another vocation.

The analogy here may not line up exactly, but the idea is similar. A lot of development and deployment processes are problematically painful in that they’re error-prone and difficult, but not painful enough in that they don’t prevent over-eager deploys and bad decisions. The solution? Get off the junk and stop letting yourself get pummeled by human wrecking balls. Or, in software terms, have a process that makes bad deployments hard and good ones very easy.

Now, getting to this release nirvana is not, itself, simple, but life is simple once you get there. The path to it generally involves a lot of automated testing and good planning. It involves a predictable release cadence, such as a sprint, and a commitment to following the process, not making exceptions nor cramming things in at the 11th hour or pushing back release dates. It involves continuous integration rather than periodic, nightmarish feature branch integrations. It involves a resistance to patching frantically when you make mistakes (you’ll learn a valuable lesson for next time). It involves a simple, fully-automated, easily repeatable build process. It involves a single click/button push deployment process. Summed up, it means that every time someone on your team checks in code, a series of automated tests and checks ensure that the code would be a good candidate for production or it is rejected from checkin until it would be. It means that your code could be shipped with a reasonably high degree of confidence on every checkin and that whether or not to actually ship is a business decision — not a technical one.

I encourage you to take a look at your build process and deployment process. Is it easy because Jim in accounting could do it? If so, it’s too easy and it’s definitely causing you problems. Is it hard enough that you do a lot of checking beforehand because you won’t want to do it again if things go wrong? That’s an improvement because it forces your hand for early testing and vetting, but it’s still a time-wasting problem. First, think about putting obstacles in place to guard against careless deployment, then think about refining those obstacles into process-helping practices, and finally, think about smoothing over the obstacles in the form of complete automation, leaving nothing but good, easy process.

By

Intro to Unit Testing 5: Invading Legacy Code in the Name of Testability

If, in the movie Braveheart, the Scots had been battling a nasty legacy code base instead of the English under Edward Longshanks, the conversation after the battle at Stirling between Wallace and minor Scottish noble MacClannough might have gone like this:

Wallace: We have prevented new bugs in the code base by adding new unit tests for all new code, but bugs will still happen.

MacClannough: What will you do?

Wallace: I will invade the legacy code, and defeat the bugs on their own ground.

MacClannough (snorts in disbelief): Invade? That’s impossible.

Wallace: Why? Why is that impossible? You’re so concerned with squabbling over the best process for handling endless defects that you’ve missed your God-given right to something better.

Braveheart

Goofy as the introduction to this chapter of the series may be, there’s a point here: while unit testing brand new classes that you add to the code base is a victory and brings benefit, to reap the real game-changing rewards you have to be a bit of a rabble-rouser. You can’t just leave that festering mass of legacy code as it is, or it will generate defects even without you touching it. Others may scoff or even outright oppose your efforts, but you’ve got to get that legacy code under test at some point or it will dominate your project and give you unending headaches.

So far in this series, I’ve covered the basics of unit testing, when to do it, and when it might be too daunting. Most recently, I talked about how to design new code to make it testable. This time, I’m going to talk about how to wrangle your existing mess to start making it testable.

Easy Does It

A quick word of caution here before going any further: don’t try to do too much all at once. Your first task after reading the rest of this post should be selecting something small in your code base to try it on if you want to target production and get it approved by an architect or lead, if that’s required. Another option is just to create a playpen version of your codebase to throw away and thus earn yourself a bit more latitude, but either way, I’d advise small, manageable stabs before really bearing down. What specifically you try to do is up to you, but I think it’s worth proceeding slowly and steadily. I’m all about incremental improvement in things that I do.

Also, at the end of this post I’ll offer some further reading that I highly recommend. And, in fact, I recommend reading it before or as you get started working your legacy code toward testability. These books will be a great help and will delve much further into the subjects that I’ll cover here.

Test What You Can

Perhaps this goes without saying, but let’s just say it anyway to be thorough. There will be stuff in the legacy code base you can test. You’ll find the odd class with few dependencies or a method dangling off somewhere that, for a refreshing change, doesn’t reference some giant singleton. So your first task there is writing tests for that code.

But there’s a way to do this and a way not to do this. The way to do it is to write what’s known as characterization tests that simply document the behavior of the existing system. The way not to do this is to introduce ‘corrections’ and cleanup as you go. The linked post goes into more detail, but suffice it to say that modifying untested legacy code is like playing Jenga — you never really know ahead of time which brick removal is going to cause an avalanche of problems. That’s why legacy code is so hard to change and so unpleasant to work with. Adding tests is like adding little warnings that say, “dude, not that brick!!!” So while the tower may be faulty and leaning and of shoddy construction, it is standing and you don’t want to go changing things without putting your warning system in place.

So, long story short, don’t modify — just write tests. Even if a method tells you that it adds two integers and what it really does is divide one by the other, just write a passing test for it. Do not ‘fix’ it (that’ll come later when your tests help you understand the system and renaming the method is a more attractive option). Iterate through your code base and do it everywhere you can. If you can instantiate the class to get to the method you want to test and then write asserts about it (bearing in mind the testability problems I’ve covered like GUI, static state, threading, etc), do it. Move on to the next step once you’ve done the easy stuff everywhere. After all, this is easy practice and practice helps.

Go searching for extractable code

Now that you have a pretty good handle on writing testable code as you’re adding it to the code base and getting untested but testable code under test, it’s time to start chipping away at the rest. One of the easiest ways to do this is to hunt down methods in your code base that you can’t test but not because of the contents in them. Here are two examples that come to mind:

public class Untestable1
{
    public Untestable1()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
    }

    public int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

public class Untestable2
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        Console.WriteLine("Total is " + AddTwoNumbers(order.Subtotal, order.Tax));
    }

    private int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

The first class is untestable because you can’t instantiate it without kicking off global state modification and who knows what else. But the AddTwoNumbers method is imminently testable if you could remove that roadblock. In the second example, the AddTwoNumbers method is testable once again, in theory, but with a roadblock: it’s not public.

In both cases, we have a simple solution: move the method somewhere else. Let’s put it into a class called “BasicArithmeticPerformer” as shown below. I do realize that there are other solutions to make these methods testable, but we’ll talk about them later. And I’ll tell you what I consider to be a terrible solution to one of the testability issues that I’ll talk about now: making the private method public or rigging up your test runner with gimmicks to allow testing of private methods. You’re creating an observer effect with testing when you do this — altering the way the code would look so that you can test it. Don’t compromise your encapsulation design to make things testable. If you find yourself wanting to test what’s going on in private methods, that’s a strong, strong indicator that you’re trying to test the wrong thing or that you have a design flaw.

public class BasicArithmeticPerformer
{
    public int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

Now that’s a testable class. So what do the other classes now look like?

public class Untestable1
{
    public Untestable1()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
    }

    private int AddTwoNumbers(int x, int y)
    {
        return new BasicArithmeticPerformer().AddTwoNumbers(x, y);
    }
}

public class Untestable2
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        Console.WriteLine("Total is " + AddTwoNumbers(order.Subtotal, order.Tax));
    }

    public int AddTwoNumbers(int x, int y)
    {
        return new BasicArithmeticPerformer().AddTwoNumbers(x, y);
    }
}

Yep, it’s that simple. In fact, it has to be that simple. Modifying this untestable legacy code is like walking a high-wire without a safety net, so you have to change as little as possible. Extracting a method to another class is very low risk as far as refactorings go since the most likely problem that could possibly occur (particularly if using an automated tool) is non-compiling. There’s always a risk, but getting legacy code under test is lower risk in the long run than allowing it to continue rotting and the risk of this particular approach is minimal.

On the other side of things, is this a significant win? I would say so. Even ignoring the eliminated duplication, you now have gone from 0 test coverage to 50% in these classes. Test coverage is not a goal in and of itself, but you can now rest a little easier knowing that you have a change warning system in place for half of your code. If someone comes along later and says, “oh, I’ll just change that plus to a minus so that I can ‘reuse’ this method for my purposes,” you’ll have something in place that will throw up a bid red X and say, “hey, you’re breaking things!” And besides, Rome wasn’t built in a day — you’re going to be going through your code base building up a test suite one action like this at a time.

Code that refers to no class fields is easy when it comes to extracting functionality to a safe, testable location. But what if there is instance-level state in the mix? For example…

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

That’s a little tougher because we can’t just pull _someField into a new, testable class. But what if we made a quick change that got us onto more familiar ground? Such as…

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return AddTwoNumbers(x, _someField);
    }

    private int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

Aha! This looks familiar, and I think we know how to get a testable method out of this thing now. In general, when you have class fields or local variables, those are going to become arguments to methods and/or constructors of the new, testable class that you’re creating and instantiating. Understand going in that the more local variables and class fields you have to deal with, the more of a testing headache the thing you’re extracting is going to be. As you go, you’ll learn to look for code in legacy classes that refers to comparably few local variables and especially fields in the current class as a refactoring target, but this is an acquired knack.

The reason this is not especially trivial is that we’re nibbling here at an idea in static analysis of object oriented programs called “cohesion.” Cohesion, explained informally, is the idea that units of code that you find together belong together. For example, a Car class with an instance field called Engine and three methods, StartEngine(), StopEngine( )and RestartEngine() is highly cohesive. All of its methods operate on its field. A class called Car that has an Engine field and a Dishwasher field and two methods, StartEngine() and EmptyDiswasher() is not cohesive. When you go sniping for testable code that you can move to other classes, what you’re really looking for is low cohesion additions to existing classes. Perhaps some class has a method that refers to no instance variables, meaning you could really put it anywhere. Or, perhaps you find a class with three methods that refer to a single instance variable that none of the other 40 methods in a class refer to because they all use some other fields on the class. Those three methods and the field they use could definitely go in another class that you could make testable.

When refactoring toward testability, non-cohesive code is the low-hanging fruit that you’re looking for. If it seems strange that poorly designed code (and non-cohesive code is a characteristic of poor design) offers ripe refactoring opportunities, we’re just making lemonade out of lemons. The fact that someone slammed unrelated pieces of code together to create a franken-class just means that you’re going to have that much easier of a time pulling them apart where they belong.

Realize that Giant Methods are Begging to be Classes

It’s getting less and less common these days, but do you ever see object-oriented code which you can tell that the author meandered his way over to from writing C back in the one-pass compiler days? If you don’t know what I mean, it’s code that has this sort of form:

public void PerformSomeBusinessLogic(CustomerOrder order)
{
    int x, y, z;
    double a, b, c;
    int counter;
    CustomerOrder tempOrder;
    int secondLoopCounter;
    string output;
    string firstTimeInput;
            
    //Alright, now let's get started because this is going to be looooonnnng method...
    ...
}

C programmers wrote code like this because in old standards of C it was necessary to declare variables right after the opening brace of a scope before you started doing things like assignment and control flow statements. They’ve carried it forward over the years because, well, old habits die hard. Interestingly, they’re actually doing you a favor. Here’s why.

When looking at a method like this, you know you’re in for doozy. If it has this many local variables, it’s going to be long, convoluted and painful. In the C# world, it probably has regions in it that divide up the different responsibilities of the method. This is also a problem, but a lemons-to-lemonade opportunity for us. The reason is that these C-style programmers are actually telling you how to turn their giant, unwieldy method into a class. All of those variables at the top? Those are your class fields. All of those regions (or comments in languages that don’t support regioning)? Method names.

In one of the resources I’ll recommend, “Uncle” Bob Martin said something along the lines of “large methods are where classes go to hide.” What this means is that when you encounter some gigantic method that spans dozens or hundreds of lines, what you really have is something that should be a class. It’s functionality that has grown too big for a method. So what do you do? Well, you create a new class with its local variables as fields, its region names/comments as method titles, and class fields as dependencies, and you delegate the responsibility.

public class Untestable4
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        var extractedClass = new MaybeTestable();
        extractedClass.Region1Title();
        extractedClass.Region2Title();
        extractedClass.Region3Title();
    }
}

public class MaybeTestable
{
    int x, y, z;
    double a, b, c;
    int counter;
    CustomerOrder tempOrder;
    int secondLoopCounter;
    string output;
    string firstTimeInput;

    public void Region1Title()
    {

...

In this example, there are no fields in the untestable class that the method is using, but if there were, one way to handle this is to pass them into the constructor of the extracted class and have them as fields there as well. So, assuming this extraction goes smoothly (and it might not be that easy if the giant method has a lot of temporal coupling, resulting from, say, recycled variables), what is gained here? Well, first of all, you’ve slain a giant method, which will inevitably be good from a design perspective. But what about testability?

In this case, it’s possible that you still won’t have testable methods, but it’s likely that you will. The original gigantic method wasn’t testable. They never are. There’s really way too much going on in them for meaningful testing to occur — too many control flow statements, loops, global variables, file I/O, etc. Giant methods are giant because they do a lot of things, and if you do enough code things you’re going to start running over the bounds of testability. But the new methods are going to be split up and more focused and there’s a good chance that at least one of them will be testable in a meaningful way. Plus, with the extracted class, you have control over the new constructor that you’re creating whereas you didn’t with the legacy class, so you can ensure that the class can at least be instantiated. At the end of the day, you’re improving the design and introducing a seam that you can get at for testing.

Ask for your dependencies — don’t declare them

Another change you can make that may be relatively straightforward is to move dependencies out of the scope of your class — especially icky dependencies. Take a look at the original version of Untestable3 again.

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

When instantiated, this class goes and rattles some global state cages, doing God-knows-what (icky), and then retrieves something from global state (icky). We want to get a test around the AddToGlobal method, but we can’t instantiate this class. For all we know, to get the value of “someField” the singleton gets the British Prime Minster on the phone and asks him for a random number between 1 and 1000 — and we can’t automate that in a test suite. Now, the earlier option of extracting code is, of course, viable, but we also have the option of punting the offending code out of this class. (This may or may not be practical depending on where and how this class is used, but let’s assume it is). Say there’s only one client of this code:

public class Untestable3Client
{
    public void SomeMethod()
    {
        var untestable = new Untestable3();
        untestable.AddToGlobal(12);
    }
}

All we really want out of the constructor is a value for “_someField”. All of that stuff with the singleton is just noise. Because of the nature of global variables, we can do the stuff Untestable3’s constructor was doing anywhere. So what about this as an alternative?

public class Untestable3Client
{
    public void SomeMethod()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        var someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
        var untestable = new Untestable3(someField);
        untestable.AddToGlobal(12);
    }
}

public class Untestable3
{
    int _someField;

    public Untestable3(int someField)
    {
        _someField = someField;
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

This new code is going to do the same thing as the old code, but with one important difference: Untestable3 is now a liar. It’s a liar because it’s testable. There’s nothing about global state in there at all. It just takes an integer and stores it, which is no problem to test. You’re an old pro by now at unit testing that’s this easy.

When it comes to testability, the new operator and global state are your enemies. If you have code that makes use of these things, you need to punt. Punt those things out of your code by doing what we did here: executing voids before your constructors/methods are called and asking for things returned from global state or new in your constructors/methods. This is another pretty low-impact way of altering a given class to make it testable, particularly when the only problem is that a class is instantiating untestable classes or reaching out into the global state.

Ruthlessly Eliminate Law of Demeter Violations

If you’re not familiar with the idea, the Law of Demeter, or Principle of Least Knowledge, basically demands that methods refer to as few object instances as possible in order to do their work. You can look at the link for more specifics on what exactly this “law” says, and what exactly is and is not a violation, but the most common form you’ll see is strings of dots (or arrows in C++) where you’re walking an object graph: Property.NestedProperty.NestedNestedProperty.You.Get.The.Idea. (It is worth mentioning that the existence of multiple dots is not always a violation of the Law of Demeter — fluent interfaces in general and Linq in the C# world specifically are counterexamples). It’s when you’re given some object instance and you go picking through its innards to find what you’re looking for.

One of the most immediately memorable ways of thinking about why this is problematic is to consider what happens when you’re at the grocery store buying groceries. When the clerk tells you that the total is $86.28, you swipe your Visa. What you don’t do is wordlessly hand him your wallet. What you definitely don’t do is take off your pants and hand those over so that he can find your wallet. Consider the following code, bearing in mind that example:

public class HardToTest
{
    public string PrepareSsnMessage(CustomerOrder order)
    {
        return "Social Security number is " + order.Customer.PersonalInfo.Ssn;
    }
}

The method in this class just prepends an explanatory string to a social security number. So why on earth do I need something called a customer order? That’s crazy — as crazy as handing the store clerk your pants. And from a testing perspective, this is a real headache. In order to test this method, I have to create a customer, then create an order and hand that to the customer, then create a personal info object and hand that to the customer’s order, and then create an SSN and hand that to the customer’s order’s personal info. And that’s if everything goes well. What if one of those classes — say, Customer — invokes a singleton in its constructor. Well, now I can’t test the “PrepareSsnMessage” in HardToTest because the Customer class uses a singleton. That’s absolutely insane.

Let’s try this instead:

public class HardToTest
{
    public string PrepareSsnMessage(string ssn)
    {
        return "Social Security number is " + ssn;
    }
}

Ah, now that’s easy to test. And we can test it even if the Customer class is doing weird, untestable things because those things aren’t our problem. What about clients, though? They’re used to passing customer orders in, not SSNs. Well, tough — we’re making this class testable. They know about customer order and they its SSN, so let them incur the Law of Demeter violation and figure out how to clean it up. You can only make your code testable one class at a time. That class and its Law of Demeter violation is tomorrow’s project.

When it comes to testing, the more stuff your code knows about, the more setup and potential problems you have. If you don’t test your code, it’s easy to write train wrecks like the “before” method in this section without really considering the ramifications of what you’re doing. The unit tests force you to think about it — “man, this method is a huge hassle to test because problems in classes I don’t even care about are preventing me from testing!” Guess what. That’s a design smell. Problems in weird classes you don’t care about aren’t just impacting your tests — they’re also impacting your class under test, in production, when things go wrong and when you’re trying to debug.

Understand the significance of polymorphism for testing

I’ll leave off with a segue into the next chapter in the series, which is going to be about a concept called “test doubles.” I will explain that concept then and address a significant barrier that you’re probably starting to bump into in your testing travels. But that isn’t my purpose here. For now I’ll just say that you should understand the attraction of using polymorphic code for testing.

Consider the following code:

public class Customer
{
    public string FirstName { get { return TestabilityKiller.Instance.GoGetCustomerFirstNameFromTheDatabase(); } }
}

public class CustomerPropertyFormatter
{
    public string PrepareFirstNameMessage(Customer customer)
    {
        return "Customer first name is " + customer.FirstName;
    }
}

Here you have a class, CustomerPropertyFormatter, that should be pretty easy to test. I mean, it just takes a customer and accesses some string property on it for formatting purposes. But when you actually write a test for this, everything goes wrong. You create a customer to give to your method and your test blows up because of singletons and databases and whatnot. You can write a test with a null argument and amend this code to handle null gracefully, but that’s about it.

But, never fear — polymorphism to the rescue. If you make a relatively small modification to the Customer class, you set yourself up nicely. All you have to do is make the FirstName property virtual. Once you’ve done that, here’s a unit test that you can write:

public class DummyCustomer : Customer
{
    private string _firstName;
    public override string FirstName { get { return _firstName; } }

    /// 

    /// Initializes a new instance of the DummyCustomer class.
    ///

public DummyCustomer(string firstName) { _firstName = firstName; } } [TestMethod, Owner(“ebd”), TestCategory(“Proven”), TestCategory(“Unit”)] public void Adds_Text_To_FirstName() { string firstName = “Erik”; var customer = new DummyCustomer(firstName); var formatter = new CustomerPropertyFormatter(); Assert.IsTrue(formatter.PrepareFirstNameMessage(customer).Contains(firstName)); }

Notice that there is a class, DummyCustomer declared inside of the test class that inherits from the Customer class. DummyCustomer is an example of a test double. You’ll notice that I’ve created a scenario here where I define a version of FirstName that I can control — a benign version, if you will. I effectively bypass that database-singleton thing and create a version of the class that exists only in the test project and allows me to substitute a simple, friendly value that I can test against.

As I said, I’ll dive much more into test doubles next time, but for the time being, understand the power of polymorphism for testability. If the legacy code has methods in it that are hard to use, you can create much more testable situations by the use of interface implementation, inheritance, and the virtual keyword. Conversely, you can make testing a nightmare by using keywords like final and sealed (Java and C# respectively). There are valid reasons to use these, but if you want a testable code base, you should favor liberal support of inheritance and interface implementation.

A Note of Caution

In the sections above, I’ve talked about refactorings that you can do on legacy code bases and mentioned that there is some risk associated with doing so. It is up to you to assess the level of risk of touching your legacy code, but know that any changes you make to legacy code without first instrumenting unit tests can be breaking changes, even small ones guided by automated refactoring tools. There are ways to ‘cheat’ and tips and techniques to get a method under test before you refactor it, such as temporarily making private fields public or local variables into public fields. The Michael Feathers book below talks extensively about these techniques to truly minimize the risk.

The techniques that I’m suggesting here would be ones that I’d typically undertake when requirements changes or bugs were forcing me to make a bunch of changes to the legacy code anyway, and the business understood and was willing to undertake the risk of changing it. I tend to refactor opportunistically like that. What you do is really up to your discretion, but I don’t want to be responsible for you doing some rogue refactoring and torpedoing your production code because you thought it was safe. Changing untested legacy code is never safe, and it’s important for you to understand the risks.

More Information

As mentioned earlier, here are some excellent resources for more information on working with and testing legacy code bases:

And, of course, you can check out my book about unit testing: Starting to Unit Test, Not as Hard as You Think.

By

Slow Posting Week

This is just a brief announcement that I’m probably not going to be posting this coming week — I forgot to mention it last week. Readers in the USA and perhaps others know that Thursday was US Independence Day, so between the long weekend for me and the fact that I’m currently out of town (pleasure, not business) until late next week, I’m not really planning to write any blog posts. All I have with me are an Android tablet and an Ubuntu Netbook Remix machine, neither of which has a serious IDE, so if I do get time it’ll probably be philosophical more than technical.

Assuming that the blog is quiet this coming week, there is still stuff to look forward to. I’m continuing to write for the unit testing series, and the Expert Beginner E-Book is in its last phase before release — illustrations and format cleanup. So stay tuned for those and other random thoughts.

By

Proposal: A Law of Performance Citation

I anticipate this post being fairly controversial, though that’s not my intention. I imagine that if it wanders its way onto r/programming it will receive a lot of votes and zero points as supporters and detractors engage in a furious, evenly-matched arm-wrestling standoff with upvotes and downvotes. Or maybe three people will read this and none of them will care. It turns out that I’m actually terrible at predicting which posts will be popular and/or high-traffic. And I’ll try to avoid couching this as flame-bait because I think I actually have a fairly non-controversial point on a potentially controversial subject.

ArmWrestling

To get right down to it, the Law of Performance Citation that I propose is this:

If you’re going to cite performance considerations as the reason your code looks the way it does, you need to justify it by describing how a stakeholder will be affected.

By way of an example, consider a situation I encountered some years back. I was pitching in to help out with a bit of programming for someone when I was light on work, and the task I was given amounted to “copy-paste-and-adjust-to-taste.” This was the first red flag, but hey, not my project or decision, so I took the “template code” I was given and made the best of it. The author gave me code containing, among other things, a method that looked more or less like this (obfuscated and simplified for example purposes):

public void SomeMethod()
{
    bool isSomethingAboutAFooTrue = false;
    bool isSomethingElseAboutAFooTrue = false;
    IEnumerable foos = ReadFoosFromAFile();
    for (int i = 0; i < foos.Count(); i++)
    {
        var foo = foos.ElementAt(i);
        if (IsSomethingAboutAFooTrue(foo))
        {
            isSomethingAboutAFooTrue = true;
        }
        if (IsSomethingElseAboutAFooTrue(foo))
        {
            isSomethingElseAboutAFooTrue = true;
        }
        if (isSomethingAboutAFooTrue && isSomethingElseAboutAFooTrue)
        {
            break;
        }
    }

    WriteToADatabase(isSomethingAboutAFooTrue, isSomethingElseAboutAFooTrue);
}

I promptly changed it to one that looked like this for my version of the implementation:

public void SomeMethodRefactored()
{
    var foos = ReadFoosFromAFile();

    bool isSomethingAboutOneOfTheFoosTrue = foos.Any(foo => IsSomethingAboutAFooTrue(foo));
    bool isSomethingElseABoutOneOfTheFoosTrue = foos.Any(foo => IsSomethingElseAboutAFooTrue(foo));

    WriteToADatabase(isSomethingAboutOneOfTheFoosTrue, isSomethingElseABoutOneOfTheFoosTrue);
}

I checked this in as my new code (I wasn't changing his existing code) and thought, "he'll probably see this and retrofit it to his old stuff once he sees how cool the functional/Linq approach is." I had flattened a bunch of clunky looping logic into a compact, highly-readable method, and I found this code to be much easier to reason about and understand. But I turned out to be wrong about his reaction.

When I checked on the code the next day, I saw that my version had been replaced by a version that mirrored the original one and didn't take advantage of even the keyword foreach, to say nothing of Linq. Bemused, I asked my colleague what had prompted this change and he told me that it was important not to process the foos in the collection a second time if it wasn't necessary and that my code was inefficient. He also told me, for good measure, that I shouldn't use var because "strong typing is better."

I stifled a chuckle and ignored the var comment and went back to look at the code in more detail, fearful that I'd missed something. But no, not really. The method about reading from a file read in the entire foo collection from the file (this method was in another assembly and not mine to modify anyway), and the average number of foos was single digits. The foos were pretty lightweight objects once read in, and the methods evaluating them were minimal and straightforward.

Was this guy seriously suggesting that possibly walking an extra eight or nine foos in memory, worst case, sandwiched between a file read over the network and a database write over the network was a problem? Was he suggesting that it was worth a bunch of extra lines of confusing flag-based code? The answer, apparently, was "yes" and "yes."

But actually, I don't think there was an answer to either of those questions in reality because I strongly suspect that these questions never crossed his mind. I suspect that what happened instead was that he looked at the code, didn't like that I had changed it, and looked quickly and superficially for a reason to revert it. I don't think that during this 'performance analysis' any thought was given to how external I/O over a network was many orders of magnitude more expensive than the savings, much less any thought of a time trial or O-notation analysis of the code. It seemed more like hand-waving.

It's an easy thing to do. I've seen it time and again throughout my career and in discussing code with others. People make vague, passing references to "performance considerations" and use these as justifications for code-related decisions. Performance and resource consumption are considerations that are very hard to reason about before run-time. If they weren't, there wouldn't be college-level discrete math courses dedicated to algorithm runtime analysis. And because it's hard to reason about, it becomes so nuanced and subjective in these sorts of discussions that right and wrong are matters of opinion and it's all really relative. Arguing about runtime performance is like arguing about which stocks are going to be profitable, who is going to win the next Super Bowl, or whether this is going to be a hot summer. Everyone is an expert and everyone has an opinion, but those opinions amount to guesses until actual events play out for observation.

Don't get me wrong -- I'm not saying that it isn't possible to know by compile-time inspection whether a loop will terminate early or not, depending on the input. What I'm talking about is how code will run in complex environments with countless unpredictable factors and whether any of these considerations have an adverse impact on system stakeholders. For instance, in the example here, the (more compact, maintainable) code that I wrote appears that it will perform ever-so-slightly worse than the code it replaced. But no user will notice losing a few hundred nano-seconds between operations that each take seconds. And what's going on under the hood? What optimizations and magic does the compiler perform on each of the pieces of code we write? What does the .NET framework do in terms of caching or optimization at runtime? How about the database or the file read/write API?

Can you honestly say that you know without a lot of research or without running the code and doing actual time trials? If you do, your knowledge is far more encyclopedic than mine and that of the overwhelming majority of programmers. But even if you say you do, I'd like to see some time trials just the same. No offense. And even time trials aren't really sufficient because they might only demonstrate that your version of the code shaves a few microseconds off of a non-critical process running headlessly once a week somewhere. It's for this reason that I feel like this 'law' that I'm proposing should be a thing.

Caveats

First off, I'm not saying that one shouldn't bear efficiency in mind when coding or that one should deliberately write slow or inefficient code. What I'm really getting at here is that we should be writing clear, maintainable, communicative and, above all, correct code as a top priority. When those traits are established, we can worry about how the code runs -- and only then if we can demonstrate that a user's or stakeholder's experience would be improved by worrying about it.

Secondly, I'm aware of the aphorism that "premature optimization is the root of all evil." This is a little broader and less strident about avoiding optimization. (I'm not actually sure that I agree about premature optimization, and I'd probably opt for knowledge duplication in a system as the root of all evil, if I were picking one.) I'm talking about how one justifies code more than how one goes about writing it. I think it's time for us to call people out (politely) when they wave off criticism about some gigantic, dense, flag-ridden method with assurances that it "performs better in production." Prove it, and show me who benefits from it. Talk is cheap, and I can easily show you who loses when you write code like that (hint: any maintenance programmer, including you).

Finally, if you are citing performance reasons and you're right, then please just take the time to explain the issue to those to whom you're talking. This might include someone writing clean-looking but inefficient code or someone writing ugly, inefficient code. You can make a stakeholder-interest case, so please spend a few minutes doing it. People will learn something from you. And here's a bit of subtlety: that case can include saying something like, "it won't actually affect the users in this particular method, but this inefficient approach seems to be a pattern of yours and it may well affect stakeholders the next time you do it." In my mind, correcting/pointing out an ipso facto inefficient programming practice of a colleague, like hand-writing bubble sorts everywhere, definitely has a business case.

By

I Don’t Know What X is on Line 48 And I Don’t Care

I was at a users’ group recently here in Chicago, and there were two excellent presenters. There were two very well done presentations: one about Xamarin and one about the C# language entitled “Underestimated C# Language Features” by John Michael Hauck. This was a very polished and appealing talk in which he wove together some important differentiators for the C# language, including closures, anonymous methods, and deferred execution. Given that he’s presented this at some conferences as well, I wasn’t surprised by the high caliber of the presentation.

The format was one in which he presented a series of increasingly difficult problems and asked what different variables were at different points of the program’s execution. For example, here’s something I just made up that would have looked at home in one of his many examples:

public void LetsSeeWhatHappens()
{
    int x = 100;
    Console.WriteLine(x);
    foreach (var f in GetSomeFuncs())
    {
        x++;
        Console.WriteLine(f(++x));
    }
    Console.WriteLine(x);
}

public IEnumerable> GetSomeFuncs()
{
    for (int index = 0; index < 3; index++)
        yield return i => index * i;
}

With each example like this, he’d ask the audience what they thought would be the output. And then the fun would begin. There’d always be several different answers and heated disagreement. After all, bragging rights and programmer cred was at stake. This was public gamification at its finest, and it reminded me vaguely of being in a sports bar and listening to people debating heatedly what the next play call would be in the Monday night football game:

Person 1: Third and 7. They have to pass!
Person 2: I think they’re going to call a draw.
Person 3: You’re nuts! That’s stupid! It’ll be a screen pass!
Person 1: Come on, you’ve been wrong every time!!!

FootballGame

I was entertained by this. John was an engaging presenter and the material interested me, but I discovered I had no interest in actually guessing, even to myself, to say nothing of out loud. At one point, a friend of mine that I was sitting with said, “who knows — write a unit test to figure it out,” and I certainly agreed with that point. But that wasn’t it. I think it was just that I was more interested in what the answer would teach me about the language and its nuance than I was in somehow testing myself. Frankly, this seemed like trivia to me and getting the right answer didn’t seem as important as taking the opportunity to hone my critical thinking skills.

Figuring this out, I realized it explained why I was impatient with all of the guessing that people were doing. John would ask people what they thought, and they would guess but then also start explaining their reasoning and arguing with one another. This struck me as boring and inefficient. “Just shut up and let him hit F10, and we’ll have the answer,” I thought. In that moment, I also realized that a very mild pet peeve of mine has always been code reviews or other places where people argue over runtime behavior. I think to myself, “You know what’s better than all of us at being the .NET runtime? The .NET runtime! So let’s ask it.”

After the presentation, I went out for a bite to eat with some friends. There, one of them made an excellent point. When we were discussing the idea of sitting around and figuring out what code did, he said (paraphrased), “if I show code to a handful of competent developers and they can’t agree on what it will do at runtime, then that code needs to be re-written.” I thought this was perfect and couldn’t agree more. The combination of letting the runtime stick to figuring out what things will be at runtime and the idea that well-written code shouldn’t make this a mystery really sort of drove home why this whole exercised seemed to me (and really is) purely an academic thought-drill.

By all means, exercise your brain and solve riddles involving programming. But I don’t think that this type of activity should be the centerpiece of work-place conversations, evaluations of code, or especially interviews. If I’m interviewing people and asking them what X is on line 48, before even the guy who gets it right, I’m going to hire the guy that says, “let’s write a unit test and find out.”