Should You Aim for 100 Percent Test Coverage?
Editorial note: I originally wrote this post for the NDepend blog. You can check out the original here, at their site. While you’re there, check out all of the different code metrics and rules that NDepend offers.
Test coverage serves as one of the great lightning rods in the world of software development. First, people ask whether it makes for a good metric at all. Then they ask, if you want to use it as a metric, should you go for 100 percent coverage? If not, what percentage should you go for? Maybe 42 percent, since that’s the meaning of life?
I don’t mean to trivialize an important discussion. But sometimes it strikes me that this one could use some trivializing. People dig in and draw battle lines over it, and counterproductive arguments often ensue. It’s strange how fixated people get on this.
I’ll provide my take on the matter a little later in this post. But first, I’d like to offer a somewhat more philosophical look at the issue (hopefully without delving into overly abstract navel-gazing along the lines of “What even is a test, anyway, in the greater scheme of life?”).
What Does “Test Coverage” Measure?
First of all, let’s be very clear about what this metric measures. Many in the debate, particularly those on the “less is more” side of it, quickly point out that test coverage does not measure the quality of the tests. “You can have 100 percent coverage with completely worthless tests,” they’ll say. And they’ll be completely right.
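To make the “worthless tests” point concrete, here is a minimal sketch in Python. The function `apply_discount` and both checks are hypothetical names I made up for illustration; they don’t come from any real codebase. Both checks execute every line of the function, so a coverage tool would score them identically, but only one of them can ever catch the bug.

```python
def apply_discount(price, percent):
    # Hypothetical function under test. It contains a deliberate bug:
    # it adds the discount instead of subtracting it.
    if percent > 100:
        raise ValueError("percent cannot exceed 100")
    return price + price * percent / 100


def exercise_without_asserting():
    # Runs every line of apply_discount, including the error branch,
    # but checks no result, so it passes despite the bug.
    apply_discount(100, 10)
    try:
        apply_discount(100, 150)
    except ValueError:
        pass


def exercise_and_assert():
    # Executes the same happy-path lines, but verifies the outcome.
    # This raises AssertionError because of the bug above.
    assert apply_discount(100, 10) == 90
```

The first check earns full coverage and tells you nothing about correctness. The second one fails, which is exactly what you’d want it to do here.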
To someone casually consuming this metric, the percentage can easily mislead. After all, 100 percent coverage sounds an awful lot like 100 percent certainty. If you hired me to do some work on your car and I told you that I’d done my work “with 100 percent coverage,” what would you assume? I’m guessing you’d assume that I was 100 percent certain nothing would go wrong and that I invited you to be equally certain. Critics of the total coverage school of thought point to this misunderstanding as a reason not to pursue that level of test coverage. But personally, I just think it’s a reason to clarify definitions.
The Facets of Coverage
Two aspects of test coverage matter for this discussion. First up, what part-to-whole ratio are we measuring with this percent? That’s easy. We’re measuring the percentage of statements in your codebase that your automated test suite executes. (For simplicity, I’ll treat statements and lines of code as interchangeable here.) So if you have 80 percent coverage on a codebase with one million lines of code, your automated test suite executes 800,000 of those lines while leaving 200,000 untouched. Your whole suite might fail or consist of tests that assert nothing, but you can state definitively that each of those 800,000 lines of code has executed at least once.
The second concern gets less discussion, but I find it equally interesting. Test coverage gives you the percentage of lines of code executed, as measured by the coverage tool. So returning to our million-line codebase with 80 percent coverage, we have 800,000 executed lines as measured by the tool. If you have some form of automated testing that executes the other 200,000 lines without a coverage tool becoming aware of it (say, because you don’t drive it with typical unit tests), you register 80 percent coverage, but you actually cover more with automated tests.
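Here’s a rough sketch of that second concern, again in Python. It is a toy stand-in for a coverage tool, not how any particular product works: it uses the standard `sys.settrace` hook to record which lines of a hypothetical `classify` function run while the tracer is watching. The helper `run_traced` and the idea of building the “whole” by tracing every branch once are my own assumptions for the sake of the example.

```python
import sys


# Hypothetical function whose lines we want to count.
def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def run_traced(func, arg, hits):
    """Call func(arg) with a tracer installed and record func's executed line numbers in hits."""
    code = func.__code__

    def tracer(frame, event, _arg):
        if frame.f_code is not code:
            return None  # ignore frames that belong to other functions
        if event == "line":
            hits.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func(arg)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before


# The "whole": every line of classify reached when all three branches are traced.
all_lines = set()
for value in (-1, 0, 5):
    run_traced(classify, value, all_lines)

# The "part": only this one call happens while the tracer is watching.
measured = set()
run_traced(classify, 5, measured)

# These calls really do execute the remaining lines, but nothing records them.
classify(-1)
classify(0)

reported = 100 * len(measured) / len(all_lines)
print(f"reported coverage: {reported:.0f}%")
```

Every line of `classify` ran at some point, yet the reported figure only reflects the call the tracer saw. The number describes what the tool observed, not everything that executed.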
Why get so precise? Because I want to drive home the nuance. When we debate the correct coverage percentage as spit out by a coverage tool, we’re not debating anything overly tangible. We’re really asking “what is the best percentage of statements in our codebase to execute in a way that our coverage tool can count?” Does that really sound as though there’s some magic threshold that will make the difference between good and bad software?
What Are We Really Doing?
Given the qualifiers “percentage of statements executed” and “in a way that a coverage tool can measure,” we start to take our eyes off of the prize. The concept behind these implementation details is both simpler and more laudable. If we shoot for 100 percent coverage in theory, we shoot for two things: (1) having done a dress rehearsal of each and every line in the codebase before shipping it and (2) automating that dress rehearsal for efficiency.
In other words, we want to have an engine that exercises every nook and cranny in our code, on demand. That might not give us 100 percent confidence that every line works perfectly. But it does prevent an amateur-hour situation in which some user is the first person or machine ever to trigger the execution of some instruction in your codebase. And I don’t think anyone would argue that this guarantee isn’t worth pursuing, assuming a reasonable cost.
The Diminishing Returns Argument
But reasonable cost provides the perfect fodder for opponents of the metric and the goal. Typically, the argument explicitly or implicitly references the idea of diminishing returns. To understand this, imagine facing a proposition as a student. You can do nothing for a class and get an F, or you can just show up to lectures and do some of the homework for four hours a week and get a C. You can also put in a studious 10 hours per week to get an A. Or you can really go nuts and spend 25 hours per week earning an A+. For most people, the willingness to expend effort falls somewhere shy of A+. They’ll say that it just isn’t worth all of that extra effort for a slight improvement in results.
This is an economically sound argument — at least for the example of grades. But is it a sound argument for letting your user execute your code before you ever do? I don’t buy it, myself.
Here’s the trouble. If you look at your code from a testability perspective and apply the Pareto Principle, you’ll have a superficially plausible argument. Testing 20 percent of our code takes 80 percent of our testing effort, meaning diminishing returns. So let’s just test the 80 percent that lends itself to testability.
That’s rational if you assume the inevitability of testability black holes in your codebase. But that assumption is flawed. You’re basically saying, “I’ve designed this in such a way that it’d be really hard for me to run this before handing it to the user.” The answer at that point shouldn’t be, “Meh, whatever, ship it.” The answer involves examining why you’re designing things in such a way that you can’t execute them before your users do.
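As one illustration of designing that difficulty away, consider logic tangled up with file access, which is a common way for code to end up hard to run ahead of time. The sketch below is an assumption on my part about what such a refactoring could look like; the names `count_error_lines` and `count_errors_in_file` are hypothetical. The idea is to keep the decision-making in a function that accepts any iterable of lines and to leave only a thin wrapper touching the file system.

```python
import io


def count_error_lines(lines):
    """Pure logic: count the lines that start with 'ERROR'. Accepts any iterable of strings."""
    return sum(1 for line in lines if line.startswith("ERROR"))


def count_errors_in_file(path):
    """Thin wrapper: performs the file access and delegates to the pure function."""
    with open(path, encoding="utf-8") as handle:
        return count_error_lines(handle)


# The logic can be rehearsed with an in-memory stream, with no real file required.
fake_log = io.StringIO("INFO start\nERROR disk full\nERROR timeout\n")
assert count_error_lines(fake_log) == 2
```

With this split, the part most likely to contain mistakes runs in any automated test, and the wrapper is small enough that exercising it once against a temporary file is cheap.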
Should You Go for 100 Percent Test Coverage?
Early on, I promised to offer my opinion. You’re probably thinking that I’m going to say yes, given that I don’t buy the diminishing returns argument. But I’m not. I don’t think you should try to get some readout from your build that says you have 100 percent test coverage.
When you start without unit tests, the coverage metric can help. You’ll feel a sense of accomplishment as you remedy the situation and go from zero percent to 25 percent to 70 percent. Mark your progress and celebrate your wins.
But as you start getting to the point where you’re debating whether to stop at 85 percent, 95 percent, or 100 percent, you’re starting to ask the wrong question. You’re starting to ask how far you should go to please a metric-generating tool. Instead you should ask how you can justify every line of code you create and how you can prevent a situation in which your users blaze bold new trails through your code. How do you at least come to them with the assurance that you’ve executed all of the code in your codebase? That’s certainly not too much to ask.
Some code is hard to get at from unit tests, such as code that touches the file system or calls out to web services. Maybe you execute that code with a console utility or some other mechanism that your coverage tool doesn’t understand. This certainly beats some tortured, unmaintainable test that mocks out the world and asserts nothing but gets you 100 percent.
But you know what beats both of those things? Making the effort to write code whose execution you can easily automate and whose existence you can easily justify. That takes skill and practice, and it’s always worth doing, no matter what your build report says for the coverage metric. Don’t go for 100 percent coverage. Go for 100 percent testability and 100 percent demonstrable certainty that you’ve tried things before throwing them at your users. That 100 percent coverage mark just proves you’ve done that.