Are Unit Tests Worth It?
The Unit Test Value Proposition
I gave a presentation yesterday on integrating unit tests into a build. (If anyone is interested in seeing it, feel free to leave a comment, and I’ll post relevant slides to SlideShare or perhaps make the PowerPoint available for download.) This covered the nuts and bolts of how I had added test running to the build machine as well as how to verify that a delivery wouldn’t cause unit test failures and thus break the build. For background, I presented some statistics about unit testing and the motivations for a test-guarded integration scheme.
One of the questions that came up during Q&A was from a bright woman in the audience who asked what percentage of development time was spent writing unit tests for an experienced test writer and for a novice writer. My response to this was that it would be somewhat slower going at first, but that an experienced TDD developer was just as fast doing both as a non-testing developer in the short term and faster in the long term (less debugging and defect fixing). From my own personal experience, this is the case.
She then asked a follow-up question about what kind of reduction in defects it brought, and I saw exactly what she was driving at. This is why I mentioned that she is an intelligent woman. She was looking for a snap calculation as to whether or not this was a good proposition and worth adopting. She wanted to know exactly how many defects would be avoided by x “extra” days of effort. If 5 days of testing saved 6 days of fixing defects, this would be worth her time. Otherwise, it wouldn’t.
An Understandable but Misguided Assessment
In the flow of my presentation (which wasn’t primarily about the merits of unit testing, but rather how not to break the build), I missed an opportunity to make a valuable point. I wasn’t pressing and trying to convince people to test if they didn’t want to. I was just trying to show people how to run the tests that did exist so as not to break the build.
Let’s consider what’s really being asked here. She’s driving at an underlying narrative roughly as follows (picking arbitrary percentages):
My normal process is to develop software that is 80% correct and 20% incorrect and declare it to be done. The 80% of satisfied requirements are my development, and the 20% of missed requirements/regressions/problems are part of a QA phase. Let’s say that I spend a month getting to this 80/20 split and then 2 weeks getting the rest up to snuff, for a total of 6 weeks of effort. If I can add unit testing and deliver a 100/0 split, but only after 7 weeks, then the unit testing isn’t worthwhile. But if I can get the 100/0 split in under 6 weeks, then this is something that I should do.
Perfectly logical, right?
Well, almost. The part not factored in here is that declaring software to be done when it’s 80% right is not accurate. It isn’t done. It’s 80% done and 20% defective. But, it’s being represented as 100% done to external stakeholders, and then tossed over the fence to QA with the rider that “this is ‘done’, but it’s not done-done. And now, it’s your job to help me finish my work.”
So, there’s a hidden cost here. It isn’t the straightforward value proposition that can be so easily calculated. It isn’t just our time as developers — we’re now roping external stakeholders into helping us finish by telling them that we’ve completed our work, and that they should use the product as if it were reliable when it isn’t. This isn’t like submitting a book to an editor and having them perform quality assurance on it. In that scenario, the editor’s job is to find typos and your job is to nail down the content. In the development/QA work, your job is to ensure that your classes (units) do what you think they should, and it’s QA’s job to find integration problems, instances of misunderstood requirements, and other user-test type things. It’s not QA’s job to discover an unhandled exception where you didn’t check a method parameter for null — that’s your job. And, if you have problems like that in 20% of your code, you’re wasting at least two people’s time for the price of one.
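To make the null-parameter point concrete, here is a minimal sketch in Python (where null is None) of the kind of check a developer can verify alone, long before QA sees the code. The function average_order_value and its behavior are hypothetical, invented purely for illustration; they don’t come from the presentation or from any real codebase.

```python
import unittest


def average_order_value(order_totals):
    """Return the mean of a list of order totals (hypothetical example unit)."""
    # Guard against the "null parameter" case up front, so the caller gets
    # a clear error instead of an unhandled exception deeper in the code.
    if order_totals is None:
        raise ValueError("order_totals must not be None")
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)


class AverageOrderValueTests(unittest.TestCase):
    def test_rejects_none_with_a_clear_error(self):
        # This is the defect that should never reach QA.
        with self.assertRaises(ValueError):
            average_order_value(None)

    def test_empty_list_yields_zero(self):
        self.assertEqual(average_order_value([]), 0.0)

    def test_averages_the_totals(self):
        self.assertEqual(average_order_value([10.0, 20.0, 30.0]), 20.0)
```

Saved in a test module, this can be run with python -m unittest. The tests take seconds to execute, and the None case is pinned down by the developer who wrote the method rather than discovered by someone else after the code was declared done.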
Efficiency: Making More Mistakes in Less Time
Putting a number to this in terms of “if x is greater than y, then unit testing is a good idea” is murkier than it seems because of the waste of others’ time. It gets murkier still when concepts like technical debt and stakeholder trust of developers are factored in. Tested code tends to be a source of less technical debt, given that it’s usually more modular, maintainable, and flexible. Tested code also tends to inspire more confidence in collaborators: you may run a little behind schedule here and there, but when things are delivered, they work.
On the flipside of that, you get into the proverbial software death march, particularly in less agile shops. Some drop-dead date is imposed for feature complete, and you frantically write duct-tape software up until that date, and then chuck whatever code grenade you’re holding over the QA wall and hope the shrapnel doesn’t blow back too hard on you. The actual quality of the software is a complete mystery and it may not be remotely close to shippable. It almost certainly won’t be something you’re proud to be associated with.
One of my favorite lines in one of my favorite shows, The Simpsons, comes from the Homer character. In an episode, he randomly decides to change his name to Max Power and assume a more go-getter kind of identity. At one point, he tells his children, “there are three ways of doing things: the right way, the wrong way, and the Max Power way.” Bart responds by saying, “Isn’t that just the wrong way?” to which Homer (Max Power) replies, “yes, but faster!”
That’s a much better description of the “value” proposition here. It’s akin to being a student and saying “It’s much more efficient to get grades of C and D because I can put in 10 hours per week of effort to do that, versus 40 hours per week to get As.” In a narrow sense that’s true, but in the broader sense of efficiency at being a good student, it’s a very unfortunate perspective. The same kind of nuanced perspective holds in software development. Sacrificing an objective, early-feedback quality mechanism such as unit tests in the interests of being more “efficient” just means that you’re making mistakes more efficiently. And, getting more things wrong in the same amount of time is a process bug — not a feature.
So, for my money, the idea of making a calculation as to whether or not verifying your work is worthwhile misses the point. Getting the software right is going to take you some amount of time X. You have two options here. The first option is to spend some fraction of X working and then claim to be finished when you’re not, at which point you’ll spend the remainder of X “fixing” the fact that you didn’t finish. The second option is to spend the full time X getting it right.
If you set a standard for yourself that you’re only going to deliver correct software, the timelines work themselves out. If you have a development iteration that will take you 6 weeks to get right, and the business tells you that you only get 4, you can either deliver them “all” of what they want in 4 weeks with the caveat that it’s 33% defective, or you can say “well, I can’t do that for you, but if you pick this subset of features, I’ll deliver them flawlessly.” Any management that would rather have the “complete” software with defect landmines littering 33% of the codebase than two-thirds of the features done right needs to do some serious soul-searching. It’s easy to sell excellent software with the most important two-thirds of the features and the remaining third two weeks out. It’s hard to sell crap at any point in time.
So, the real value proposition here boils down only to “do I want to be adept at writing unreliable software or do I want to be adept at writing software that inspires trust?”