DaedTech

Stories about Software

How to Actually Reduce Software Defects

Editorial Note: I originally wrote this post for the SmartBear blog.  You can check out the original here, at their site.  Have a look around while you’re there and see what some of the other authors have written.

As an IT management consultant, probably the most frequent question I hear is some variant of “how can we get our defect count down?” Developers may want this as a matter of professional pride, but it’s the managers and project managers that truly burn to improve on this metric. Our software does thousands of undesirable things in production, and we’d like to get that down to hundreds.

Almost invariably, they’re looking for a percentage reduction, presumably because some performance incentive rides on the defect count metric. And so they want strategies for reducing defects by some percentage, in the same way that the president of the United States might challenge his cabinet to shave a couple of points off the unemployment rate in the coming years. The trouble, though, is that this attitude toward defects is actually part of the problem.

The Right Attitude toward Defects

The president sets a goal of reducing unemployment, but not of eliminating it. Why is that? Well, because having nobody in the country unemployed is simply impossible outside of a planned economy – people will quit and take time off between jobs or get laid off and have to spend time searching for new ones. Some unemployment is inevitable.

Management, particularly in traditional, ‘waterfall’ shops, tends to view defects in the same light. We clearly can’t avoid defects, but if we worked really hard, we could reduce them by half. This attitude is a core part of the problem.

It’s often met with initial skepticism, but what I tell these clients is that they should shoot for having no escaped defects (defects that make it to production, as opposed to ones that are caught by the team during testing). In other words, don’t shoot for a 20% or 50% reduction – shoot for not having defects.

It’s not that shooting for 100% will stretch teams further than shooting for 20% or 50%. There’s no psychological gimmickry to it. Instead, it’s about ceasing to view defects as “just part of writing software.” Defects are not inevitable, and coming to view them as preventable mistakes rather than facts of life is important because it leads to a reaction of “oh, wow, a defect – that’s bad, let’s figure out how that happened and fix it” instead of a reaction of “yeah, defects, what are you gonna do?”

When teams realize and accept this, they turn an important corner on the road to defect reduction.

What Won’t Help

Once the mission is properly set to one of defect elimination, it’s important to understand what either won’t help at all or what will help only superficially. And this set includes a lot of the familiar levers that dev managers like to pull.

First, and probably most critical to understand: the core cause of defects is NOT that developers aren’t trying hard enough or being careful enough. In other words, it’s not as though a developer is sitting at his desk and thinking, “I could make this code I’m writing defect free, but, meh, I don’t feel like it because I want to go home.” It is precisely for this reason that exhortations for developers to work harder or to be more careful won’t work. They already are, assuming they aren’t overworked or unhappy with their jobs; and if those things are true, asking for more won’t work anyway.

And, speaking of overworked, increasing workload in a push to get defect free will backfire. When people are forced to work long hours, the work becomes boring and “grueling and boring” is a breeding ground for mistakes – not a fix for them. Resist the urge to make large, effort-intensive quality pushes. That solution should seem too easy, and, in fact, it is.

Finally, resist any impulse to forgo the carrot in favor of the stick and threaten developers and teams with consequences for defects. This is a desperate gambit, and, simply put, it never works. If developers’ jobs depend on not introducing defects, they will find a way to succeed in not introducing defects, even if it means not shipping software, cutting scope, or transferring to other teams/projects. The road to quality isn’t lined by fear.

Understand Superficial Solutions

Once managers understand that eliminating defects is possible and that draconian measures will be counterproductive, the next danger is a tendency to seize on the superficial. Unlike the ideas in the last section, these won’t be actively detrimental, but the realized gains will be limited.

The first thing that everyone seems to seize on is mandating unit test coverage, since this, presumably, forces the developers to write automated tests, which, in turn, catch issues. The trouble here is that high coverage doesn’t actually mean that the tests are effective, nor does it cover all possible defect scenarios. Hiring or logging additional QA hours will be of limited efficacy for similar reasons.

Another thing folks seem to love is the “bug bash” concept, wherein the team takes a break from delivering features and does their best to break the software and then repair the breaks. While this certainly helps in the short term, it doesn’t actually change anything about the development or testing process, so gains will be limited.

And finally, coding standards enforced at code review certainly don’t hurt anything, but they are also not a game changer. To the chagrin of managers everywhere, an exhaustive list of “here are all the mistakes one could make, so don’t make them” does not arise from the past experience of the tenured developers on the team.

Change the Game

So what does it take to put a serious dent into defect counts and to fundamentally alter the organization’s views about defects? The answers here are more philosophical.

The first consideration is to make integration continuous and deployments to test and production environments trivial. Defects hide and fester in the speculative gap between written code and the environment in which it will eventually run. If, on the other hand, developers see the effects their code will have on production immediately, the defect count will plummet.

Part and parcel with this tight feedback loop strategy is to have an automated regression and problem detection suite. Notice that I’m not talking about test coverage or even unit tests, but about a broader concept. Your suite will include these things, but it might also include smoke/performance tests or tests to see if resources are starved. The idea is to have automated detection for things that could go wrong: regressions, integration mistakes, performance issues, etc. These will allow you to discover defects instead of customers discovering them.
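As one sketch of what such a suite might contain beyond unit tests (Python; the checks, budgets, and function names here are illustrative assumptions rather than a prescription), a single automated pass can cover behavioral regressions, performance smoke, and resource use:

```python
import time
import tracemalloc

def process_batch(records):
    # Stand-in for real application code under test.
    return sorted(r.upper() for r in records)

def check_regression():
    # Known input -> known output: catches behavioral regressions.
    assert process_batch(["b", "a"]) == ["A", "B"]

def check_performance(budget_seconds=1.0):
    # Smoke-level performance check: fail the build if we blow the budget.
    start = time.perf_counter()
    process_batch(["x"] * 100_000)
    assert time.perf_counter() - start < budget_seconds

def check_memory(budget_bytes=50_000_000):
    # Crude resource-starvation check: peak allocation stays under budget.
    tracemalloc.start()
    process_batch(["x"] * 100_000)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert peak < budget_bytes

if __name__ == "__main__":
    for check in (check_regression, check_performance, check_memory):
        check()
    print("all checks passed")
```

Run on every integration, checks like these turn “something drifted” from a customer discovery into a failed build.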

And, finally, on the code side, you need to reduce or eliminate error prone practices and parts of the code. Is there a file that’s constantly being merged and could lead to errors? Do your developers copy, paste, and tweak? Are there config files that require a lot of careful, confusing attention to detail? Recognize these mistake-inviters for what they are and eliminate them.
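For instance, the copy-paste-and-tweak pattern can be neutralized by extracting the duplicated block into one parameterized function. A minimal Python sketch (the report code here is hypothetical):

```python
# Copy-paste-and-tweak: three near-duplicate blocks, each a chance to
# change one occurrence and forget the others.
def summarize_sales(rows):
    total = sum(r["amount"] for r in rows if r["region"] == "east")
    east = round(total, 2)
    total = sum(r["amount"] for r in rows if r["region"] == "west")
    west = round(total, 2)
    total = sum(r["amount"] for r in rows if r["region"] == "north")
    north = round(total)  # the "tweak" drifted: rounding differs here
    return east, west, north

# One parameterized helper: a fix or change now happens in exactly one place.
def region_total(rows, region):
    return round(sum(r["amount"] for r in rows if r["region"] == region), 2)

def summarize_sales_safe(rows):
    return tuple(region_total(rows, reg) for reg in ("east", "west", "north"))
```

With the helper in place, a future change to the totaling rule happens once instead of three times, and the “forgot to update one copy” class of defect disappears.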

But here’s the thing – I can’t possibly enumerate all of the tools in your arsenal. These are some of my most tried and true strategies, but you’ll have to figure out what works for you. The key is to recognize that defects are not inevitable and go from there.

47 Comments on "How to Actually Reduce Software Defects"

Ross Johnson
Guest
I largely agree with this, and definitely agree with the sentiment. However, I think a grey area here is the definition of defect. In my experience, once you’re running a fairly tight ship with good continuous integration, many of your “defects” boil down to a misunderstanding of requirements. This could be bad communication from the business, poor documentation of the requirement, a simple misreading of the same, or some combination of all three. In any case, it’s not what the customer wanted, so it’s classified as a bug. But in these cases, no amount of automated testing is going to… Read more »
Erik Dietrich
Guest

When I think of defects in the sense I was talking about here, I think in terms of the software doing things that the dev team didn’t intend. So that wouldn’t include requirements misunderstandings, since then the software is still behaving as the team expects.

Generally speaking, I think of that sort of miscommunication as being a different (though no less serious) problem than the one I’m describing here.

Dan Olesen
Guest
Hi Ross – that is an insightful observation: “defects” boil down to a misunderstanding of requirements. From my many years in software development, I’ve concluded the same… So the question is: what’s the cure? Since all projects are unique in many aspects, it’s necessary to look at what these projects have in common. (From a philosophical point of view: “if they all have bugs to cope with, the root cause is most likely in what they have in common…”) The common part is the requirements, and the way they get misunderstood. This is not a problem with… Read more »
Sam Younon
Guest

I like this article.

Sam Younon
Guest

one minor nit: “These will allow you to discover defects instead of customers.” — discovering customers is usually a good thing (i.e., you might want to reword that?)

Erik Dietrich
Guest

lol… good call. I fixed it.

Batroid
Guest

These will allow you to discover defects instead of “the” customers… Bugs are everywhere.

Erik Dietrich
Guest

Thanks!

Kirk W
Guest

We have found that doing code reviews of the specific fixes being applied has helped us to see what the problems are, and have helped to clean up a whole bunch of other errors, as well as reminding developers to get questions answered before making some of the changes requested. (The majority of errors were side-effects of requested code changes. That process had to be reviewed. The developer was not brought in soon enough to ask the correct questions and was being forced to make assumptions)

Erik Dietrich
Guest

Is it fair to say that the majority of errors were regressions, then? Did the code in question have some kind of automated test suite, or do the reviewers serve in this capacity?

Kirk W
Guest

Erik, most of our errors were NOT regressions. They were side-effect errors from recent changes. Sorry for the lack of clarity. Fix A, introduce A’ as a smaller, minor issue (some place on the screen not updated. Or this needed to be on screen 16 as well. More a lack of thoroughness in definition of the issue we are addressing.)

Not issues with this coming back to life after being fixed.

nordlyst
Guest
My experience with code reviews (by peers) is also very positive. I don’t know if the defect rate was much affected, but I am convinced it would be in the long run. We emphasized that code should be clear in our reviews – meaning that it should reveal the intentions of its author as clearly as possible and it should be as easy as possible to understand what the code does. Both of these things contribute to making all of the undocumented details of the inner workings of the system more accessible to the people who come after to maintain and… Read more »
Dominic Amann
Guest

I am with Erik on his thesis, and I might suggest that most of the typical agile practices merely allow us to make mistakes faster (and fix them faster). His approach, which is better and faster continuous integration, is another way of fixing the mistakes we make faster, faster – and with higher visibility.

Erik Dietrich
Guest

Well said, thanks. I don’t believe I specifically articulated this in the post, but so much of *escaped* defect prevention isn’t about trying to avoid all mistakes. It’s about creating a working paradigm where you discover them fast, early, and automatically, before people outside of the team are ever exposed to them. CI and a lot of craftsmanship/agile practices seem to drive at speeding up the feedback loop for just this purpose.

NitzMan
Guest

With regard to the side effects, having unit tests as part of your continuous integration can help remove these. I’d rather have a build fail because a test didn’t pass than search for possible side effects manually.

Joel McIntyre
Guest
Another systematic successful way is to get more brains on the problem, especially ones with different perspectives and thinking styles. Programmers are “make it work” kind of people; get some “make it break” and some “use it like a customer” people, make sure to have big picture and detail oriented people taking a look. When a test team works, it’s because it is accomplishing this, but you can do it without “testers” and you can have testers and fail to do this. For example, I’ve seen success when you get the PM/designer role involved in an acceptance sign-off pass. This… Read more »
Erik Dietrich
Guest
I personally think that putting different personas into the mix and giving them an interest in the product is a great idea. It sounds like you have a good recipe for success. In shops I’ve consulted with over the last several years, I’ve been part of efforts that install this as something, in the agile community, called “three amigos” discussions. Developer, business analyst and QA person all get together ahead of implementing a feature, agree on what success looks like up front, and design some kind of acceptance test. Agreement on done is then academic. The nice thing about automating… Read more »
Joel McIntyre
Guest

That’s fantastic, I like the sound of that! “Three amigos” – I’ll have to follow up on that, thanks!

JerryGirard
Guest

Where is the word “design” in this article? Before I even read the article, I did a text search for the word “design”. I found none, so I did not bother to read the article.

Erik Dietrich
Guest

That’s an interesting algorithm.

JohnAdams_1796
Guest

Just because the article does not use the word ‘design’ does not mean it does not address design. As you must know as a software developer, all of the measures discussed in this article are part of the design process. The design of a software system is not complete until the last code change is tested and verified.

nordlyst
Guest
Well, I read the whole thing and apart from setting no defects as the goal I can’t really say what the author thinks I ought to do! The complete lack of any discussion on software design is a huge omission. Even if one isn’t designing new features from scratch, design decisions are being made whenever one decides to make changes to software. And it is the decisions one makes here more than implementing them that tends to cause errors – especially the errors that are likely to escape early detection. This is to be expected, since testing features is always… Read more »
Erik Dietrich
Guest
Regarding what you ought to do, the thesis of the piece was that any individual software team participant should consider shifting mindset to one that doesn’t regard defects as inevitable. It’s hard for me to speak to the comparably nebulous concept of “design” since my position would be that this is not a different activity from implementation, except, perhaps in a phased development approach (requirements phase, design phase, implementation phase, testing phase). I’m not saying this to be evasive, but because I honestly don’t know where the line is drawn in any given shop. Is “design” a question of how… Read more »
nordlyst
Guest
Fair enough. I actually did get that it was a matter of mindset, not merely goals. Even so, I don’t think it told me very much about what to concretely do. Re: design. Yes. Design is ALL of those things! It is a mistake in my opinion to think that just because there is no formal design phase in agile it isn’t worth making a distinction between design and implementation. Every decision you make about what exactly the software should do is ultimately a design decision. Implementation is ideally reduced to actually typing the code, and perhaps choosing how to… Read more »
Erik Dietrich
Guest
I’m legitimately interested in the distinction between design and implementation. That is, I view it as nebulous because I don’t know where one begins and the other ends. But that’s not to say clear (and useful) distinctions couldn’t be made. It sounds as though (and correct me if I’m wrong) the distinction for you is that conceiving of all moving parts, such as the algorithms, architecture, object graph, iterations, etc, constitute design. The actual typing out of code then is implementation? Anyway, what I was reacting to was your observation that I didn’t mention design when I wrote the post.… Read more »
Kevin
Guest

@Erik,

The reason I didn’t is the same reason I never mentioned the word “implementation,” either. I view both as such integral parts of creating software that mentioning them (to me) would have seemed redundant.

Aren’t you falling into your own trap? You have made assumptions here that, clearly, are not universally true. If you want to generate defects that slip through the net, base your design/development on assumptions. Communication, i.e. validation in this case, is a two-way street and continuous.

Mark Saleski
Guest

The mindset of not thinking that defects are inevitable is perfectly laudable, but if, in the development lifecycle, sufficient effort isn’t put into both detailed requirements and how those requirements are translated into designs, then the mindset down at the coding end is doomed to failure.

You can write the cleanest, most elegant code in the world…but if it’s not a correct implementation of the original intent then defects will certainly follow.

Erik Dietrich
Guest
I agree with this sentiment. I’d also extend it to say that one can build software with no unexpected runtime behaviors, exactly according to the intended requirements, and the thing could still be a flop (e.g. customers don’t buy it). I view all of those concerns as related, but separate issues. There’s the issue (at play when I wrote this post) of “does the software behave the way I, the developer, intended it to behave?” There’s the issue of “does the software do what those paying the bills want it to do?” And there’s the issue of “is the software… Read more »
JohnAdams_1796
Guest

“…I can’t really say what the author thinks I ought to do!”

Erik certainly lists things you should do:

(1) continuous integration

(2) make deployments to test and production environments trivial

(3) an automated regression and problem detection suite

(4) smoke/performance tests or tests to see if resources are starved

(5) reduce or eliminate error prone practices and parts of the code

JohnAdams_1796
Guest
“The complete lack of any discussion on software design is a huge omission.” What items would you add to the above list to address design? Perhaps you would recommend a design review for each design decision. However, as you point out, “…design decisions are being made whenever one decides to make changes to software.” I’m sure we both would agree that the majority of such decisions are rather minor or trivial (suggestion: reflect on the last ten code change submissions you have made, and identify the design change beneath each of the code changes.) So how does one verify the… Read more »
Jared Barneck
Guest

I find most defects are caught and kept gone by Unit Tests. But most test writers forget Parameter Value Coverage. So even though they have 100% code coverage, they aren’t really 100% tested.
http://www.rhyous.com/2012/05/08/unit-testing-with-parameter-value-coverage-pvc

Once you start taking Parameter Value Coverage into account in your unit tests, your methods will rarely fail. They will almost never behave in an unexpected way because you know exactly what to expect no matter the input.

Erik Dietrich
Guest

That looks like C#. Have you ever used Pex (or whatever it’s called now) that generates automated tests with “interesting” inputs? Seems like a similar concept. (And a great idea, BTW)

My2Cents
Guest

Sorry – I’ve read this three times really trying to find a way to agree. I can also refuse to get my teenage daughter an HPV vaccination assuming she and her future husband are both ‘pure’. As long as humans are a part of any process, you can expect… flaws. It’s who we are. Also how cancer happens, BTW, so it’s really deeper than human, isn’t it. Thank you for the thought exercise, though, Erik.

JohnAdams_1796
Guest
Don’t be so hard on Erik. If you reduce the article to its essence, here’s what he’s saying: (1) Adopt the attitude that your goal is to have zero defects. (2) Overworking or threatening developers won’t get you there, however. (3) Superficial measures won’t get you there, either. (4) Adopt some measures that are known to reduce defects: (a) continuous integration (b) make deployments to test and production environments trivial (c) an automated regression and problem detection suite (d) smoke/performance tests or tests to see if resources are starved (e) reduce or eliminate error prone practices and parts of the… Read more »
Erik Dietrich
Guest

Well summarized! 🙂

Erik Dietrich
Guest

I don’t really understand the parallel to vaccination. I’m not saying that if you can’t have zero defects, then don’t bother writing software. I’m just saying that it’s possible to write software without defects. Sure it’s hard, so do the best you can.

Andy Bailey
Guest

HPV infection doesn’t just happen through intercourse, there are other sources too and most of them do not involve 2 people.
Why is that relevant? Well, the parallel to Erik’s line of thought is this: programming practices that accept defects as inevitable are like assuming that HPV infection can’t happen under certain circumstances. Neither is right, nor safe.

cjacja
Guest
The most common defect is that the software works fine but does the wrong thing. You can’t test this because the test will pass and not show as a defect. About the only way to find these is to build a firewall between the QA people and the developers. Actually this is what beta testing tries to do for free. Let users see if your code solves their problems. OK so far, but I develop embedded software. It runs inside a controller inside a car or a camera or whatever. What we have to do is turn over the car… Read more »
Erik Dietrich
Guest

I definitely agree with “all of the above.” It’s certainly not a magic bullet situation.

For “does the wrong thing” do you automate acceptance/behavior tests? I know some people trying to bring this sort of thing to the embedded space, actually, I think using simulation and other indirect kinds of techniques.

Tim Gray
Guest

“have an automated regression and problem detection suite”

Sure, just “have” one. You must be from the Steve Martin school of software engineering.

‘How to make a million dollars: First, get a million dollars…’ Steve Martin

Erik Dietrich
Guest

That’s clever, I suppose, but it seems like it’d be more appropriate if the post had been titled, “how to build an automated test suite.”

Tim Gray
Guest
I am not saying the “how” should be part of this article. I am saying that the complete lack of consideration for the cost and effort involved is short-sighted. How often does the cost of a production defect outweigh the costs of what it would have taken to eliminate it pre-production? Are there any case studies of a large software company using this philosophy to release defect-free software? What do you do when third party libraries are the source of your defect? Do you rewrite that library in-house from scratch? How often have companies for which you have consulted successfully… Read more »
Erik Dietrich
Guest
Cost of what? The automated test suite? If you’ve been doing that from the get-go, the cost is just part of the cost of normal software operations. In general, I’m advocating that shops develop these competencies, but clearly there will be more bang for their buck implementing them on new efforts than retrofitting them on test-resistant codebases. For teams with a norm of writing legacy code in real time, switching to more rigorous practice obviously comes with cost. But if management cares about quality, it has to pay the piper sooner or later. (I’m just usually called in when management… Read more »
Tim Gray
Guest

It isn’t the number of lines of code you change that is the problem. It is the number of tests you have to manage and the time it takes to develop full regression UI tests. Even for something like this comment section a truly full test suite would include tests for entering control characters, unicode characters, pasted text from a Microsoft product with “smart quotes”, html tags and probably at least a dozen more.

Dilton_Dalton
Guest
While the number of defects (odd behavior) found during different test phases can be useful, in general it is one of the least useful software metrics available to management. It is right up there with the amount of sky above an airplane. The article threw poop on the idea of unit testing although the argument against it is very weak. If you build a system out of perfect parts, you may not get a perfect system but at least there is some hope. If you build your system out of defective parts, you will never have any hope of ending… Read more »
Andy Bailey
Guest

I fail to see where poop was thrown at Unit Testing. Erik quite clearly states that a “testing suite” is necessary; this would inevitably include Unit Tests as well as other types of testing.

Dilton_Dalton
Guest

The article contains this rather negative bit of slung poop: “The first thing that everyone seems to seize on is mandating unit test coverage, since this, presumably, forces the developers to write automated tests, which, in turn, catch issues. The trouble here is that high coverage doesn’t actually mean that the tests are effective, nor does it cover all possible defect scenarios. Hiring or logging additional QA hours will be of limited efficacy for similar reasons.”
