The Myth of Quick Copy-And-Paste Programming

Quick and Dirty?

“We’re really busy right now–just copy that customer form and do a few find and replaces. I know it’s not a great approach, but we don’t have time to build the new form from scratch.”

Ever heard that one? Ever said it? Once, twice, dozens, or hundreds of times? This has the conversational ring of, “I know I shouldn’t eat that brownie, but it’s okay. I’m planning to start exercising one of these days.” The subtext of the message is a willingness to settle: to trade self-esteem for immediate gratification with the disclaimer “the outcome just isn’t important enough to me to do the hard thing.” If that sounds harsh to you when put in the context of programming instead of diet and exercise, there’s a reason: you’re used to fooling yourself into believing that copy-and-paste programming sacrifices good design to save time. But the reality is that the “time savings” is a voluntary self-delusion, and what you’re really trading good design for is the chance to “program” mindlessly for a few minutes or hours. Considered in this light, it’s actually pretty similar to gorging yourself on chocolate while thinking that you’ll exercise later, when you can afford that gym membership and whatnot: “I’ll space out and fantasize about my upcoming vacation now while I’m programming, and I’ll clean up the mess when I come back, refreshed.”

I know what you’re thinking. You’re probably indignant right now because you’re thinking of a time that you copied a 50,000-line file and changed only two things in it. Re-typing all 50,000 lines would have taken days, and you got the work done in minutes. The same number of minutes, in fact, that you’d have spent parameterizing those two differences and updating clients to call the new method, thus achieving both good design and efficiency. Okay, so bad example.

Now, you’re thinking about the time that you copied that 50,000-line file and there were about 300 differences, and there was no way you could easily have parameterized all of that. Only a copy, paste, and a bunch of find and replace could do the trick there. After that, you were up and running in about an hour. And everything worked. Oh, except for that one place where the text was a little different. You missed that one, but you found it after ten minutes of debugging. Oh, and except for five more of those at ten or fifteen minutes a pop. Oh, and then there was that twenty minutes you spent after the architect pointed out that a bunch of methods needed to be renamed because their names made no sense in the new class. Then you were truly done. Except, oh, crap, runtime binding was failing with that other module since you changed those method names to please the stupid architect. That was a doozy because no one noticed it until QA a week later, and then you spent a whole day debugging it and another day fixing it. Oh, and then there was a crazy deadlock issue writing to the log file that some beta customer found three months later. As it turns out, you completely forgot that if methods in the old and new files executed in just the right interleaving, wackiness might ensue. Ugh, that took a week to reproduce and then another two weeks to figure out. Okay, okay, so maybe that was a bad example of the copy-and-paste time savings.

But you’re still mad at me. Maybe those weren’t the best examples, but all the other times you do it are good examples. You’re getting things done and cranking out code. You’re doing things that get you 80% of the way there and making it so that you only have to do 20% of the work, rather than doing all 100% from scratch. Every time you copy and paste, you save 80% of the manpower (minus, of course, the time spent changing the parts of the 80% that turned out not to be part of the 80% after all). The important point is that as long as you discount all of the things you miss while copying and pasting and all of the defects you introduce while doing it and all of the crushing technical debt you accrue while doing it and all of the downstream time fixing errors in several places, you’re saving a lot of time. I mean, it’s the same as how that brownie you were eating is actually pretty low in calories if you don’t count the flour, sugar, chocolate, butter, nuts, and oil. Come to think of it, you’re practically losing weight by eating it.

We Love What We Have

Hmm…apparently, it’s easy to view an activity as a net positive when you make a point of ignoring its negatives. It’s also understandable. My flippant tone here is meant more for effect than as a scathing indictment of people who cut corners. A bit of human psychology known as the Endowment Effect explains a lot of this tendency: we have a cognitive bias to favor what we already have over what’s new, hypothetical, or in the possession of others.

Humans are loss averse (on average, we feel the pain of losing an item more than we feel pleasure at acquiring it), and this leads us to place a higher economic value on things we have than on things we don’t. You might buy a tchotchke from a vendor on vacation for $10, unwilling to pay a dollar more, yet when someone comes along and offers you $12 or $20 or even $30 for it, you suddenly get possessive and don’t want to part with it. This is the Endowment Effect in action.

What does this have to do with copying and pasting code? Well, if you substitute time/effort as your currency, there will be an innate cognitive bias toward adapting work that you’ve already done as opposed to creating some theoretical new implementation. In other words, you’re going to say, “This thing I already have is a source of more efficiency than whatever else someone might do.” This means that, assuming each would take equal time and that time is the primary motivator, you’re going to favor doing something with what you already have over implementing something new from scratch (in spite of how much we all love greenfield coding). Unfortunately, it’s both easy and common to conflate the Endowment Effect’s cognitive bias toward reuse with the sloppy practice of copy and paste.

At the point of having decided to adapt existing code to the new situation, things can break one of two ways. You can achieve this reuse by abstracting and factoring common logic, or you can achieve it by copy and paste. Once you’ve decided between reuse and blazing a new trail, the efficiency-as-currency version of the Endowment Effect has already played out in your mind: you’ve already, for better or for worse, opted to re-appropriate existing work. Now you’re just deciding between doing the favorable-but-hard thing or the sloppy-and-easy thing. This is why I said copy and paste is more like stuffing your face with brownies on a promise of later exercise than like saving time at the expense of good design.
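To make the “abstract and factor” branch a little more concrete, here is a minimal Python sketch. The forms and the `render_form`, `render_customer_form`, and `render_vendor_form` functions are hypothetical, invented for illustration, and the sketch assumes the two forms differ only in their title. The copy-and-paste route would duplicate the whole function body and find-and-replace “Customer” with “Vendor”; the factored route turns that one difference into a parameter and keeps the existing entry points as thin wrappers.

```python
# Hypothetical example: two forms that differ only in their title.
# Instead of pasting the body twice, the difference becomes a parameter.

def render_form(title: str, name: str, email: str) -> str:
    """Shared implementation: the one real difference (title) is a parameter."""
    lines = [
        f"== {title} Form ==",
        f"Name: {name}",
        f"Email: {email}",
    ]
    return "\n".join(lines)


# Existing entry points become thin wrappers, so clients that already
# call them keep working while the logic lives in exactly one place.
def render_customer_form(name: str, email: str) -> str:
    return render_form("Customer", name, email)


def render_vendor_form(name: str, email: str) -> str:
    return render_form("Vendor", name, email)


if __name__ == "__main__":
    print(render_vendor_form("Acme Supply", "sales@acme.example"))
```

If a later change is needed (say, a new field), it goes into `render_form` once rather than into every pasted copy, which removes the “you missed that one place” class of defect at its source.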

Think about it. Copy and paste is satisfying the way those empty calories are satisfying. Doing busy-work looks a lot like doing mentally taxing work and is often valued similarly in suboptimally incentivized environments. So, to copy and paste or not to copy and paste is often a matter of, “Do I want to listen to talk radio and space out while getting work done, or do I want to concentrate and bang my head against hard problems?” And if you do a real, honest self-evaluation, I’m pretty sure you’ll come to the same conclusion. Copying and pasting is the reality television of programming: completely devoid of meaningful substance in favor of predictable, mindless repetition.

In the end, you make two decisions when you go the copy-and-paste route. First, you decide to adapt what you’ve got rather than invent something new, and that’s where the substance of the time/planning/efficiency decision takes place. Once you’ve made that (perfectly fine) decision to use what you’ve got, the second decision is whether to work (factor into a common location) or tune out and malinger (copy and paste). As far as time goes, those two options are going to be a wash. The up-front time saved by not thinking through the design is going to be washed out by the time wasted on the defects that coding without thinking introduces, and you’re going to be left with extra technical debt even after the wash, for a net time loss. So, in the context of “quick and dirty,” the decision of whether new or reused is more efficient (“quick”) is in reality separate from and orthogonal to the decision of factor versus copy and paste (“dirty”). Next time someone tells you they’re going with the “quick and dirty” solution of copy and paste, tell them, “You’re not opting for quick and dirty; you’re just quickly opting for dirty.”

  • http://www.nwcadence.com Steven Borg

    Good post. However, I’m going to disagree a little bit (although much of it from a devil’s advocate position). I really like the definition of done I’ve seen at one of our clients. Whenever they were presented with a new form or piece of code that was very close to another that was already done, they would follow this set of rules:

    1) If the request was the first repeat they received (meaning they were being asked to build the same thing a second time), they would do a copy/paste, inheritance, or refactoring: something down and dirty. The theory (proved out by stats) was that a second request only rarely led to a third, so the business value of building a code generator or the like wasn’t worth it.

    2) If the request was the second repeat (meaning they had been asked three times to build the same thing), they would completely refactor the code, write a code generator, or take another path to generalize the creation of that code. The theory (also proven out by stats) was that getting three requests for the same thing normally resulted in several future requests of the same type.

    Basically, they had a distribution of “similar code” feature requests. The majority of features required unique code; less common was a feature that required code similar to ONE other feature; and beyond that, the distribution showed that once a feature request was “code similar” to two other requests, that feature would very, very likely be requested time and again in the future.

    Thus, their definition of done allowed them to be extremely responsive to the business in nearly all cases. And they didn’t ‘waste’ time and effort on requests that only resulted in a small amount of duplication.

    Did they still suffer from code duplication? Yes, however, the demand caused by that small amount of ‘technical debt’ was dwarfed by the productivity they achieved by having a well documented definition of done, coupled with a strong understanding of their domain and the type of requests received by their product owners.

  • http://blog.hinshelwood.com/ Martin Hinshelwood

    @StevenBorg While I agree at the distributable-unit level, I think that Erik is spot on for coding. At the method, class, and assembly level, there is far too much technical debt accrued, as well as hidden penalties, to justify copy-paste. I also know the team you are talking about, and they are copying whole micro-sites, each a single-unit deliverable, and changing them slightly to fit the need. At this level the DoD above is acceptable, as you are not copy-pasting code within the same assemblies/solutions…

  • http://twitter.com/Code_Analysis Andrey Karpov
  • http://www.daedtech.com/blog Erik Dietrich

    I have to say that I respect the ongoing analysis of the distribution of feature requests. It seems like any shop doing that is going to have a level of self-awareness of their process that keeps technical debt at manageable levels. So if they find that following the “do it once, duplicate it once and hold your nose, abstract it the third time” methodology works for them, I have no qualm with that. And, in general, if the “something down and dirty” is an inheritance structure or a refactor, I have no qualms with that either.

    My main objection in this post is to the notion that copy and paste is an automatic time savings within a project (and I mean within a project and not, as Martin points out, in basically “forking” a code base which will then be separately maintained). I think everyone agrees that this practice incurs technical debt and very avoidable technical debt at that, but I think that it’s not even a net time savings in the short term in most cases.

    I definitely like the actual tracking of requests that you mention, though. I’ll have to keep that in mind for future reference.

  • http://www.daedtech.com/blog Erik Dietrich

    Those are exactly the kinds of examples of expressions that you copy and paste and then forget to change an index by 1 or the name of a local variable or something and wind up burning more time debugging than you would have typing out the new code.
