Proposal: A Law of Performance Citation

I anticipate this post being fairly controversial, though that’s not my intention. I imagine that if it wanders its way onto r/programming it will receive a lot of votes and zero points as supporters and detractors engage in a furious, evenly-matched arm-wrestling standoff with upvotes and downvotes. Or maybe three people will read this and none of them will care. It turns out that I’m actually terrible at predicting which posts will be popular and/or high-traffic. And I’ll try to avoid couching this as flame-bait because I think I actually have a fairly non-controversial point on a potentially controversial subject.

To get right down to it, the Law of Performance Citation that I propose is this:

If you’re going to cite performance considerations as the reason your code looks the way it does, you need to justify it by describing how a stakeholder will be affected.

By way of an example, consider a situation I encountered some years back. I was pitching in to help out with a bit of programming for someone when I was light on work, and the task I was given amounted to “copy-paste-and-adjust-to-taste.” This was the first red flag, but hey, not my project or decision, so I took the “template code” I was given and made the best of it. The author gave me code containing, among other things, a method that looked more or less like this (obfuscated and simplified for example purposes):

public void SomeMethod()
{
    bool isSomethingAboutAFooTrue = false;
    bool isSomethingElseAboutAFooTrue = false;
    IEnumerable<Foo> foos = ReadFoosFromAFile();
    for (int i = 0; i < foos.Count(); i++)
    {
        var foo = foos.ElementAt(i);
        if (IsSomethingAboutAFooTrue(foo))
        {
            isSomethingAboutAFooTrue = true;
        }
        if (IsSomethingElseAboutAFooTrue(foo))
        {
            isSomethingElseAboutAFooTrue = true;
        }
        if (isSomethingAboutAFooTrue && isSomethingElseAboutAFooTrue)
        {
            break;
        }
    }

    WriteToADatabase(isSomethingAboutAFooTrue, isSomethingElseAboutAFooTrue);
}

I promptly changed it to one that looked like this for my version of the implementation:

public void SomeMethodRefactored()
{
    var foos = ReadFoosFromAFile();

    bool isSomethingAboutOneOfTheFoosTrue = foos.Any(foo => IsSomethingAboutAFooTrue(foo));
    bool isSomethingElseAboutOneOfTheFoosTrue = foos.Any(foo => IsSomethingElseAboutAFooTrue(foo));

    WriteToADatabase(isSomethingAboutOneOfTheFoosTrue, isSomethingElseAboutOneOfTheFoosTrue);
}

I checked this in as my new code (I wasn’t changing his existing code) and thought, “he’ll probably retrofit this approach to his old stuff once he sees how cool the functional/Linq style is.” I had flattened a bunch of clunky looping logic into a compact, highly-readable method, and I found this code to be much easier to reason about and understand. But I turned out to be wrong about his reaction.

When I checked on the code the next day, I saw that my version had been replaced by one that mirrored the original and didn’t even use the foreach keyword, to say nothing of Linq. Bemused, I asked my colleague what had prompted the change, and he told me that it was important not to process the foos in the collection a second time if it wasn’t necessary, and that my code was inefficient. He also told me, for good measure, that I shouldn’t use var because “strong typing is better.”

I stifled a chuckle, ignored the var comment, and went back to look at the code in more detail, fearful that I’d missed something. But no, not really. The file-reading method read in the entire foo collection (it lived in another assembly and wasn’t mine to modify anyway), and the average number of foos was in the single digits. The foos were pretty lightweight objects once read in, and the methods evaluating them were minimal and straightforward.

Was this guy seriously suggesting that possibly walking an extra eight or nine foos in memory, worst case, sandwiched between a file read over the network and a database write over the network was a problem? Was he suggesting that it was worth a bunch of extra lines of confusing flag-based code? The answer, apparently, was “yes” and “yes.”

But actually, I don’t think there was an answer to either of those questions in reality because I strongly suspect that these questions never crossed his mind. I suspect that what happened instead was that he looked at the code, didn’t like that I had changed it, and looked quickly and superficially for a reason to revert it. I don’t think that during this ‘performance analysis’ any thought was given to how external I/O over a network was many orders of magnitude more expensive than the savings, much less any thought of a time trial or O-notation analysis of the code. It seemed more like hand-waving.

It’s an easy thing to do. I’ve seen it time and again throughout my career and in discussing code with others. People make vague, passing references to “performance considerations” and use these as justifications for code-related decisions. Performance and resource consumption are considerations that are very hard to reason about before run-time. If they weren’t, there wouldn’t be college-level discrete math courses dedicated to algorithm runtime analysis. And because it’s hard to reason about, it becomes so nuanced and subjective in these sorts of discussions that right and wrong are matters of opinion and it’s all really relative. Arguing about runtime performance is like arguing about which stocks are going to be profitable, who is going to win the next Super Bowl, or whether this is going to be a hot summer. Everyone is an expert and everyone has an opinion, but those opinions amount to guesses until actual events play out for observation.

Don’t get me wrong — I’m not saying that it isn’t possible to know by compile-time inspection whether a loop will terminate early or not, depending on the input. What I’m talking about is how code will run in complex environments with countless unpredictable factors, and whether any of these considerations have an adverse impact on system stakeholders. For instance, in the example here, the (more compact, maintainable) code that I wrote appears as though it will perform ever-so-slightly worse than the code it replaced. But no user will notice losing a few hundred nanoseconds between operations that each take seconds. And what’s going on under the hood? What optimizations and magic does the compiler perform on each of the pieces of code we write? What does the .NET framework do in terms of caching or optimization at runtime? How about the database or the file read/write API?

Can you honestly say that you know without a lot of research or without running the code and doing actual time trials? If you do, your knowledge is far more encyclopedic than mine and that of the overwhelming majority of programmers. But even if you say you do, I’d like to see some time trials just the same. No offense. And even time trials aren’t really sufficient because they might only demonstrate that your version of the code shaves a few microseconds off of a non-critical process running headlessly once a week somewhere. It’s for this reason that I feel like this ‘law’ that I’m proposing should be a thing.
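
If you do want a time trial, it doesn’t have to be elaborate. Here’s a minimal sketch using .NET’s Stopwatch; the two methods are placeholders for whichever implementations you’re comparing:

```csharp
using System;
using System.Diagnostics;

class TimeTrial
{
    static void Main()
    {
        // Warm both paths up once so JIT compilation doesn't skew the numbers.
        OriginalVersion();
        RefactoredVersion();

        const int iterations = 10000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) OriginalVersion();
        sw.Stop();
        Console.WriteLine($"Original:   {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++) RefactoredVersion();
        sw.Stop();
        Console.WriteLine($"Refactored: {sw.ElapsedMilliseconds} ms");
    }

    // Placeholders -- substitute the two implementations under comparison.
    static void OriginalVersion() { }
    static void RefactoredVersion() { }
}
```

If a difference doesn’t survive even this kind of crude measurement, it’s unlikely to matter to any stakeholder.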

Caveats

First off, I’m not saying that one shouldn’t bear efficiency in mind when coding or that one should deliberately write slow or inefficient code. What I’m really getting at here is that we should be writing clear, maintainable, communicative and, above all, correct code as a top priority. When those traits are established, we can worry about how the code runs — and only then if we can demonstrate that a user’s or stakeholder’s experience would be improved by worrying about it.

Secondly, I’m aware of the aphorism that “premature optimization is the root of all evil.” What I’m proposing is a little broader and less strident about avoiding optimization. (I’m not actually sure that I agree about premature optimization; if I were picking a root of all evil, I’d probably opt for knowledge duplication in a system.) I’m talking about how one justifies code more than how one goes about writing it. I think it’s time for us to call people out (politely) when they wave off criticism about some gigantic, dense, flag-ridden method with assurances that it “performs better in production.” Prove it, and show me who benefits from it. Talk is cheap, and I can easily show you who loses when you write code like that (hint: any maintenance programmer, including you).

Finally, if you are citing performance reasons and you’re right, then please just take the time to explain the issue to those to whom you’re talking. This might include someone writing clean-looking but inefficient code or someone writing ugly, inefficient code. You can make a stakeholder-interest case, so please spend a few minutes doing it. People will learn something from you. And here’s a bit of subtlety: that case can include saying something like, “it won’t actually affect the users in this particular method, but this inefficient approach seems to be a pattern of yours and it may well affect stakeholders the next time you do it.” In my mind, correcting/pointing out an ipso facto inefficient programming practice of a colleague, like hand-writing bubble sorts everywhere, definitely has a business case.

  • soru

    ‘ Performance and resource consumption are considerations that are very hard to reason about before run-time.’

    This is true, but would seem to lead to the opposite conclusion from the one you draw. Performance-related bugs, where something prematurely hits an unexpected resource limit that negatively impacts its behavior, are the hardest of all bugs to predict, analyse and repair. They can make memory corruption or buffer overflows look like child’s play.

    So in code where reliability is a consideration, where there are two style choices, and one has a known lower resource impact, that one is correct and should be used universally. Precisely _because_ you can’t possibly predict where it is necessary, where it is trivial, and where it is irrelevant. Any more than you can say a particular memory corruption will definitely have a specific effect on a given customer.

    Anything that is a contestable style choice can be gotten used to; the human brain can optimise for readability better than any current compiler can for resource usage.

  • Timothy Boyce

    I agree with you that clear and concise code is more important than ultimate performance. But in my opinion, both of the code examples have problems. You should not enumerate an IEnumerable multiple times because it could have unintended side effects. That said, his code enumerates even more times than yours, due to the use of Count() and ElementAt(). It would actually enumerate up to n+1 times, if the type behind the IEnumerable is not an IList.
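
    To illustrate the hazard, here is a sketch with a hypothetical iterator method standing in for ReadFoosFromAFile (the names are invented for the example); every enumeration “re-reads the file”:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    class MultipleEnumerationDemo
    {
        static int enumerations = 0;

        // Hypothetical stand-in for ReadFoosFromAFile: each fresh
        // enumeration "re-reads the file" and bumps the counter.
        static IEnumerable<int> ReadFoos()
        {
            enumerations++;
            yield return 1;
            yield return 2;
            yield return 3;
        }

        static void Main()
        {
            IEnumerable<int> foos = ReadFoos();

            // The original loop: Count() is re-evaluated on every pass
            // through the condition, and each ElementAt(i) starts yet
            // another enumeration from the beginning.
            for (int i = 0; i < foos.Count(); i++)
            {
                var foo = foos.ElementAt(i);
            }

            // 7 enumerations of a 3-element sequence: the loop condition
            // ran Count() 4 times and the body ran ElementAt() 3 times.
            Console.WriteLine(enumerations);
        }
    }
    ```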

  • Timothy Boyce

    I tested both methods with a collection of size Int16.MaxValue. When the IEnumerable is not an IList, his method takes about 30,000ms, compared to 1ms for your method.

  • http://www.daedtech.com/blog Erik Dietrich

    That’s a good point, and my mistake in coding up the example. (In my actual sample code, the method does nothing but throw a not implemented exception anyway :) ) When I was actually living this, the method getting things from a file returned an array, and I think the ‘template’ code declared it as such. I doubt someone not familiar with Linq would use IEnumerable out of the blue.

    I’d fix this, but then it would make this whole line of conversation look weird, so I’ll leave my hasty example in place, warts and all.

  • Timothy Boyce

    If the source collection is an array, then I agree that his optimization is not necessary. Using Any() should perform just as well in the real world.

    An old post that I always point people towards when they are doing things like this: http://www.codinghorror.com/blog/2009/01/the-sad-tragedy-of-micro-optimization-theater.html

  • http://genehughson.wordpress.com/ Gene Hughson

    Indeed…People tend to use “Performance” and “Security” as the magic words to shoot down ideas they don’t like. “Why” seems to be the light that sends the cockroaches running when they can’t justify their bias.

  • trellis

    There isn’t much in the way of technical advantages to either approach in this case. It was probably just a clash of tastes. He’s more comfortable with his style of coding, you’re more comfortable with yours. I suspect his stated concerns were just a way to wave you off without getting into it. We all get a little control-freaky now and then.

  • http://www.daedtech.com/blog Erik Dietrich

    I don’t think we have any disagreement. You’re talking about a situation where you can make a case on behalf of a stakeholder (someone demanding reliability or uptime of some sort).

    Regarding the last paragraph, sure, you can get used to things. At times in my career I’ve gotten used to looking at 10,000-line procedural classes riddled with bugs, but that doesn’t mean that writing code one way versus another is all a matter of taste. There have been experiments that empirically demonstrate that certain types and styles of coding reduce maintenance cost.

  • http://www.daedtech.com/blog Erik Dietrich

    Apologies if two responses show up; I typed out one and Disqus ate it when I tried to post, so I’m retyping.

    I agree that there are different styles at play and that the motivation was to make justification sound more impressive than “I prefer my style of doing things.” But that said, I don’t think it’s a matter of six of one/half dozen of the other. The first code snippet has about 3 times as much code to do the same thing, which means 3 times as much chance for errors. The second snippet also reads much more like English, which means that most people will grok it much more quickly, reducing maintenance costs.

    And that’s kind of the thrust of what I’m going for in this post — I’m calling on people to provide justification rather than hand-waving.

  • http://www.daedtech.com/blog Erik Dietrich

    That’s exactly what I’ve encountered. If you respond to “I did it that way because it speeds up the code” with “really, by how much?” you often get a deer in headlights look. I mean I’m all for making things run more efficiently if it can be proven that the changes do so and doing so matters. But absent that, I always favor maintainability and flexibility. And I’ve also found that it’s much easier to make well-factored code faster than it is to make opaque, uber-optimized code readable (or to maintain it).

  • http://www.daedtech.com/blog Erik Dietrich

    That’s a nice article, and really I think goes to the same kind of point that I’m making, which is that squabbles over tiny performance considerations miss the point in a lot of cases. Interesting results on your time trials, by the way — I wouldn’t have guessed.

  • Jordan Whiteley

    The For…If anti-pattern is a terrible thing, and it seems to be most liberally abused by Visual Basic users. The funny part is that For…If is normally terrible for performance.

    The loop will run both conditional checks for every item in the loop until both items are found, and then it will break out. The Any() will break as soon as it finds one that resolves to true. I’d go so far as to say that there should be no scenario where the .Any() solution would execute slower than the for…if.

    Note: I have a problem with premature optimization and I know it. Discrete structures was one of my favourite classes.

  • Timothy Boyce

    If the check methods are expensive, you could short-circuit the ifs to avoid calling them more than needed, but then the code is getting even more complicated and probably still not any faster than iterating multiple times and using Any().

  • Jordan Whiteley

    Yeah that was my point.
    Concrete example: There are 10 foos and “SomethingAboutAFoo” evaluates to true on the second iteration foo and “SomethingElseAboutAFoo” evaluates true on the 8th iteration

    The For..If executes both checks 8 times for a total of 16 method calls of unknown weight, and 8 break checks.

    You could add short-circuiting to the equation, but that would still yield 10 method calls of unknown weight, 16 short-circuit checks, and 8 break checks.

    In the same scenario, the Any() solution breaks on the first true condition, so the predicate itself serves as the short-circuit check, and you end up with 10 method calls and no break or short-circuit checks. That is the optimal solution: faster and easier to understand, because it’s written in a way that avoids unnecessary operations.
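
    Wired up with call counters (the predicates here are invented to match the scenario), the counts bear this out:

    ```csharp
    using System;
    using System.Linq;

    class AnyShortCircuitDemo
    {
        static int aCalls = 0, bCalls = 0;

        // Scenario: 10 foos, "A" is true for the 2nd, "B" for the 8th.
        static bool CheckA(int foo) { aCalls++; return foo == 2; }
        static bool CheckB(int foo) { bCalls++; return foo == 8; }

        static void Main()
        {
            int[] foos = Enumerable.Range(1, 10).ToArray();

            bool anyA = foos.Any(CheckA); // short-circuits after the 2nd foo
            bool anyB = foos.Any(CheckB); // short-circuits after the 8th foo

            // 2 + 8 = 10 predicate calls, no separate flag or break checks.
            Console.WriteLine($"{aCalls} + {bCalls} = {aCalls + bCalls}");
        }
    }
    ```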

  • trellis

    I don’t disagree. Just trying to put myself in his shoes. I tend to overstress diplomacy when dealing with other coders.

    If somebody refactors something under my jurisdiction (i.e. responsibility) with some new features I’m not familiar with, because I haven’t had time to pick up on new stuff or whatever, then I’m going to have to either (A) shoehorn some education into my schedule, (B) find a way to diplomatically instruct them to do it the way I’m comfortable with, or (C) basically do what this person seems to have done. Obviously A is the best answer and C is the worst answer. But C is also the easiest answer, and the most likely if they don’t know you well. It’s not easy for most programmers to just straightforwardly admit that they aren’t up to speed on something you’ve done and aren’t comfortable just going with it. Given that he didn’t understand var, I’d say he probably felt a little shown up. I’d like to think if it had been me I would have just asked, but whatever.

    I’m suddenly reminded of a Star Trek TNG episode…

    (ahem) The way you have it only reads more naturally if you’re familiar with Any and lambdas, BTW. If you have never used them before then it’s a little harder to bring yourself to trust the code. (I wouldn’t want to sign off on code just based on what it looks like it probably does.) Nothing some documentation surfing wouldn’t cure, but that brings it back to option (A), the appeal of which depends on both deadlines and attitude.

    If you had replaced the Any & lambdas with “old style” custom functions and he told you it added too much to the call stack, well… best to just move on.