Proposal: A Law of Performance Citation
I anticipate this post being fairly controversial, though that’s not my intention. I imagine that if it wanders its way onto r/programming it will receive a lot of votes and zero points as supporters and detractors engage in a furious, evenly-matched arm-wrestling standoff with upvotes and downvotes. Or maybe three people will read this and none of them will care. It turns out that I’m actually terrible at predicting which posts will be popular and/or high-traffic. And I’ll try to avoid couching this as flame-bait because I think I actually have a fairly non-controversial point on a potentially controversial subject.
To get right down to it, the Law of Performance Citation that I propose is this:
If you’re going to cite performance considerations as the reason your code looks the way it does, you need to justify it by describing how a stakeholder will be affected.
By way of an example, consider a situation I encountered some years back. I was pitching in to help out with a bit of programming for someone when I was light on work, and the task I was given amounted to “copy-paste-and-adjust-to-taste.” This was the first red flag, but hey, not my project or decision, so I took the “template code” I was given and made the best of it. The author gave me code containing, among other things, a method that looked more or less like this (obfuscated and simplified for example purposes):
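    public void SomeMethod()
    {
        bool isSomethingAboutAFooTrue = false;
        bool isSomethingElseAboutAFooTrue = false;
        IEnumerable<Foo> foos = ReadFoosFromAFile();
        for (int i = 0; i < foos.Count(); i++)
        {
            var foo = foos.ElementAt(i);
            if (IsSomethingAboutAFooTrue(foo))
                isSomethingAboutAFooTrue = true;
            if (IsSomethingElseAboutAFooTrue(foo))
                isSomethingElseAboutAFooTrue = true;
            if (isSomethingAboutAFooTrue && isSomethingElseAboutAFooTrue)
                break; // both flags are set, so stop scanning early
        }
        // The database write follows here (omitted from this simplified example).
    }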
I promptly changed it to one that looked like this for my version of the implementation:
    public void SomeMethodRefactored()
    {
        var foos = ReadFoosFromAFile();
        bool isSomethingAboutOneOfTheFoosTrue = foos.Any(foo => IsSomethingAboutAFooTrue(foo));
        bool isSomethingElseAboutOneOfTheFoosTrue = foos.Any(foo => IsSomethingElseAboutAFooTrue(foo));
        // The database write follows here (omitted from this simplified example).
    }
I checked this in as my new code (I wasn’t changing his existing code) and thought, “he’ll probably see this and retrofit it to his old stuff once he sees how cool the functional/Linq approach is.” I had flattened a bunch of clunky looping logic into a compact, highly-readable method, and I found this code to be much easier to reason about and understand. But I turned out to be wrong about his reaction.
When I checked on the code the next day, I saw that my version had been replaced by a version that mirrored the original one and didn’t take advantage of even the keyword foreach, to say nothing of Linq. Bemused, I asked my colleague what had prompted this change and he told me that it was important not to process the foos in the collection a second time if it wasn’t necessary and that my code was inefficient. He also told me, for good measure, that I shouldn’t use var because “strong typing is better.”
I stifled a chuckle, ignored the var comment, and went back to look at the code in more detail, fearful that I’d missed something. But no, not really. The file-reading method loaded the entire foo collection into memory (it lived in another assembly and was not mine to modify anyway), and the average number of foos was in the single digits. The foos were pretty lightweight objects once read in, and the methods evaluating them were minimal and straightforward.
Was this guy seriously suggesting that possibly walking an extra eight or nine foos in memory, worst case, sandwiched between a file read over the network and a database write over the network was a problem? Was he suggesting that it was worth a bunch of extra lines of confusing flag-based code? The answer, apparently, was “yes” and “yes.”
But actually, I don’t think there was an answer to either of those questions in reality because I strongly suspect that these questions never crossed his mind. I suspect that what happened instead was that he looked at the code, didn’t like that I had changed it, and looked quickly and superficially for a reason to revert it. I don’t think that during this ‘performance analysis’ any thought was given to how external I/O over a network was many orders of magnitude more expensive than the savings, much less any thought of a time trial or O-notation analysis of the code. It seemed more like hand-waving.
It’s an easy thing to do. I’ve seen it time and again throughout my career and in discussing code with others. People make vague, passing references to “performance considerations” and use these as justifications for code-related decisions. Performance and resource consumption are considerations that are very hard to reason about before run-time. If they weren’t, there wouldn’t be college-level discrete math courses dedicated to algorithm runtime analysis. And because it’s hard to reason about, it becomes so nuanced and subjective in these sorts of discussions that right and wrong are matters of opinion and it’s all really relative. Arguing about runtime performance is like arguing about which stocks are going to be profitable, who is going to win the next Super Bowl, or whether this is going to be a hot summer. Everyone is an expert and everyone has an opinion, but those opinions amount to guesses until actual events play out for observation.
Don’t get me wrong: I’m not saying that it isn’t possible to know by compile-time inspection whether a loop will terminate early or not, depending on the input. What I’m talking about is how code will run in complex environments with countless unpredictable factors and whether any of these considerations have an adverse impact on system stakeholders. For instance, in the example here, the (more compact, maintainable) code that I wrote looks as though it will perform ever-so-slightly worse than the code it replaced. But no user will notice losing a few hundred nanoseconds between operations that each take seconds. And what’s going on under the hood? What optimizations and magic does the compiler perform on each of the pieces of code we write? What does the .NET framework do in terms of caching or optimization at runtime? How about the database or the file read/write API?
Can you honestly say that you know without a lot of research or without running the code and doing actual time trials? If you do, your knowledge is far more encyclopedic than mine and that of the overwhelming majority of programmers. But even if you say you do, I’d like to see some time trials just the same. No offense. And even time trials aren’t really sufficient because they might only demonstrate that your version of the code shaves a few microseconds off of a non-critical process running headlessly once a week somewhere. It’s for this reason that I feel like this ‘law’ that I’m proposing should be a thing.
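For what it’s worth, a time trial of the two approaches does not take long to write. What follows is only a rough sketch under stated assumptions: the Foo class, its Value property, and the two predicate bodies are hypothetical stand-ins (the real ones are not shown in this post), and FooTimeTrial, SinglePass, TwoPasses and NanosecondsPerCall are names made up for illustration. It times the flag-based single pass against the two-pass Linq version over nine in-memory foos.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the post's Foo; the real type is not shown.
public sealed class Foo
{
    public int Value { get; }
    public Foo(int value) { Value = value; }
}

public static class FooTimeTrial
{
    // Placeholder predicates; the real checks are assumptions here.
    public static bool IsSomethingAboutAFooTrue(Foo foo) => foo.Value % 5 == 0;
    public static bool IsSomethingElseAboutAFooTrue(Foo foo) => foo.Value % 7 == 0;

    // Flag-based version: one pass that stops once both flags are set.
    public static bool SinglePass(IReadOnlyList<Foo> foos)
    {
        bool first = false, second = false;
        for (int i = 0; i < foos.Count; i++)
        {
            if (IsSomethingAboutAFooTrue(foos[i])) first = true;
            if (IsSomethingElseAboutAFooTrue(foos[i])) second = true;
            if (first && second) break;
        }
        return first && second;
    }

    // Linq version: two separate passes over the same in-memory collection.
    public static bool TwoPasses(IReadOnlyList<Foo> foos)
    {
        bool first = foos.Any(IsSomethingAboutAFooTrue);
        bool second = foos.Any(IsSomethingElseAboutAFooTrue);
        return first && second;
    }

    // Average wall-clock time per call, in nanoseconds, over many iterations.
    public static double NanosecondsPerCall(Func<bool> action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the measurement
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds * 1000000.0 / iterations;
    }

    public static void Main()
    {
        // Nine foos, in line with the single-digit collections described above.
        var foos = Enumerable.Range(1, 9).Select(n => new Foo(n)).ToList();
        const int iterations = 1000000;

        double single = NanosecondsPerCall(() => SinglePass(foos), iterations);
        double twice = NanosecondsPerCall(() => TwoPasses(foos), iterations);

        Console.WriteLine($"single pass: {single:F1} ns per call");
        Console.WriteLine($"two passes:  {twice:F1} ns per call");
    }
}
```

The absolute numbers will vary with hardware, runtime version and build configuration, so treat them as indicative only. The useful comparison is between the difference this prints and the time spent on the file read and database write on either side of the method, and a loop like this measures neither of those.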
First off, I’m not saying that one shouldn’t bear efficiency in mind when coding or that one should deliberately write slow or inefficient code. What I’m really getting at here is that we should be writing clear, maintainable, communicative and, above all, correct code as a top priority. When those traits are established, we can worry about how the code runs — and only then if we can demonstrate that a user’s or stakeholder’s experience would be improved by worrying about it.
Secondly, I’m aware of the aphorism that “premature optimization is the root of all evil.” My proposal is a little broader and less strident about avoiding optimization. (I’m not actually sure that I agree about premature optimization, and I’d probably opt for knowledge duplication in a system as the root of all evil, if I were picking one.) I’m talking about how one justifies code more than how one goes about writing it. I think it’s time for us to call people out (politely) when they wave off criticism about some gigantic, dense, flag-ridden method with assurances that it “performs better in production.” Prove it, and show me who benefits from it. Talk is cheap, and I can easily show you who loses when you write code like that (hint: any maintenance programmer, including you).
Finally, if you are citing performance reasons and you’re right, then please just take the time to explain the issue to those to whom you’re talking. This might include someone writing clean-looking but inefficient code or someone writing ugly, inefficient code. You can make a stakeholder-interest case, so please spend a few minutes doing it. People will learn something from you. And here’s a bit of subtlety: that case can include saying something like, “it won’t actually affect the users in this particular method, but this inefficient approach seems to be a pattern of yours and it may well affect stakeholders the next time you do it.” In my mind, correcting/pointing out an ipso facto inefficient programming practice of a colleague, like hand-writing bubble sorts everywhere, definitely has a business case.