Relax, Everyone’s Code Rots
Editorial note: I originally wrote this post for the NDepend blog. Go on over and check out the original. If you’re interested in topics like software department strategy, static analysis, and code metrics, it’ll be up your alley.
I earn my living, or part of it, anyway, doing something very awkward. I get called in to assess and analyze codebases for health and maintainability. As you can no doubt imagine, this tends not to make me particularly popular with the folks who have constructed and who maintain this code. “Who is this guy, and what does he know, anyway?” is a question that they ask, particularly when confronted with the parts of the assessment that paint the code in a less than flattering light. And, frankly, they’re right to ask it.
But in reality, it’s not so much about who I am and what I know as it is about properties of code bases. Are code elements, like types and methods, larger and more complex than those of the average code base? Is there a high degree of coupling and a low degree of cohesion? Are the portions of the code with the highest fan-in exercised by automated tests or are they extremely risky to change? Are there volatile parts of the code base that are touched with every commit? And, for all of these considerations and more, are they trending better or worse?
It’s this last question that is, perhaps, most important. And it helps me answer the question, “who are you and what do you know?” I’m a guy who has run these types of analyses on a lot of codebases and I can see how yours stacks up and where it’s going. And where it’s going isn’t awesome — it’s rotting.
But I’ll come back to that point later.
Communication complexity grows non-linearly.
Imagine that you’re working alone on a project of some sort. You’re certainly going to be bounded by your own productivity, but the communication overhead to whatever you’re doing is essentially nil (unless you’re counting leaving yourself notes and reminders, which I won’t). Now, let’s say it becomes necessary to substantially improve the throughput on this project, so an additional person is added to the mix. Communication is now more of a consideration, but it’s also quite simple. There’s one channel for it and that’s it.
But what happens as the team grows? Once you add a third person, the number of lines of communication goes from 1 to 3: AB, BC, AC. If you add a fourth person, you get another non-linear increase in the number of lines of communication: AB, AC, AD, BC, BD, CD, for a total of 6. If you go to 5, 6, and 7 people, the lines of communication increase to 10, 15, and 21, respectively. Mathematically, this growth makes sense. Each new person coming in adds one line of communication for each already existing person, which is why the lines of communication grow by 2, 3, 4, 5, etc. If you prefer a more mathematically rigorous way to understand this, it’s the idea in discrete mathematics known as combinations: the number of ways to choose 2 people out of n, which works out to n(n-1)/2.
As the team grows, one person at a time, the amount of communication overhead begins to explode. By the time you have 20 people on the team, there are 190 one-on-one interactions (to say nothing of situations that call for multiple people). This means that, from a practical perspective, there is a limit on team size beyond which there are diminishing and, eventually, negative returns. The team will eventually do nothing but manage all of these communication channels.
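If you'd rather check the arithmetic than take my word for it, here is a minimal Python sketch. The function name communication_lines is my own invention for illustration; the only real library call is math.comb, which counts combinations (Python 3.8 or later).

```python
from math import comb


def communication_lines(team_size: int) -> int:
    """Count the distinct one-on-one pairs in a team of the given size."""
    # Choosing 2 people out of team_size, i.e. team_size * (team_size - 1) // 2.
    return comb(team_size, 2)


if __name__ == "__main__":
    # Print the team sizes discussed above next to their pair counts.
    for size in (1, 2, 3, 4, 5, 6, 7, 20):
        print(f"{size:>2} people -> {communication_lines(size):>3} lines")
```

Running it reproduces the figures from the text: 1, 3, 6, 10, 15, and 21 lines for teams of 2 through 7, and 190 for a team of 20.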
What does this have to do with code? Well, a code base grows in about the same way. It’s just easier for people, particularly non-technical folks, like managers, to wrap their heads around team lines of communication.
Code breaks down the way disorganized collaboration breaks down.
In modern languages, code in a codebase is assembled into some form of logical units or modules. These might be functions, classes, whatever. When there are few of them, life is pretty good and the code is easy to reason about. As the number of these things grows, so too does the complexity, and not linearly. Without any kind of deliberate intervention, codebases suffer the same fate as teams with 20 or 30 or 40 human beings on them all trying to collaborate. Eventually they reach a point where adding to them introduces more problems than it fixes.
How do you prevent this? Well, it’s not easy, and it requires intentionality. This is where I’ll return to the theme of your code rotting. Yes, your code is rotting, but so is almost everyone else’s as well. It’s not an unusual circumstance, and it doesn’t mean that you’ve done anything horribly wrong. It just means that you haven’t yet figured out how to prevent it from rotting.
So, what does it take, in the end? Well, it’s simple… to describe. Put on your managerial hat and ask yourself what you would do with a team of 20 or 30 people that was slowed to a crawl by communication overhead. I bet you’d break them into sub-teams with much less communication overhead and have limited, strategic communication between those teams. Maybe this would be reminiscent of how companies organize themselves?
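To put a rough number on what that split buys you, here is a back-of-the-envelope sketch. It rests on a simplifying assumption of mine, not anything measured: everyone talks to everyone inside a sub-team, and each sub-team has exactly one liaison who talks to the liaisons of the other sub-teams. The function names are hypothetical.

```python
from math import comb
from typing import Sequence


def flat_channels(people: int) -> int:
    """One-on-one channels when everybody talks to everybody."""
    return comb(people, 2)


def split_channels(team_sizes: Sequence[int]) -> int:
    """Channels after splitting into sub-teams (assumed liaison model)."""
    # Full communication inside each sub-team.
    within = sum(comb(size, 2) for size in team_sizes)
    # Assumption: one liaison per sub-team, and liaisons talk pairwise.
    between = comb(len(team_sizes), 2)
    return within + between


if __name__ == "__main__":
    print("flat team of 20:", flat_channels(20))
    print("four teams of 5:", split_channels([5, 5, 5, 5]))
```

Under those assumptions, 20 people in one flat group have 190 channels, while four sub-teams of five have 46. Real organizations are messier than this, but the direction of the effect is the point.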
To do this with a codebase requires the same approach, in concept. You minimize the size and complexity of the code components, the way you would with teams. You eliminate unnecessary dependencies in favor of cohesive units. You make sure you have solid backup plans around any high-risk communication bottlenecks and you try to eliminate those whenever possible. And you evaluate the whole thing on a consistent basis to ensure that you’re getting better (or at least not getting worse).
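As a rough illustration of what evaluating dependencies can look like, here is a small sketch that counts fan-in (how many modules depend on a given module) and fan-out (how many modules it depends on). The module names and the dependency map are made up for the example, and real analysis tools compute far more than this; treat it as a toy, not as how any particular tool works.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical dependency map: module -> modules it depends on.
DEPENDENCIES: Dict[str, List[str]] = {
    "billing": ["database", "logging"],
    "reports": ["database", "logging", "billing"],
    "api": ["billing", "reports", "logging"],
    "database": ["logging"],
    "logging": [],
}


def fan_out(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of distinct modules each module depends on."""
    return {module: len(set(targets)) for module, targets in deps.items()}


def fan_in(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of distinct modules that depend on each module."""
    counts = Counter({module: 0 for module in deps})
    for targets in deps.values():
        for target in set(targets):
            counts[target] += 1
    return dict(counts)


if __name__ == "__main__":
    incoming = fan_in(DEPENDENCIES)
    outgoing = fan_out(DEPENDENCIES)
    # List modules from most depended-upon to least.
    for module in sorted(DEPENDENCIES, key=lambda m: -incoming[m]):
        print(f"{module:<9} fan-in={incoming[module]} fan-out={outgoing[module]}")
```

In this made-up graph, logging has the highest fan-in, so it is the module where a careless change would ripple the furthest and where automated tests matter most.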
So in the end, there are two lessons to take away when it comes to your code base. The first is that having a codebase that is rotting with tech debt, while problematic, is not unusual, nor is it a personal failing of yours or your team’s. The second is that you need to understand how to manage complexity within your code. The first part is easy. The second part is why code assessments, analysis tools, and coursework on clean code exist in the first place. Because writing clean code takes a lot of work.