Stories about Software


Why I Don’t Inherit from Collection Types

In one of the recent posts in the TDD chess series, I was asked a question that made me stop and muse a little. It had to do with whether to define a custom collection type or simply use an out-of-the-box one, and my thinking went, “well, I’d use an out-of-the-box one unless there were some compelling reason to build my own.” This got me thinking about something that had gone the way of the Dodo in my coding: building collection types by inheriting from existing collections or else implementing collection interfaces. I used to do that, and now I don’t anymore. At some point, it simply stopped happening. But why?

I never read some blog post or book that convinced me this was stupid, nor do I have any recollection of getting burned by this practice. There was no great moment of falling out — just a long-ago cessation that wasn’t a conscious decision, sort of like the point in your 20s when it stops being a given that you’re going to be out until all hours on a Saturday night. I pondered this here and there for a week or so, and then I figured it out.

TDD happened. Er, well, I started practicing TDD as a matter of course, and in the time since, I haven’t once inherited from a collection type. Interesting. So, what’s the relationship?

Well, simply put, the relationship is that inheriting from a collection type just never seems to be the simplest way to get some test passing, and it’s never a compelling refactoring (if I were defining some kind of API library for developers and a custom collection type were a reasonable end-product, I would write one, but not for my own development’s sake). Inheriting from a collection type is, in fact, a rather speculative piece of coding. You’re defining some type that you want to use and saying “you know, it’d probably be handy if I could also do X, Y, and Z to it later.” But I’ve written my opinion about this type of activity.

Doing this also greatly increases the surface area of your class. If I have a type that encapsulates some collection of things and I want to give clients the ability to access those things by a property such as their “Id,” then I define a GetById(int) method (or whatever). But if I then say, “well, you know what, maybe they want to access these things by position or iterate over them or something, so I’ll just inherit from List of Whatever and give them all that goodness,” I’ve bought myself trouble. Yikes! Now I’m responsible for maintaining a concept of ordering, handling out-of-bounds indexing, and all sorts of other unpleasantness. So I can either punt and delegate to the methods on the type that I’m wrapping, or else I can get rid of the encapsulation altogether and just cobble additional functionality onto List.
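
If I sketch the two options in code (a minimal sketch with hypothetical names), the trade looks something like this:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
}

// Encapsulation: the collection is an implementation detail, and clients
// get only the behavior they've actually asked for.
public class CustomerGroup
{
    private readonly List<Customer> _customers = new List<Customer>();

    public Customer GetById(int id)
    {
        return _customers.FirstOrDefault(c => c.Id == id);
    }
}

// Speculation: inheriting from List<T> hands clients indexers, Insert,
// RemoveAt, Sort, and the rest, whether I meant to support them or not.
public class CustomerList : List<Customer>
{
    public Customer GetById(int id)
    {
        return this.FirstOrDefault(c => c.Id == id);
    }
}
```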

But that’s an icky choice. I’m either coughing up all sorts of extra surface area and delegating it to something I’m wrapping (ugly design) or I’m kind of weirdly violating the SRP by extending some collection type to do domain sorts of things. And I’m doing this counter to what my TDD approach would bubble to the surface anyway. I’d never write speculative code at all. Until some client of mine needs something besides GetById(int), GetById(int) is all it’s getting.

This has led me to wonder if, perhaps, there’s some relationship between the rise of TDD and decoupled design and the adage to favor composition over inheritance. Deep inheritance hierarchies across namespaces, assemblies, and even authors create the kind of indirection that makes intention muddy and testing (especially TDD) a chore. I want a class that encapsulates details and provides behavior because I’m hitting that public API with tests to define behavior. I don’t want some class that’s a List for the most part but with other stuff tacked on — I don’t want to unit test the List class from the library at all.

It’s interesting speculation, but at the end of the day, the reason I don’t inherit from collection types is a derivative of the fact that I think doing so is awkward and I’m leerier and leerier of inheritance as time goes on. I really don’t do it because I never find it helpful to do so, tautological as that may seem. Perhaps if you’re like me and you just stop thinking of it as a tool in your tool chest, you won’t miss it.


Interface Segregation Principle: A Practical Example

I’ve had this partially completed post in my drafts folder for a while, and, thanks to a sort of half-hearted New Year’s resolution to either finish or discard really old drafts, I’m going to finish this one. I changed the title and re-did the focus a bit, since this one was a year and a half old and the code in question is hazier and hazier in my mind. But it’s a tale of Singletons, needless coupling, poor cohesion, and woe.

I’ve been interviewing some candidates of late, and one of the things I typically ask about is familiarity with the SOLID principles. An encouraging number of people say things like, “oh sure, classes should have only one purpose” or “oh, yeah, we use IoC containers!” It seems as though S, O, and D get the most love, while L and I (Liskov Substitution Principle and Interface Segregation Principle, respectively) are rarely mentioned. Today, I’ll talk about I with an example that may hit home with you.

I worked on a project once with a fairly unique ‘architecture.’ It was a GUI-heavy desktop application with a reasonably straightforward set of domain objects that were read from persistence and stored in memory. This was accomplished using what really kind of amounted to an old school, C-style memory map of structs. There was a root object, and everything was hierarchically possessed by this root object. So, let’s say that you were modeling a parking lot full of cars — you’d navigate to the car parts via calls like Root.Cars[12].Engine.Battery or Root.Cars[5].Body.Exterior.Paint. And naturally, Root was implemented as a Singleton so that you could access this memory map from anywhere at any time to read or write. Yep. That was a long project.
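
A hypothetical reconstruction, just to give you the flavor (the real thing was far bigger and uglier):

```csharp
using System.Collections.Generic;

public class Battery { }
public class Engine { public Battery Battery { get; set; } }
public class Car { public Engine Engine { get; set; } }

public class Root
{
    private static Root _instance;

    public static Root Instance
    {
        get { return _instance ?? (_instance = new Root()); }
    }

    // The entire domain hangs hierarchically off this one object.
    public List<Car> Cars { get; private set; }

    private Root()
    {
        Cars = new List<Car>(); // loaded from persistence at startup
    }
}

// From anywhere, at any time, for reading or writing:
// var battery = Root.Instance.Cars[12].Engine.Battery;
```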

This Singleton probably started out modestly enough, but by the time I had worked on the project, its sizable gravitational pull had roped smaller satellites, code-moons and free-falling bits of flotsam into it, creating a runaway juggernaut of anti-pattern. This thing was, if memory serves, pushing 10K lines of code and perhaps even spreading out into some partial classes. I think by the time I moved onto another project, it was showing the first signs of having an event horizon.


There were several different skins on this desktop app, and for some reason, each of them had its own mini-version of this Singleton (a much more modest 1K to 2K LOC apiece), I guess to contain the skin-specific sprawl that would have created a circular reference situation with the behemoth itself. It was one of these mini-versions that inspired this post, along with a story a friend of mine told me after I’d left the project.

A few of us had, while working in this code base, endeavored to extract at least tiny pockets of code from the massive gravitational field of the Singletons so that we could write some unit tests and avoid the bug whack-a-mole game that ensued whenever you changed something in the global state. I was probably the most successful in fighting this battle, and my friend, having less success after I’d left, complained to me one day over lunch. He said that he’d had a perfectly testable class, but in a code review someone in a position of relative authority had pointed out that something he was doing was already done in one of the satellite Singletons and that he should use that code. His request to hide the Singleton behind a wrapper or even an interface implementation was denied. A large portion of his class became thoroughly untestable because, naturally, the first call to his stuff triggered the first lazy call to the satellite, which triggered the first lazy call to the black hole Singleton, which fired up every threading model, logger, aspect, and bit of file I/O in the history of creation, crushing the test runner like an insignificant gnat.

And this, to me, is perhaps the best argument for the Interface Segregation Principle that I’ve ever heard. As a quick recap, the ISP says that “clients should not be forced to depend upon interfaces that they don’t use.” In my friend’s case, depending on the utility method that he was being forced to use made him depend, in turn, on an entire 1K+ LOC Singleton being initialized and, indirectly, on a whole host of other application functionality. This was about the most egregious violation that I’d ever encountered.

What’s the alternative? Well, years later, it’s hard to list specifics, but how about pulling that utility out somewhere? If it relied on no global state, this is a no-brainer — just put it into its own class somewhere. Static, instance — it doesn’t matter — just not in that Singleton. If it relied on global state, then create a small interface with one method, have the satellite Singleton implement it, and hand that interface to the class in question. In either case, my friend’s class depends on a single method for that functionality and not the entire application domain and the code responsible for loading it (as well as loggers and other ancillary nonsense).
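
In sketch form (every name here is invented), the interface version of that fix would look something like this:

```csharp
// The one method's worth of functionality the client class actually needs.
public interface IUtility
{
    int ComputeValue(string input);
}

// The satellite Singleton implements the narrow interface...
public class SatelliteSingleton : IUtility
{
    // ...1K+ lines of other responsibilities elided...

    public int ComputeValue(string input)
    {
        return input.Length; // stand-in for the real utility logic
    }
}

// ...and my friend's class depends only on that single method, so a unit
// test can hand it a trivial fake instead of waking the black hole.
public class PerfectlyTestableClass
{
    private readonly IUtility _utility;

    public PerfectlyTestableClass(IUtility utility)
    {
        _utility = utility;
    }

    public int DoSomethingDomainRelated(string input)
    {
        return _utility.ComputeValue(input) * 2;
    }
}
```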

The Interface Segregation Principle asks you to keep your dependencies to a minimum. This will help you unit test, but it will also help your sanity at maintenance time. After all, would you rather debug a class that you knew depended on an external method or two, or one whose behavior could be explained by any one of tens of thousands of lines of code? These things will matter to you or whoever maintains the application. A lot. So think of this example and think of the ISP as you’re writing your code.


Moving Away from State: State--

In 1968, Edsger Dijkstra issued a letter entitled “Go To Statement Considered Harmful,” and the age of structured programming was born. The letter was a call to stop programmers of the time from creating ad-hoc control flow structures using goto statements and to instead use higher-level constructs for manipulating flow through a function (contrary, I think, to the oft-attributed position that “goto is evil”). Structured programming caught on because it meant that progress through a method would be more visually trackable.

But I think there’s an interesting underlying concept here that informs a lot of shifts in programming practice. Specifically, I’m referring to the idea that Dijkstra conceived of “goto” as existing at an inappropriate level of abstraction when stacked with concepts like “if” and “while” and “case.” Those latter are elements of logical human reasoning while the former is a matter of procedure for a compiler. Or, to put it another way, the control flow statements tell humans how to read business logic, and the goto tells the compiler how to execute a program.

“State Considered Harmful” seems to be the new trend, ushering in the renaissance of functional programming. Functional programming itself is not a new concept. Lisp has been around for half a century, meaning that it actually predates object-oriented programming. Life always seems to be full of cycles, and this is certainly an example. But there’s more to it than people seeking new solutions in old ideas. The new frontier in faster processing and better computer performance is parallel processing. We can’t fit a whole lot more transistors on a chip, but we can fit more chips in a computer — and design schemes to split the work between them. And in order to do that successfully, it’s necessary to minimize the amount of temporarily stored information, or state.

I’ve found myself headed in that direction almost subconsciously. A lot of the handiest tools push you that way. I’ve always loved the fluent Linq methods in C#, and those generally serve as a relatively painless introduction to functional programming. You find yourself gravitating away from nested loops and local variables in favor of chained calls that express semantically what you want in only a line or two of code. But gravitating toward functional programming style goes beyond just using something like Linq, and it involves favoring chains of methods in which the output is a pure, side-effect-free function of the input.
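
To make that gravitation concrete with a contrived example of my own:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public string CustomerName { get; set; }
    public decimal Total { get; set; }
}

public static class OrderQueries
{
    // The stateful version: a loop, a mutable accumulator, a conditional.
    public static List<string> BigSpenders(List<Order> orders)
    {
        var names = new List<string>();
        foreach (var order in orders)
        {
            if (order.Total > 100m && !names.Contains(order.CustomerName))
                names.Add(order.CustomerName);
        }
        return names;
    }

    // The fluent Linq version: a chain of calls saying what we want, with
    // nothing for us to mutate along the way.
    public static List<string> BigSpendersFluent(List<Order> orders)
    {
        return orders.Where(o => o.Total > 100m)
                     .Select(o => o.CustomerName)
                     .Distinct()
                     .ToList();
    }
}
```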


Here are some of the stops on my own journey away from state.

  1. Elimination of global state. The first step in this journey for me was to realize how odious global state is for the maintainability of an application.
  2. No state variables for communication between methods. Next to go for me were ‘flag’ variables that kept track of things in a class between method calls. As I became more proficient in unit testing, I found this to be a huge headache as it created complicated and brittle test setup, and it was really a pointless crutch anyway.
  3. Immutable > mutable. I’ve blogged about an idea I called “pointless mutability,” but in general I’ve come to favor immutable constructs over mutable ones whenever possible for simplicity (there’s a short sketch of what I mean just after this list).
  4. State isolation — for instance, model objects and domain objects for business logic state and viewmodels/controllers for GUI state. Aside from that, a lot of applications for which I am the architect retain virtually no state information. Services and other application scaffolding types of classes simply have interfaced references to their collaborators, but really keep track of nothing between method calls.
  5. Persistence ignorance — letting storage stand in for application state. For less sophisticated (CRUD-style) applications, I’ve favored scenarios in which most of the code’s state is abstracted into a lower layer of the application and actually lives externally. In other words, if things are simple, let something like a database or file system be your application’s state. Why cache and over-complicate until/unless performance is an issue?
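
Here’s the promised sketch for the third item, using a contrived little type, mutable and then immutable:

```csharp
// Mutable: anyone holding a reference can change it at any time, so its
// value depends on the entire history of calls made against it.
public class MutablePoint
{
    public int X { get; set; }
    public int Y { get; set; }
}

// Immutable: the value is fixed at construction, and "changing" it means
// creating a new one. There's simply less to reason about.
public class Point
{
    public int X { get; private set; }
    public int Y { get; private set; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public Point WithX(int newX)
    {
        return new Point(newX, Y);
    }
}
```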

And that’s where I stand now as I write object-oriented code. I am interested in diving more into functional languages, as I’ve only played with them here and there since my undergrad days. It isn’t so much that they’re the new hotness as it is that I find myself heading that way anyway. And if I’m going to do it, I might as well do it consciously and in a directed fashion. If and when I do get to do more playing, you can definitely bet that I’ll post about it.


Please Don’t Recycle Local Variables

I think there’s a lot of value to the conservation angle of the green movement. In general, it’s a matter of efficiency–if you can heat/light/whatever your house with the same quality of life, using less energy and fewer resources, that’s a win for everyone. This applies to a whole lot of things beyond just eco-concerns, however. Conserving heat when you’re cold, conserving energy when you’re running a marathon, conserving your dollars when making a budget–all good ideas. Cut down, conserve, reuse when you can.


Except please don’t do it with your local variables. For example:
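
Something along these lines (a sketch of the idea; the types here are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public class Customer { public void Process() { } }
public class Machine { public void Process() { } }

public class Processor
{
    public void ProcessThings(List<Customer> customers, List<Machine> machines)
    {
        int count = 0;
        foreach (var customer in customers)
        {
            customer.Process();
            count++;
        }
        Console.WriteLine("Processed {0} customers.", count);

        count = 0; // same variable, recycled for an unrelated concept
        foreach (var machine in machines)
        {
            machine.Process();
            count++;
        }
        Console.WriteLine("Processed {0} machines.", count);
    }
}
```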

Here, we initialize a local variable, count, and use it to keep track of the results of some processing of customers. When we’re done, we reset count and use it to keep track of the apparently unrelated concept of machines. What I’m saying is that there shouldn’t be just one count, but rather customerCount and machineCount.

Does this seem like nitpicking? You could certainly make that argument, but this code is not going to age well. First of all, this method should clearly be two methods, so we’re starting right off the bat with a bit of technical debt. It would be cleaner if each loop had its own method.

But an interesting thing happens if we use the refactoring tools to try to do that–the refactoring tool wants return values or input parameters. Yikes, that was unexpected, so we just move on. Later, when the time comes to iterate over movies, we see that there’s a ‘design pattern’ in place, so we modify the code to look like this:
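
Continuing the sketch (Movie is as hypothetical as the rest):

```csharp
public class Movie { public void Process() { } }

public void ProcessThings(List<Customer> customers, List<Machine> machines, List<Movie> movies)
{
    int count = 0;
    foreach (var customer in customers)
    {
        customer.Process();
        count++;
    }
    Console.WriteLine("Processed {0} customers.", count);

    count = 0;
    foreach (var machine in machines)
    {
        machine.Process();
        count++;
    }
    Console.WriteLine("Processed {0} machines.", count);

    count = 0; // and recycled once more
    foreach (var movie in movies)
    {
        movie.Process();
        count++;
    }
    Console.WriteLine("Processed {0} movies.", count);
}
```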

Now this thing should really be split up, so we start selecting parts of it to see what we can refactor. Ew, now we’re getting ref parameters to boot. This thing is getting even more painful to try to refactor, and we’re in a hurry, so no time for that. And to make matters worse, if you add in a few other aggregator variables this way, you’ll start to have all kinds of barriers in place when you want to pull this thing apart, such as crazy sets of out parameters. I’ve posted before about how I feel about ref and out.

All of this mounting technical debt could easily be avoided by giving each loop its own count variable. Having them recycle the same one creates a compile-time dependency between what’s going on in each loop and what happened in the loop before, even though there are no other similar dependencies in evidence. In other words, recycling this local variable is the only thing creating a coupling in your code–there’s no logical reason to do it.

This is the height of procedural programming and baking in temporal dependencies that I cautioned you to avoid here. It’s a completely useless dependency that will inhibit refactoring and dirty up your code in a hurry. It may not seem like much yet, but this will be a huge pain point later as the lines of code in this method balloon from the dozens to the hundreds, and you rely heavily on automated tools to help with cleanup. Flag variables used over and over in sequence throughout a method are like pebbles in your shoe when you’re trying to refactor.
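
For contrast, here’s the same sketch with a variable per loop. Each chunk now extracts into its own method without return values, ref parameters, or any other drama:

```csharp
public void ProcessThings(List<Customer> customers, List<Machine> machines, List<Movie> movies)
{
    ProcessCustomers(customers);
    ProcessMachines(machines);
    ProcessMovies(movies);
}

private void ProcessCustomers(List<Customer> customers)
{
    int customerCount = 0;
    foreach (var customer in customers)
    {
        customer.Process();
        customerCount++;
    }
    Console.WriteLine("Processed {0} customers.", customerCount);
}

// ProcessMachines and ProcessMovies look just the same, each with its
// own machineCount or movieCount local.
```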

So my advice is to avoid this practice completely. There’s really no advantage to coding this way and the potential downside is enormous.


How to Keep Method Size Under Control

Do you ever open a source code file and see a method that starts at the top of your screen and kind of oozes its way to the bottom with no end in sight? When you find yourself in that situation, imagine that you’re reading a ticker tape and try to guess at where the method actually ends. Is it a foot below the monitor? Three feet? Does it plummet through the floor and into the basement, perhaps down past the water table and into the earth’s mantle?


Visualized like this, I think everyone might agree that there’s some point at which the drop is too far, though there’s likely some disagreement on where exactly this is. Personally, I used to subscribe to the “fits on a screen” heuristic and would only start looking to pull out methods if it got beyond that. But in more recent years, I think even smaller. How small? I dunno–five or six lines, max. Small enough that you’ll only ever see one try-catch or control flow statement in there. Yeah, seriously, that small. If you’re thinking it sounds kind of crazy, I get that, but give it a try for a while. I can almost guarantee that you’ll lose your patience for looking at methods that cause you to think, “wait, where was loopCounter declared again–before the second or third while loop?”

If you accept the premise that this is a good way to do things or that it might at least be worth a try, the first thing you’ll probably wonder is how to go about doing this from a practical standpoint. I’ve definitely encountered people and even whole groups who considered method sizes like this to be impractical. The first thing you have to do is let go of the notion that classes are in some kind of limited supply and you have to be careful not to use too many. Same with modules, if your project gets big enough. The reason I say this is that having small methods means that you’re going to have a lot of them. This in turn means that they’re going to need to be spread to multiple classes, and those classes will occupy more namespaces and modules. But that’s okay. If you encounter a large application that’s well designed and factored, it’s that way because the application is actually a series of small, focused components working together. Monolithic doesn’t scale well.

Getting Down to Business

If you’ve prepared yourself for the reality of needing more classes organized into more namespaces and modules, you’ve really overcome the biggest obstacle to being a small-method coder. Now it’s just a question of mechanics and practice. And this is actually important–it’s not sufficient to just say, “I’m going to write a lot of methods by stopping at the fifth line, no matter what.” I guarantee you that this is going to create a lot of weird cross-coupling, unnecessary state, and ugly things like out parameters. Nobody wants that. So it’s time to look to the art of creating abstractions.

As a brief digression, I’ve recently picked up a copy of Uncle Bob Martin’s Clean Code: A Handbook of Agile Software Craftsmanship and been tearing my way through it pretty quickly. I’d already seen most of the Clean Coder video series, which covers some similar ground, but the book is both a good review and a source of new and different information. To be blunt, if you’re ever going to invest thirty or forty bucks in getting better at your craft, this is the thing to buy. It’s opinionated, sometimes controversial, incredibly specific, and absolutely mandatory reading. It will change your outlook on writing code and make you better at what you do, even if you don’t agree with every single point in it (though I don’t find much with which to take issue, personally).

The reason I mention this book and series is that there is an entire section in the book about functions/methods, and two of its fundamental points are that (1) functions should do one thing and one thing only, and (2) that functions should have one level of abstraction. To keep those methods under control, this is a great place to start. I’d like to dive a little deeper, however, because “do one thing” and “one level of abstraction per function” are general instructions. It may seem a bit like hand-waving without examples and more concrete heuristics.

Extract Finer-Grained Details

What Uncle Bob is saying about mixed abstractions can be demonstrated in this code snippet:
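
(A sketch of the kind of method I mean; the names are hypothetical, and the sub-methods aren’t pictured.)

```csharp
public void OpenTheDoor()
{
    GraspDoorknob();         // you, interacting with the door
    TwistDoorknob();         // you, interacting with the door
    TightenBicep();          // suddenly: muscles, joints, and tendons
    FlexElbowJoint();
    ShiftWeightToBackFoot();
}
```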

Do you see what the issue is? We have a method here that describes (via sub-methods that are not pictured) how to open a door. The first two calls talk in terms of actions between you and the door, but the next three calls suddenly dive into the specifics of how to pull the door open in terms of actions taken by your muscles, joints, tendons, etc. These are two different layers of abstraction: one about a person interacting with his or her surroundings and the other detailing the mechanics of body movement. To make it consistent, we could get more detailed in the first two actions in terms of extending arms and tightening fingers. But we’re trying to keep methods small and focused, so what we really want is to do this:
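
(The same hypothetical sketch, made consistent.)

```csharp
public void OpenTheDoor()
{
    GraspDoorknob();
    TwistDoorknob();
    PullDoorOpen(); // the biceps, elbows, and weight-shifting live in here
}
```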

Create Coarser-Grained Categories

What about a different problem? Let’s say that you have a method that’s long, but it isn’t because you are mixing abstraction levels:
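
(Another hypothetical sketch; the leaf methods aren’t pictured.)

```csharp
public void CookQuesadillas()
{
    ShredCheese();
    DiceOnions();
    SliceJalapenos();
    SetOutTortillas();
    GetOutSkillet();
    GreaseSkillet();
    PlaceTortillaInSkillet();
    SprinkleCheeseOnTortilla();
    AddToppings();
    FoldTortilla();
    FlipUntilGoldenBrown();
    TransferToPlate();
    GarnishAndServe();
}
```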

These items are all at the same level of abstraction, but there are an awful lot of them. In the previous example, we were able to tighten up the method by making the abstraction levels consistent, but here we’re going to actually need to add a layer of abstraction. This winds up looking a little better:
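
(The same sketch, regrouped.)

```csharp
public void CookQuesadillas()
{
    PrepareIngredients();
    PrepareEquipment();
    PerformActualCooking();
    ServeQuesadillas();
}

private void PrepareIngredients()
{
    ShredCheese();
    DiceOnions();
    SliceJalapenos();
    SetOutTortillas();
}

private void PrepareEquipment()
{
    GetOutSkillet();
    GreaseSkillet();
}

private void PerformActualCooking()
{
    PlaceTortillaInSkillet();
    SprinkleCheeseOnTortilla();
    AddToppings();
    FoldTortilla();
    FlipUntilGoldenBrown();
}

private void ServeQuesadillas()
{
    TransferToPlate();
    GarnishAndServe();
}
```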

In essence, we’ve created categories and put the actions from the long method into them. What we’ve really done here is create (or add to) a tree-like structure of methods. The public method is the root, and it had thirteen children. We gave it instead four children, and each of those children has between two and five children of its own. To tighten up methods, it’s perfectly viable to add “nodes” to the “tree” of your call stack. While “do one thing” is still a little elusive, this seems to be carrying us in that direction. There’s no individual method that you look at and think, “boy, that’s a lot of stuff going on.” Certainly it’s a matter of some art and taste, but this is probably a good way to think of it–organize stuff into hierarchical method categories until you look at each method and think, “I could probably memorize what that does if I needed to.”

Recognize that Control Flow Uses Up an Abstraction

So far we’ve been conceptually figuring out how to organize families of methods into well-balanced tree structures, and that’s taken us through some pretty mundane code. This code has involved none of the usual stuff that sends apps careening off the rails into bug land, such as conditionals, loops, assignment, etc. Let’s correct that. Looking at the code above, think of how you’d modify this to allow for the preparation of an arbitrary number of quesadillas. Would it be this?
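
(Continuing the hypothetical sketch.)

```csharp
public void CookQuesadillas(int numberOfQuesadillas)
{
    PrepareIngredients();
    PrepareEquipment();
    for (int index = 0; index < numberOfQuesadillas; index++)
        PerformActualCooking();
    ServeQuesadillas();
}
```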

Well, that makes sense, right? Just like the last version, this is something you could read conversationally while in the kitchen just as easily as you do in the code. Prep your ingredients, then prep your equipment, then for some integer index equal to zero and less than the number of quesadillas you want to cook, increment the integer by one. Each time you do that, cook the quesadilla. Oh, wait. I think we just went careening into the nerdiest kitchen narrative ever. If Gordon Ramsay were in charge, he’d have strangled you with your apron for that. Hmm… how ’bout this?
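
(The same sketch, with the loop pushed down a level.)

```csharp
public void CookQuesadillas(int numberOfQuesadillas)
{
    PrepareIngredients();
    PrepareEquipment();
    PerformActualCooking(numberOfQuesadillas);
    ServeQuesadillas();
}

private void PerformActualCooking(int numberOfQuesadillas)
{
    for (int index = 0; index < numberOfQuesadillas; index++)
    {
        PlaceTortillaInSkillet();
        SprinkleCheeseOnTortilla();
        AddToppings();
        FoldTortilla();
        FlipUntilGoldenBrown();
    }
}
```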

Well, I’d say that the CookQuesadillas method looks a lot better, but do we like “PerformActualCooking”? The whole situation is an improvement, but I’m not a huge fan, personally. I’m still mixing control flow with a series of domain concepts. PerformActualCooking is still both a story about for-loops and about cooking. Let’s try something else:
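
(One more pass at the sketch.)

```csharp
public void CookQuesadillas(int numberOfQuesadillas)
{
    PrepareIngredients();
    PrepareEquipment();
    PerformActualCooking(numberOfQuesadillas);
    ServeQuesadillas();
}

// The bridging method: nothing but the mechanics of repetition.
private void PerformActualCooking(int numberOfQuesadillas)
{
    for (int index = 0; index < numberOfQuesadillas; index++)
        CookQuesadilla();
}

// A pure domain story: nothing but cooking.
private void CookQuesadilla()
{
    PlaceTortillaInSkillet();
    SprinkleCheeseOnTortilla();
    AddToppings();
    FoldTortilla();
    FlipUntilGoldenBrown();
}
```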

We’ve added a node to the tree that some might say is one too many, but I disagree. What I like is the fact that we have two methods that contain nothing but abstractions about the domain knowledge of cooking and we have a bridging method that brings in the detailed realities of the programming language. We’re isolating things like looping, counting, conditionals, etc. from the actual problem-solving and storytelling that we want to do here. So when you have a method that does a few things and you think about adding some kind of control flow to it, remember that you’re introducing a detail to the method that is at a lower level of abstraction and should probably have its own node in the tree.

Adrift in a Sea of Tiny Methods

If you’re looking at this cooking example, it probably strikes you that there are no fewer than eighteen methods in this class, not counting any additional sub-methods or elided properties (which are really just methods in C# anyway). That’s a lot for a class, and you may think that I’m encouraging you to write classes with dozens of methods. That isn’t the case. So far, what we’ve done is start to create trees of many small methods with a public method and then a ton of private methods, which is a code smell called “Iceberg Class.” What’s the cure for iceberg classes? Extracting classes from them. Maybe you turn the first two methods that prepare ingredients and equipment into a “Preparer” class with two public methods, “PrepareIngredients” and “PrepareEquipment.” Or maybe you extract a quesadilla cooking class.
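
Continuing the hypothetical sketch, that first extraction might look something like this:

```csharp
// Preparation becomes its own small class with its own tree of methods.
public class Preparer
{
    public void PrepareIngredients()
    {
        ShredCheese();
        DiceOnions();
        SliceJalapenos();
        SetOutTortillas();
    }

    public void PrepareEquipment()
    {
        GetOutSkillet();
        GreaseSkillet();
    }

    // ...the leaf methods move over here along with it...
}

// And the cooking class delegates to an injected Preparer (_preparer):
public void CookQuesadillas(int numberOfQuesadillas)
{
    _preparer.PrepareIngredients();
    _preparer.PrepareEquipment();
    PerformActualCooking(numberOfQuesadillas);
    ServeQuesadillas();
}
```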

It’s really going to vary based on your situation, but the point is that you take this opportunity to pick nodes in your growing tree of methods and sub-methods and convert them into roots by turning them into classes. And if doing this leads you to having what seems to be too many classes in your namespace? Create more namespaces. Too many of those in a module? Create more modules. Too many modules/projects in a solution? More solutions.

Here’s the thing: the complexity exists no matter how many or few methods/classes/namespaces/modules/solutions you have. Slamming them all into monolithic constructs together doesn’t eliminate or even hide that complexity, though many seem to take the ostrich approach and pretend that it does. Your code isn’t somehow ‘simpler’ because you have one solution with one project that has ten classes, each with 300 methods of 7,000 lines. Sure, things look simple when you fire up the IDE, but they sure won’t be simple when you try to debug. In fact, they’ll be much more complicated because your functionality will be hopelessly interwoven with weird temporal couplings, ad-hoc references, and hidden dependencies.

If you create large trees of functionality, you have the luxury of making the structure of the tree the representative of the application’s complexity, with each node an island of simplicity. It is in these node-methods that the business logic takes place and that getting things right is most important. And by managing your abstractions, you keep these nodes easy to reason about. If you structure the tree correctly and follow good OOP design and practice, you’ll find that even the structure of the tree is not especially complicated since each node provides a good representative abstraction for its sub-tree.

Having small, readable, self-documenting methods is no pipe dream. Really, with a bit of practice, it’s not even very hard. It just requires you to see code a little bit differently. See it as a series of hierarchical stories and abstractions rather than as a bunch of loops, counters, pointers, and control flow statements, and the people that maintain what you write, including yourself, will thank you for it.