The Way We Write Code is Stupid: Source Code Files Considered Harmful
Order Doesn’t Matter
Please pardon the loaded phrasing in the title, but that’s how the message came to me from my subconscious brain: bluntly and without ceremony. I was doing a bit of work in Apex, the object-oriented language specific to Salesforce.com, and it occurred to me that I had no idea what idiomatic Apex looked like. (I still don’t.) In C++, the convention (the last time I was using it much, anyway) is to declare public members first in class headers and put the private members at the bottom. In C#, this is inverted. I’ve seen arguments of all sorts as to which approach is better and why. Declaring private fields at the top makes sense since the first thing you want to see in a class is what its state will consist of, but declaring the public members at the top makes sense since that’s what consumers will interact with; it’s the above-water part of your code iceberg.
When I program in any of the languages I know, I draw on a mental cache of what’s preferred in each one. I try to ‘speak’ it without an accent. But with Apex, I have no idea what the natives ‘sound’ like, having never seen it in use before. Do I declare instance variables at the bottom or the top? Which is the right way to eat bread: butter side up or butter side down? I started googling to see what the ‘best practice’ was for Apex when the buzzing in my subconscious reached some kind of protesting critical mass and morphed into a loud, clear message: “this is completely stupid.”
I went home for the day at that point (it was late anyway) and wondered what had prompted this visceral objection. It obviously didn’t matter to the compiled code whether instance variables or public methods came first, but it’s a well-established idea, encouraged by people as accomplished and prominent as “Uncle” Bob Martin, that consistency of source code layout matters even if the specifics of the layout don’t (I’m paraphrasing from memory of his Clean Coders video series). I get it. You don’t want members of your team writing code that looks completely different from class to class, because that creates maintenance headaches and obscures understanding. So what was my problem?
I didn’t know until the next morning in the shower, where I seem to do my most abstract thinking. I didn’t think it was stupid to try to make my Apex code look like ‘standard’ Apex. I thought it was stupid that I needed to do so at all. I thought it was stupid to waste any time deciding how to order code elements in a file when the only party whose opinion really matters, the compiler, says, “I don’t care.” Your compiler is trying to tell you something: order doesn’t matter to it, so you shouldn’t care either.
Use Cases: What OOP Developers Want
But the scope of my sudden, towering indignation wasn’t limited to the fact that I shouldn’t have to care about the order of methods and fields. I also shouldn’t have to care about camel or Pascal casing. I shouldn’t have to care about underscores in front of field names or inside of method names. It shouldn’t matter to me if public methods come before private or how much indentation is the right amount of indentation. Should methods be alphabetized or should they be in some other order? I don’t care! I don’t care about any of this.
Let’s get a little more orderly about this. Here are some questions that I ask frequently when I’m writing source code in an OOP language:
- What is the public API of this type?
- What private methods are in the ‘tree’ of this public method?
- What methods of this type mutate or reference this field?
- What are the types in this namespace?
- What are the implementations of this interface in this code base?
- Let’s see this method and any methods that it overrides.
- What calls this method?
Here are some questions that I never ask out of actual interest when writing source code. These I either don’t ask at all or ask in exasperation:
- What’s the next method in this file?
- How many line feed characters come before the declaration of this variable?
- Should I use tabs or spaces?
- In what region is this field’s declaration?
- Did the author of this file alphabetize anything in it?
- Does this source file have Windows or *NIX line break characters?
- Is this a field or a method or what?
I ask the first set of questions because their answers are pieces of information I want while reasoning about code. The second set covers things I don’t care about, and I view having to ask them at all as an annoyance or a failure. Do you notice a common pattern? I certainly do. All of the questions whose answers interest me are about code constructs, and all the ones I don’t care about have to do with the storage medium for the code: the file.
But there’s more to the equation here than this simple pattern. Consider the first set of questions again and ask yourself how many of the conventions that we establish and follow are simply ham-fisted attempts to answer them at a glance because the file layout itself is incapable of doing so. Organizing public and private separately is a work-around to answer the first question, for example. Regions in C#, games with variable and method naming, “file” vs “type” view, etc. are all attempts to overcome the fact that files are actually really poor communication media for object-oriented concepts. Even though compilers are an awful lot different now than they were forty years ago, we still cling to the storage medium for source code best suited to those old compilers.
Not Taking Our Own Advice
If you think of an ‘application’ written in MS Access, what comes to mind? How about when you open up an ASP web application and find wizard-generated data sources in the markup, or when you open up a desktop application and find SQL queries right in your code behind? I bet you think “amateurs wrote this.” You are filled with contempt for the situation–didn’t anyone stop to think about what would happen if data later comes in some different form? And what about some kind of validation? And, what the–ugh… the users are just directly looking at the tables and changing the column order and default sorting every time they look at the data. Is everyone here daft? Don’t they realize how ridiculous it is to alter the structure of the actual data store every time someone wants a different ordering of the data?
And you should see some of the crazy work-arounds and process hacks they have in place. They actually have a scheme where the database records the name of everyone who opens up a table and makes any kind of change so that they can go ask that person why they did it. And–get this–they actually have this big document that says what the order of columns in the table should be. And–you can’t make this stuff up–they fight about it regularly and passionately. Can you believe the developers that made this system and the suckers that use it? I mean, how backward are they?
In case you hadn’t followed along with my not-so-subtle parallel, I’m pointing out that we work this way ourselves even as we look with scorn upon developers who foist this sort of thing on users, and upon users who tolerate it. It’s like finally seeing both women in the famous optical illusion: once it’s clear, you’ll never un-see it. Why do we argue about where to put fields and methods and how to order things in code files when we refuse to write code that sends users directly into databases, compelling them to bicker over the order of column definitions there? An RDBMS (or any persistence store) is not an appropriate abstraction for an end user, any end user, whether he understands the abstraction or not. We don’t demand that users fight, decide that there is some ‘right’ way to order invoices to be printed, and then lock the Invoice table in place accordingly for all time, on pain of shaming for violations of an eighty-page invoice standards document. So why do that to ourselves? When we’re creating object-oriented code, sequential files, along with all of the particular orderings, traversals, and renderings thereof, are wildly inappropriate abstractions for us.
What’s the Alternative?
Frankly, I don’t know exactly what the alternative is yet, but I think it’s going to be a weird and fun ride trying to figure that out. My initial, rudimentary thought is that we should use some sort of scheme in which the code DOM is serialized to disk for storage purposes. In other words, the domain model of code is that there is something called Project, and it has a collection of Namespace. Namespace has a collection of Type, which may be an Interface, Enum, Struct, or Class (for C#, anyway; for other OOP languages, it’s not hard to make this leap). Class has one collection each of Field, Method, Property, and Event. The exact details aren’t overly important, but do you see the potential here? We’re creating a hierarchical model of code that could be expressed in nested-object or relational format.
In other words, we’re creating a domain model entirely independent of any persistence strategy. Can it be stored in files? Sure. Bob’s your uncle. You can serialize these things however you want. And it’ll need to be written to file in some form or another for the happiness of the compiler (at least at first). But those files handed over to the compiler are output transforms rather than the lifeblood of development.
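To make the idea a little more concrete, here is a minimal Python sketch of such a model, assuming plain dataclasses. The Project, Namespace, and Type levels come from the hierarchy above. Everything else (Member, TypeDef, to_json, from_json, emit_source, and the sample Invoice type) is hypothetical and exists only for illustration. JSON stands in for one persistence strategy among many, and emit_source plays the role of the output transform that produces compiler-ready text.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical code model mirroring Project > Namespace > Type > member.
# Class and attribute names are illustrative, not an existing tool's API.

@dataclass
class Member:
    name: str
    kind: str                  # "field", "method", "property", or "event"
    visibility: str = "private"
    declaration: str = ""      # source text of the declaration itself

@dataclass
class TypeDef:
    name: str
    kind: str = "class"        # "class", "interface", "enum", or "struct"
    members: list[Member] = field(default_factory=list)

@dataclass
class Namespace:
    name: str
    types: list[TypeDef] = field(default_factory=list)

@dataclass
class Project:
    name: str
    namespaces: list[Namespace] = field(default_factory=list)

def to_json(project: Project) -> str:
    """One possible persistence strategy: nested JSON (relational would also work)."""
    return json.dumps(asdict(project), indent=2)

def from_json(text: str) -> Project:
    """Rebuild the in-memory model from to_json output."""
    raw = json.loads(text)
    return Project(raw["name"], [
        Namespace(ns["name"], [
            TypeDef(t["name"], t["kind"], [Member(**m) for m in t["members"]])
            for t in ns["types"]
        ])
        for ns in raw["namespaces"]
    ])

def emit_source(project: Project) -> str:
    """Output transform for the compiler: flatten the model into C#-like text.
    Members are sorted by kind, then by name, so the emitter decides the
    order and no developer ever has to."""
    lines = []
    for ns in project.namespaces:
        lines.append(f"namespace {ns.name} {{")
        for t in ns.types:
            lines.append(f"    public {t.kind} {t.name} {{")
            for m in sorted(t.members, key=lambda m: (m.kind, m.name)):
                lines.append(f"        {m.visibility} {m.declaration}")
            lines.append("    }")
        lines.append("}")
    return "\n".join(lines)

demo = Project("Demo", [Namespace("Billing", [TypeDef("Invoice", "class", [
    Member("Total", "method", "public", "decimal Total() { return 0m; }"),
    Member("lines", "field", "private", "List<Line> lines;"),
])])])

print(emit_source(demo))
```

Because emit_source picks member order deterministically, two developers who view the same type in different ways would still hand identical text to the compiler.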
Think for a minute of us programmers as users of a system with a proper domain, one or more persistence models, and a service layer. Really, stop and mull that over for a moment. Now, go back to the use cases I mentioned earlier and think what this could mean. Here are some properties of this system:
- The basic unit of interaction is going to be the method, and you can request methods with arbitrary properties, with any filtering and any ordering.
- What appears on your screen will probably be one or more methods (though this would be extremely flexible).
- It’s unlikely that you’d ever be interested in “show me everything in this type.” Why would you? The only reason we do this now is that editing text files is what we’re accustomed to doing.
- Tracing execution paths through code would be much easier and more visual, and schemes like the “Code Bubbles” IDE for Java would be trivial to create and manipulate.
- Most arguments over code standards simply disappear as users can configure IDE preferences such as “prepend underscores to all field variables you show me,” “show me everything in camel casing,” and, “always sort results in reverse alphabetical order.”
- Arbitrary methods from the same or different types could be grouped together in ad-hoc fashion on the screen for analysis or debugging purposes.
- Version/change control could occur at the method or even statement level, allowing expression of “let’s see all changes to this method” or “let’s see who made a change to this namespace” rather than “let’s see who changed this file.”
- Instead of relying on IDE plugins to “hop” to places in the code for things like “show all references,” you’d use an expressive query syntax à la NDepend’s “code query language.”
- A new domain model allows refactoring concepts to be baked in and makes operations like “get rid of dead code” easier, or in some cases trivial.
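Several of the use cases above can be sketched as plain queries over such a model: the public API of a type, what calls a method, which methods touch a field, and per-developer naming preferences. This is a toy Python illustration, not NDepend’s actual query language. The Method record, query, callers_of, split_words, and render_name are hypothetical names, and in this sketch callers are matched by method name only.

```python
import re
from dataclasses import dataclass

# Hypothetical flattened view of the methods in a code model (illustrative only).
@dataclass(frozen=True)
class Method:
    type_name: str
    name: str
    visibility: str = "private"
    calls: tuple[str, ...] = ()         # names of methods this method calls
    fields_used: tuple[str, ...] = ()   # fields this method reads or writes

def query(methods, where=lambda m: True, order_by=lambda m: m.name, reverse=False):
    """Arbitrary filtering and ordering: a toy stand-in for a code query language."""
    return sorted((m for m in methods if where(m)), key=order_by, reverse=reverse)

def callers_of(methods, target):
    """'What calls this method?' (matched by name only in this sketch)."""
    return query(methods, where=lambda m: target in m.calls)

def split_words(name):
    """Break 'TotalAmount', 'totalAmount', or '_total_amount' into lowercase words."""
    words = []
    for chunk in name.split("_"):
        words += re.findall(r"[A-Z]+(?=[A-Z][a-z]|\d|$)|[A-Z]?[a-z]+|[A-Z]+|\d+", chunk)
    return [w.lower() for w in words]

def render_name(name, casing="camel", prefix=""):
    """Per-developer display preference; the stored name is never changed."""
    words = split_words(name)
    if not words:
        return prefix + name  # nothing recognizable to re-case
    if casing == "snake":
        text = "_".join(words)
    elif casing == "pascal":
        text = "".join(w.capitalize() for w in words)
    else:  # camel
        text = words[0] + "".join(w.capitalize() for w in words[1:])
    return prefix + text

methods = [
    Method("Invoice", "Total", "public", calls=("SumLines",), fields_used=("lines",)),
    Method("Invoice", "SumLines", calls=("Round",), fields_used=("lines",)),
    Method("Invoice", "Round"),
    Method("Report", "Print", "public", calls=("Total",)),
]

public_api = query(methods, where=lambda m: m.visibility == "public")
touches_lines = query(methods, where=lambda m: "lines" in m.fields_used)
print([m.name for m in public_api])                     # ['Print', 'Total']
print([m.name for m in touches_lines])                  # ['SumLines', 'Total']
print([m.name for m in callers_of(methods, "Total")])   # ['Print']
print(render_name("lines", "camel", prefix="_"))        # _lines
```

The point of render_name is that “prepend underscores to fields” becomes a viewing option rather than a team-wide rule baked into the stored code.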
Longer-Reaching Impact
If things were to go in this direction, I believe it would have a profound impact not just on development process but also on the character and quality of object-oriented code in general. The inherently sequential nature of files and the way people reason about file parsing, I believe, encourage or at least favor the dogged persistence of procedural approaches to object-oriented programming (static methods, global state, casting, etc.). I think the following trends would take shape:
- Smaller methods. If popping up methods one at a time or in small groups becomes the norm, having to scroll to see and understand a method will become an anomaly, and people will optimize to avoid it.
- Less complexity in classes. With code operations subject to a validation of sorts, it’d be fairly easy to incorporate a setting that warns users if they’re adding the tenth or twentieth or whatever method to a class. In extreme cases, it could even be disallowed (and not through the honor system or ex post facto at review or check in–you couldn’t do it in the first place).
- Better conformance to the Single Responsibility Principle (SRP). Eliminating the natural barrier of “I don’t want to add a new file to source control” makes people less likely to awkwardly wedge methods into classes where they don’t belong.
- Better cohesion. It becomes easy to look for fields hardly used in a type or clusters of use within a type that could be separated easily into multiple types.
- Better test coverage. Not only is this a natural consequence of the other items in this list, but it would also be possible to define “meta-data” to allow linking of code items and tests.
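The validation and cohesion ideas above could start out as simple checks against the model. This Python sketch assumes a hypothetical ClassModel that records which fields each method references. The function names are illustrative, and the warning threshold of 10 and hard limit of 20 are arbitrary defaults echoing the “tenth or twentieth” example, not recommendations.

```python
from dataclasses import dataclass, field

# Hypothetical class model: method name -> set of field names it references.
@dataclass
class ClassModel:
    name: str
    fields: set[str] = field(default_factory=set)
    methods: dict[str, set[str]] = field(default_factory=dict)

class TooManyMethods(Exception):
    """Raised when an edit would push a class past a configured hard limit."""

def add_method(model, name, fields_used, warn_at=10, max_methods=20):
    """Validate the edit before applying it, not at review time.
    Returns a warning message, or None if no warning applies."""
    count = len(model.methods) + (0 if name in model.methods else 1)
    if count > max_methods:
        raise TooManyMethods(f"{model.name} would have {count} methods (limit {max_methods})")
    model.methods[name] = set(fields_used)
    return f"{model.name} now has {count} methods" if count >= warn_at else None

def rarely_used_fields(model, threshold=1):
    """Fields referenced by at most `threshold` methods: candidates to move elsewhere."""
    usage = {f: 0 for f in model.fields}
    for used in model.methods.values():
        for f in used & model.fields:
            usage[f] += 1
    return sorted(f for f, n in usage.items() if n <= threshold)

def method_clusters(model):
    """Group methods that share fields, directly or transitively.
    More than one cluster hints that the class could be split into several types."""
    remaining = set(model.methods)
    clusters = []
    while remaining:
        start = remaining.pop()
        group, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            linked = {m for m in remaining if model.methods[m] & model.methods[current]}
            remaining -= linked
            group |= linked
            frontier.extend(linked)
        clusters.append(sorted(group))
    return sorted(clusters)

invoice = ClassModel("Invoice", fields={"lines", "customer", "printer"})
add_method(invoice, "Total", {"lines"})
add_method(invoice, "AddLine", {"lines"})
add_method(invoice, "Print", {"printer"})
print(rarely_used_fields(invoice))  # ['customer', 'printer']
print(method_clusters(invoice))     # [['AddLine', 'Total'], ['Print']]
```

Because add_method raises before it mutates the model, a disallowed edit simply never happens, instead of being caught after the fact at review or check-in.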
Well, the first thing I need to establish is that this doesn’t already exist or isn’t already in the works somewhere, and that I’m not a complete lunatic malcontent. I’d like to get some feedback on this idea in general. The people to whom I’ve explained a bit of it so far seem to find the concept a bit far-fetched but somewhat intriguing.
I’d say the next step, assuming this passes the sanity check, would be to draw up a white paper discussing some implementation and design strategies, with their pros and cons, in a bit more detail. There are certainly threats to validity to work out, such as the specifics of interaction with the compiler, the necessarily radical change to source control approaches, and the performance overhead of performing code transforms instead of just reading a file directly into memory. But off the top of my head, I view these more as fascinating challenges than as problems.
In parallel, I’d like to invite anyone who is at all interested in this idea to drop me an email or send me a tweet. If there are others who feel the way I do, I think it’d be really cool to get something up on GitHub and maybe start brainstorming some initial work tasks or exploratory POCs for feasibility studies. Also feel free to plus-like-tweet-whatever to others if you think they might be interested.
In conclusion, I’ll just say that I’d really like to see this gain traction and that I’d probably ratchet it right to the top of my side-projects list if people are interested (it’s a bit large in scope for me alone in my spare time). Now, whenever I find myself editing source files in an IDE, I feel like a bit of a barbarian, and I really don’t think we should have to tolerate this state of affairs anymore. Productivity tools designed to hide the file nature of our source code from us help, but they’re band-aids when we need disinfectants, antibiotics, and stitches. I don’t know about you, but I’m ready to start writing my object-oriented code using an IDE paradigm that doesn’t support GOTO Line as if we were banging out QBasic in 1986.