Code Generation Seems Like a Failure of Vision

I think that I’m probably going to take a good bit of flak for this post, but you can’t win ‘em all. I’m interested in contrary opinions and arguments because my mind could be changed. Nevertheless, I’ve been unable to shake the feeling for months that code generation is just a basic and fundamental design failure. I’ve tried. I’ve thought about it in the shower and on the drive to work. I’ve thought about it while considering design approaches and even while using it (in the form of Entity Framework). And it just feels fundamentally icky. I can’t shake the feeling.

Let me start out with a small example that everyone can probably agree on. Let’s say that you’re writing some kind of GUI application with a bunch of rather similar windows. And let’s say that mostly what you do is take all of the presentation logic for the previous window, copy, paste and adjust to taste for the next window. Oh noes! We’re violating the DRY principle with all of that repetition, right?

What we should be doing instead, obviously, is writing a program that duplicates the code more quickly. That way you can crank out more windows much faster and without the periodic fat-fingering that was happening when you did it manually. Duplication problem solved, right? Er, well, no. Duplication problem automated and made worse. After all, the problem with duplicate code is a problem of maintenance more than initial push. The thing that hurts is later, when something about all of that duplicated code has to be changed and you have to go find it and change it everywhere. I think most people reading would agree that code generation is a poor solution to the problem of copy and paste programming. The good solution is a design that eliminates repetition and duplication of knowledge.

I feel as though a lot of code generation that I see is a prohibitive micro-optimization. The problem is “I have to do a lot of repetitive coding” and code generation solves this problem by saying, “we’ll automate that coding for you.” I’d rather see it solved by saying, “let’s step back and figure out a better approach — one in which repetition is unnecessary.” The automation approach puts a band-aid on the wound and charges ahead, risking infection.

For instance, take the concept of List in C#. List is conceptually similar to an array, but it automatically resizes, thus abstracting away an annoying detail of managing collections in languages from days gone by. I’m writing a program and I think I want an IntList, which is a list of integers. That’s going along swimmingly until I realize that I need to store some averages in there that might not be round numbers, so I copy the source code of IntList to DoubleList and I do a “Find-And-Replace” with Int and Double. Maybe later I also do that with string, and then I think, “geez, someone should write a program that you just tell it a type and it generates a list type for it.” Someone does, and then life is good. And then, later, someone comes along with the concept of generics/templates and everyone feels pretty sheepish about their “ListGenerator” programs. Why? Because someone actually solved the core problem instead of coming up with obtuse, brute-force ways to alleviate the symptoms.
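To make the contrast concrete, here is a rough sketch of the two approaches. It is in Python rather than C#, and every name in it (`generate_list_source`, `LIST_TEMPLATE`, `TypedList`) is made up for illustration; it is not how any real "ListGenerator" or the .NET `List<T>` is implemented. The first half stamps out a near-identical class per type, and the second half is a single generic class that makes the generator unnecessary.

```python
from string import Template
from typing import Generic, List, TypeVar

# Hypothetical "ListGenerator": a source template with holes for the type.
LIST_TEMPLATE = Template('''
class ${name}List:
    def __init__(self):
        self._items = []

    def add(self, item):
        if not isinstance(item, ${type_name}):
            raise TypeError("expected ${type_name}")
        self._items.append(item)

    def count(self):
        return len(self._items)
''')


def generate_list_source(name: str, type_name: str) -> str:
    """Return source code for a list class specialized to one type."""
    return LIST_TEMPLATE.substitute(name=name, type_name=type_name)


# One generated class per type: the duplication is automated, not removed.
int_source = generate_list_source("Int", "int")
double_source = generate_list_source("Double", "float")

# The generics approach: the type is a parameter, so one class covers them all.
T = TypeVar("T")


class TypedList(Generic[T]):
    def __init__(self) -> None:
        self._items: List[T] = []

    def add(self, item: T) -> None:
        self._items.append(item)

    def count(self) -> int:
        return len(self._items)


averages: TypedList[float] = TypedList()
averages.add(2.5)
```

One caveat on the sketch: Python only checks the `TypedList[float]` annotation if you run a static type checker, whereas C# generics are enforced by the compiler. The structural point is the same either way: a fix to `add` happens once in `TypedList`, but has to be regenerated into every `XxxList` in the template version.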

And when you pull back and think about the whole idea of code generation, it’s fairly Rube-Goldbergian. Let’s write some code that writes code. It makes me think of some stoner ‘brainstorming’ a money making idea:

[Image: “Inventions”]

I realize that’s a touch of hyperbole, but think of what code generation involves. You’re going to feed code to a compiler and then run the compiled program which will generate code that you feed to the compiler, again, that will output a program. If you were to diagram that out with a flow chart and optimize it, what would you do? Would you get rid of the part where it went to the compiler twice and just write the program in the first place? (I should note that throughout this post I’ve been talking about this circular concept rather than, say, the way ASP or PHP generate HTML or the way Java compiles to bytecode — I’m talking about generating code at the same level of abstraction.)

The most obvious example I can think of is the aforementioned Entity Framework that I use right now. This is a framework utility that uses C# in conjunction with a text templating language (T4) to generate text files that happen to be C# code. It does this because you have 100 tables in your database and you don’t want to write data transfer objects for all of them. So EF uses reflection and IQueryable with its EDMX to handle the querying aspect (which saves you from the fate we had for years of writing DAOs) while using code generation to give you OOP objects to represent your data tables. But really, isn’t this just another band-aid? Aren’t we really paying the price for not having a good solution to the Impedance Mismatch Problem?

I feel a whole host of code gen solutions is also born out of the desire to be more performant. We could write something that would look at a database table and generate, on the fly, using reflection, a CRUD form at runtime for that table. The performance would be poor, but we could do it. However, confronted with that performance, people often say, “if only there were a way to automate the stuff we want but to have the details sorted out at compile time rather than runtime.” At that point the battle is already won and the war already lost, because it’s only a matter of time until someone writes a program whose output is source code.
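The runtime alternative described above might look something like the following minimal sketch, again in Python for brevity. The `Customer` type, the `WIDGET_FOR_TYPE` mapping and the `build_form` function are all hypothetical stand-ins; a real version would read the table schema from the database and drive an actual GUI toolkit. What it shows is the shape of the trade: nothing is generated ahead of time, and the type is inspected on every call.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """Hypothetical row type standing in for one database table."""
    id: int
    name: str
    is_active: bool


# Hypothetical mapping from a column's type to the kind of input widget to show.
WIDGET_FOR_TYPE = {int: "number", str: "text", bool: "checkbox"}


def build_form(record):
    """Describe an edit form for any dataclass instance by inspecting it at runtime."""
    form = []
    for field in fields(record):
        form.append({
            "label": field.name,
            "widget": WIDGET_FOR_TYPE.get(field.type, "text"),
            "value": getattr(record, field.name),
        })
    return form


form = build_form(Customer(id=1, name="Ada", is_active=True))
```

The same `build_form` works for any other table type without modification, which is the appeal. The cost is that the inspection happens while the program runs, and that is the performance hit that tends to push people back toward generating the per-table code up front.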

I’m not advocating a move away from code generation, nor am I impugning anyone for using it. This is a post more in the same vein as ones that I’ve written before (about not using files for source code and avoiding using casts in object oriented languages). Code generation isn’t going anywhere anytime soon, and I know that I’m not even in a position to quit my reliance on it. I just think it’s time to recognize it as an inherently flawed band-aid rather than to celebrate it as a feat of engineering ingenuity.

  • http://www.limulus.net/ Eric McCarthy

    If I remember correctly, The Pragmatic Programmer talks about the intersection of code generation and DRY. What it comes down to is, if you do a `make clean`, do the generated source files get blown away? If they can be blown away, it passes DRY, otherwise you’ve got a potential problem.

  • http://www.daedtech.com/blog Erik Dietrich

    It seems as though this problem is most often solved (as is the case with EF) using partial classes in C#. I believe in the Java world, inheritance is the ticket. I’ll have to thumb back through my copy of Pragmatic Programmer — I don’t really remember them talking about code generation, so that’ll be worth a re-read for perspective. Thanks!

  • http://www.schmonz.com/ Amitai Schlair

    I don’t see how your argument can possibly be wrong. If you think you want help from a code-generator, then the code you’re writing (or meta-writing) is by definition boilerplate. Boilerplate means “feels like too many lines of code to do what I want”, and that’s bad for all the reasons more lines of code are bad (correctness, maintainability, a signal that some abstraction is insufficiently expressive, etc.). If you were claiming we should all immediately avoid any kind of boilerplate, whether automatically or manually generated, that would be hard to support, because some really common things would be hard to do. But you’re only claiming that we should feel uncomfortable about it, and that’s just true.

  • http://www.daedtech.com/blog Erik Dietrich

    I think you put that better than I did, quite frankly. My take is exactly that using code generation should make you feel uncomfortable.

    It always just feels to me like I’m generating more, as you put it, boilerplate, by doing it faster. And I don’t know how to count that in the win column.

  • http://www.schmonz.com/ Amitai Schlair

    Well, my head isn’t up this problem right now, so it’s easy for me to pontificate and hard for me to write much more than that. (The problem my head is currently up: how to work effectively from everywhere, using the finite set of examples I’ve personally evaluated. I’m having a hell of a time figuring out how to structure my hard-won knowledge in a way that still counts as “writing”.)

    This post has a lot of truths in it. I feel like you could dig gently and unearth a whole series of assumptions that are common sense to you and me, yet surprising to people who don’t live software development (or haven’t lived it long enough, or haven’t been paying enough attention). For instance, we know from experience that our aversions to boilerplate and to any sort of automation thereof are aversions to pain. They’re relatively nuanced aversions to pain, in that we’ve learned to hurt about these things now because we know we’ll hurt about them later. So one of the assumptions I’d love to see you write about (if you haven’t already) is what to do with the pain you encounter as a conscientious developer. When you feel some, how do you choose what to do about it? Maybe not just pain from the build or the code, but also pain from managers or customers. Technical pain, product pain, organizational pain.

    Our antipathy for boilerplate is a learned antipathy for sweeping problems under the rug, because we know that’s the wrong metaphor (unless our domiciles are covered with rugs and filled with swept problems). But not everyone has learned it. And sometimes that’s because people don’t expect to be around for the consequences, or have learned helplessness about the consequences, and so on. The fact that you and I have managed to learn this particular lesson reflects a zillion favorable interactions in our learning environments for a long, long time. So if you wanted to, you could reflect on how you were able to derive a conclusion such as this one from the experiences you’ve had, and find material for a whole slew of fascinating posts.

    Sure, I could do the same. But I’d rather read yours. :-)

  • Vince

    The underlying point you’re making I agree with. Something is inherently flawed in our approach given your example of the GUI, but I think code generation will always be around to solve specific types of problems, and that’s OK.

    If you’re working with an external system such as a database, you must write SQL to manipulate data; there’s no way around this. For the 90% of CRUD queries you write, you get a huge performance boost with the generated SQL. It’s by no means a band-aid solution; it’s a solution for 90% of your workload.

  • Richard Gardiner

    I don’t have a problem with code generation as a one-off process. For example, I have a list of “things” in a file and I need to perhaps generate a class for each item that might implement a defined interface.

    The EF code generation (and earlier, even nastier database code generators like typed datasets) however I do not like. I’m never quite sure what they are going to do or when they might change suddenly.

    And as for using partial classes as part of the process – they’re like Regions hyped up on caffeine and a bunch of illegal substances. I have a class and half the code is in a different file written in a totally different way.

    It’s all a hack.

  • http://www.daedtech.com/blog Erik Dietrich

    This is sort of what I’m thinking of when I talk about the impedance mismatch. If it’s a given that you’re going to be programming in an object oriented language, writing CRUD for dozens of tables and you don’t want to/can’t take the performance hit of reflection, then code generation is clearly your best bet.

    But if you pull back a bit, maybe the question to ask is “why do we keep doing this?” Maybe relational databases and dynamics are the way to go. Maybe document databases with no impedance mismatch are the way to go. Or object oriented databases. I’m not stumping for any of those things, per se — I’m just constantly in the process of trying to figure out if the things I’m doing are dumb, or, at least if I’ve hit some kind of local maximum with them.

  • http://www.daedtech.com/blog Erik Dietrich

    Interesting… that sounds like code generation as kind of a one time “boost” to a project. Not really a use case I’d considered or encountered previously.

  • http://www.daedtech.com/blog Erik Dietrich

    I think that’s a fascinating idea for a blog post subject. Thanks! There is a lot of pain management, for lack of a better term, in what developers do, but I think the meta angle of automation is what makes this a pretty unique discussion. I mean, everyone with a job learns things like “don’t work with Bill because he’s lazy,” but programmers have a unique ability to control their feedback loops and automate, giving us ways to control when, where and how we experience pain in what we do, almost like some weird form of negative capital.

    I’ve thrown a draft in my folder to address this subject, and I’m hoping it winds up being an interesting post to write and read. I also haven’t forgotten the posts about the review process, but those will probably come earlier in the new year, when it comes time for me to actually do them.

  • Jace Rhea

    I agree it’s a design problem. Have you taken a look at F# type providers? They address exactly this problem.

  • http://www.daedtech.com/blog Erik Dietrich

    I’m familiar with the concept and have played with them in a no consequence playpen environment. That said, I’d need to do some learning to understand how to use them for anything that wasn’t pure generic. In other words, I can read an arbitrary set of data with [key,value] and bypass typing if I just want to output it as-is to a screen or something (general code for reading a bunch of tables and doing select or simple CRUD), but what happens when business rules enter the mix?

  • http://jacerhea.wordpress.com/ Jace Rhea

    Both code generation and type providers only apply to type generation that can be applied in a generic way. When per-type rules need to be applied, then automated type generation is probably not the answer.