DaedTech

Stories about Software

Code Generation Seems Like a Failure of Vision

I think that I’m probably going to take a good bit of flak for this post, but you can’t win ’em all. I’m interested in contrary opinions and arguments because my mind could be changed. Nevertheless, I’ve been unable to shake the feeling for months that code generation is just a basic and fundamental design failure. I’ve tried. I’ve thought about it in the shower and on the drive to work. I’ve thought about it while considering design approaches and even while using it (in the form of Entity Framework). And it just feels fundamentally icky. I can’t shake the feeling.

Let me start out with a small example that everyone can probably agree on. Let’s say that you’re writing some kind of GUI application with a bunch of rather similar windows. And let’s say that mostly what you do is take all of the presentation logic for the previous window, copy, paste and adjust to taste for the next window. Oh noes! We’re violating the DRY principle with all of that repetition, right?

What we should be doing instead, obviously, is writing a program that duplicates the code more quickly. That way you can crank out more windows much faster and without the periodic fat-fingering that was happening when you did it manually. Duplication problem solved, right? Er, well, no. Duplication problem automated and made worse. After all, the problem with duplicate code is a problem of maintenance more than initial push. The thing that hurts is later when something about all of that duplicated code has to be changed and you have to go find and do it everywhere. I think most readers would agree that code generation is a poor solution to the problem of copy and paste programming. The good solution is a design that eliminates repetition and duplication of knowledge.
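To make that last sentence concrete, here is a minimal sketch, written in Python purely for brevity. The names `WindowSpec` and `render_window` are hypothetical and not taken from any real GUI toolkit. The shared presentation logic is written once, and each window shrinks to the data that makes it different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSpec:
    """Only what differs between windows; everything shared lives in render_window."""
    title: str
    fields: tuple[str, ...]


def render_window(spec: WindowSpec) -> str:
    """The presentation logic, written once. A change here reaches every window."""
    lines = [f"== {spec.title} =="]
    lines += [f"{name}: [        ]" for name in spec.fields]
    lines.append("[ OK ] [ Cancel ]")
    return "\n".join(lines)


# Two "windows" that would otherwise have been copy, paste and adjust to taste.
customer_window = WindowSpec("Customer", ("Name", "Email"))
order_window = WindowSpec("Order", ("Item", "Quantity", "Price"))

print(render_window(customer_window))
print(render_window(order_window))
```

Adding a third window here means adding one more `WindowSpec`, and changing the button row means touching one line rather than hunting through every copy.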

I feel as though a lot of code generation that I see is a prohibitive micro-optimization. The problem is “I have to do a lot of repetitive coding” and code generation solves this problem by saying, “we’ll automate that coding for you.” I’d rather see it solved by saying, “let’s step back and figure out a better approach — one in which repetition is unnecessary.” The automation approach puts a band-aid on the wound and charges ahead, risking infection.

For instance, take the concept of List in C#. List is conceptually similar to an array, but it automatically resizes, thus abstracting away an annoying detail of managing collections in languages from days gone by. I’m writing a program and I think I want an IntList, which is a list of integers. That’s going along swimmingly until I realize that I need to store some averages in there that might not be round numbers, so I copy the source code of IntList to DoubleList and I do a “Find-And-Replace” with Int and Double. Maybe later I also do that with string, and then I think, “geez, someone should write a program that you just tell it a type and it generates a list type for it.” Someone does, and then life is good. And then, later, someone comes along with the concept of generics/templates and everyone feels pretty sheepish about their “ListGenerator” programs. Why? Because someone actually solved the core problem instead of coming up with obtuse, brute-force ways to alleviate the symptoms.
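As a rough illustration of what generics buy you, here is a sketch in Python rather than C#, again only for brevity. `ResizableList` is a hypothetical stand-in for the IntList, DoubleList and string list of the story, not a real library type. One caveat: Python's type parameters are checked by static tools such as mypy and are not enforced at runtime, so this shows the shape of the idea rather than C#'s exact guarantees.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ResizableList(Generic[T]):
    """One list definition, parameterized by element type.

    Hypothetical replacement for IntList, DoubleList, a string list,
    and the ListGenerator program that would have produced them.
    """

    def __init__(self) -> None:
        self._items: List[T] = []

    def add(self, item: T) -> None:
        self._items.append(item)

    def get(self, index: int) -> T:
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)


# The element type is supplied at the point of use; no class is copied.
ints: ResizableList[int] = ResizableList()
ints.add(3)

averages: ResizableList[float] = ResizableList()
averages.add(2.5)

names: ResizableList[str] = ResizableList()
names.add("Ada")
```

The type argument does the job that Find-And-Replace (and later the generator) was doing, and there is exactly one class to maintain.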

And when you pull back and think about the whole idea of code generation, it’s fairly Rube-Goldbergian. Let’s write some code that writes code. It makes me think of some stoner ‘brainstorming’ a money making idea:

[Image: Inventions]

I realize that’s a touch of hyperbole, but think of what code generation involves. You’re going to feed code to a compiler and then run the compiled program which will generate code that you feed to the compiler, again, that will output a program. If you were to diagram that out with a flow chart and optimize it, what would you do? Would you get rid of the part where it went to the compiler twice and just write the program in the first place? (I should note that throughout this post I’ve been talking about this circular concept rather than, say, the way ASP or PHP generate HTML or the way Java compiles to bytecode — I’m talking about generating code at the same level of abstraction.)

The most obvious example I can think of is the aforementioned Entity Framework that I use right now. This is a framework utility that uses C# in conjunction with a text templating language (T4) to generate text files that happen to be C# code. It does this because you have 100 tables in your database and you don’t want to write data transfer objects for all of them. So EF uses reflection and IQueryable with its EDMX to handle the querying aspect (which saves you from the fate we had for years of writing DAOs) while using code generation to give you objects to represent your data tables. But really, isn’t this just another band-aid? Aren’t we really paying the price for not having a good solution to the Impedance Mismatch Problem?

I feel a whole host of code gen solutions is also born out of the desire to be more performant. We could write something that would look at a database table and generate, on the fly, using reflection, a CRUD form at runtime for that table. The performance would be poor, but we could do it. However, confronted with that performance, people often say, “if only there were a way to automate the stuff we want but to have the details sorted out at compile time rather than runtime.” At that point the battle is already won and the war already lost, because it’s only a matter of time until someone writes a program whose output is source code.
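For a sense of what the runtime approach looks like, here is a small sketch in Python, with `dataclasses.fields` standing in for the reflection calls you would make in C#. `Customer` and `build_form` are hypothetical names, and a real version would read the table's schema from the database rather than from a class.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """Hypothetical stand-in for a row of a database table."""
    id: int
    name: str
    email: str


def build_form(record: object) -> list[str]:
    """Inspect the record's fields at runtime and emit one form line per field."""
    return [
        f"{f.name} ({f.type.__name__}): {getattr(record, f.name)}"
        for f in fields(record)
    ]


for line in build_form(Customer(1, "Ada", "ada@example.com")):
    print(line)
```

Nothing here is generated ahead of time: add a column to `Customer` and the form picks it up on the next run. The price is that every call repeats the inspection, which is the cost that tempts people into doing the same walk once and writing the result out as source code.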

I’m not advocating a move away from code generation, nor am I impugning anyone for using it. This is a post more in the same vein as ones that I’ve written before (about not using files for source code and avoiding using casts in object oriented languages). Code generation isn’t going anywhere anytime soon, and I know that I’m not even in a position to quit my reliance on it. I just think it’s time to recognize it as an inherently flawed band-aid rather than to celebrate it as a feat of engineering ingenuity.

16 Comments
Eric McCarthy
10 years ago

If I remember correctly, The Pragmatic Programmer talks about the intersection of code generation and DRY. What it comes down to is, if you do a `make clean`, do the generated source files get blown away? If they can be blown away, it passes DRY, otherwise you’ve got a potential problem.

Erik Dietrich
10 years ago
Reply to  Eric McCarthy

It seems as though this problem is most often solved (as is the case with EF) using partial classes in C#. I believe in the Java world, inheritance is the ticket. I’ll have to thumb back through my copy of Pragmatic Programmer — I don’t really remember them talking about code generation, so that’ll be worth a re-read for perspective. Thanks!

Amitai Schlair
10 years ago

I don’t see how your argument can possibly be wrong. If you think you want help from a code-generator, then the code you’re writing (or meta-writing) is by definition boilerplate. Boilerplate means “feels like too many lines of code to do what I want”, and that’s bad for all the reasons more lines of code are bad (correctness, maintainability, a signal that some abstraction is insufficiently expressive, etc.). If you were claiming we should all immediately avoid any kind of boilerplate (whether automatically or manually generated), that would be hard to support, because some really common things would be…

Erik Dietrich
10 years ago
Reply to  Amitai Schlair

I think you put that better than I did, quite frankly. My take is exactly that using code generation should make you feel uncomfortable.

It always just feels to me like I’m generating more, as you put it, boilerplate, by doing it faster. And I don’t know how to count that in the win column.

Amitai Schlair
10 years ago
Reply to  Erik Dietrich

Well, my head isn’t up this problem right now, so it’s easy for me to pontificate and hard for me to write much more than that. (The problem my head is currently up: how to work effectively from everywhere, using the finite set of examples I’ve personally evaluated. I’m having a hell of a time figuring out how to structure my hard-won knowledge in a way that still counts as “writing”.) This post has a lot of truths in it. I feel like you could dig gently and unearth a whole series of assumptions that are common sense to you and…

Erik Dietrich
10 years ago
Reply to  Amitai Schlair

I think that’s a fascinating idea for a blog post subject, thanks! There is a lot of pain management, for lack of a better term, in what developers do, but I think the meta angle of automation is what makes this a pretty unique discussion. I mean everyone with a job learns things like “don’t work with Bill because he’s lazy,” but programmers have a unique ability to control their feedback loops and automate, giving us ways to control when, where and how we experience pain in what we do almost like some weird form of negative capital. I’ve…

Vince
10 years ago

The underlying point you’re making I agree with. Something is inherently flawed in our approach given your example of the GUI, but I think code generation will always be around to solve specific types of problems, and that’s OK.

If you’re working with an external system such as a database, you must write SQL to manipulate data; there’s no way around this. For the 90% of CRUD queries you write, you get a huge performance boost with the generated SQL. It’s by no means a band-aid solution; it’s a solution for 90% of your workload.

Erik Dietrich
10 years ago
Reply to  Vince

This is sort of what I’m thinking of when I talk about the impedance mismatch. If it’s a given that you’re going to be programming in an object oriented language, writing CRUD for dozens of tables and you don’t want to/can’t take the performance hit of reflection, then code generation is clearly your best bet. But if you pull back a bit, maybe the question to ask is “why do we keep doing this?” Maybe relational databases and dynamics are the way to go. Maybe document databases with no impedance mismatch are the way to go. Or object oriented databases.…

Richard Gardiner
10 years ago

I don’t have a problem with code generation as a one-off process. For example, I have a list of “things” in a file and I need to perhaps generate a class for each item that might implement a defined interface. The EF code generation (and earlier, even nastier database code generators like typed datasets) however I do not like. I’m never quite sure what they are going to do or when they might change suddenly. And as for using partial classes as part of the process – they’re like Regions hyped up on caffeine and a bunch of illegal substances.…

Erik Dietrich
10 years ago

Interesting… that sounds like code generation as kind of a one time “boost” to a project. Not really a use case I’d considered or encountered previously.

Jace Rhea
10 years ago

I agree it’s a design problem. Have you taken a look at F# type providers? They address exactly this problem.

Erik Dietrich
10 years ago
Reply to  Jace Rhea

I’m familiar with the concept and have played with them in a no consequence playpen environment. That said, I’d need to do some learning to understand how to use them for anything that wasn’t pure generic. In other words, I can read an arbitrary set of data with [key,value] and bypass typing if I just want to output it as-is to a screen or something (general code for reading a bunch of tables and doing select or simple CRUD), but what happens when business rules enter the mix?

Jace Rhea
10 years ago
Reply to  Erik Dietrich

Both code generation and type providers only apply to type generation that can be applied in a generic way. When per-type rules need to be applied, then automated type generation is probably not the answer.

VInce Panuccio
7 years ago

This post reminded me (for some strange reason) of a particular piece of code from a serialization library called Jil which contained a method which was generated from a LINQPad source file to reduce method table lookups. Obviously a very specific case where it made sense. https://github.com/kevin-montrose/Jil/blob/master/Jil/JSON.cs#L115 Entity Framework generates SQL from an expression tree which is then translated to something else, but it’s not a new concept and I don’t think there’s anything wrong with that. It’s similar to a compiler, it takes language A and translates it to language B. I don’t want to make this about Entity…

Erik Dietrich
7 years ago
Reply to  VInce Panuccio

At first blush, it has the same feel to me as Entity Framework: elegant locally, inelegant globally. Or, maybe I’d think of it as locally maximizing. It’s a pretty philosophical consideration, to be sure. I mean, I’ve used and continue to use Entity Framework on plenty of projects (and EF is certainly an improvement over applications that spit out gobs of DAOs that people then commit to source control). It just strikes me as odd that we’re solving the same problem so often, albeit with variance, that we can automate the writing of software. I’d generalize it to say that,…
