Code Generation Seems Like a Failure of Vision
I think that I’m probably going to take a good bit of flack for this post, but you can’t win ‘em all. I’m interested in contrary opinions and arguments because my mind could be changed. Nevertheless, I’ve been unable to shake the feeling for months that code generation is just a basic and fundamental design failure. I’ve tried. I’ve thought about it in the shower and on the drive to work. I’ve thought about it while considering design approaches and even while using it (in the form of Entity Framework). And it just feels fundamentally icky. I can’t shake the feeling.
Let me start out with a small example that everyone can probably agree on. Let’s say that you’re writing some kind of GUI application with a bunch of rather similar windows. And let’s say that mostly what you do is take all of presentation logic for the previous window, copy, paste and adjust to taste for the next window. Oh noes! We’re violating the DRY principle with all of that repetition, right?
What we should be doing instead, obviously, is writing a program that duplicates the code more quickly. That way you can crank out more windows much faster and without the periodic fat-fingering that was happening when you did it manually. Duplication problem solved, right? Er, well, no. Duplication problem automated and made worse. After all, the problem with duplicate code is a problem of maintenance more than initial push. The thing that hurts is later when something about all of that duplicated code has to be changed and you have to go find and do it everywhere. I think most reading would agree that code generation is a poor solution to the problem of copy and paste programming. The good solution is a design that eliminates repetition and duplication of knowledge.
I feel as though a lot of code generation that I see is a prohibitive micro-optimization. The problem is “I have to do a lot of repetitive coding” and code generation solves this problem by saying, “we’ll automate that coding for you.” I’d rather see it solved by saying, “let’s step back and figure out a better approach — one in which repetition is unnecessary.” The automation approach puts a band-aid on the wound and charges ahead, risking infection.
For instance, take the concept of List in C#. List is conceptually similar to an array, but it automatically resizes, thus abstracting away an annoying detail of managing collections in languages from days gone by. I’m writing a program and I think I want an IntList, which is a list of integers. That’s going along swimmingly until I realize that I need to store some averages in there that might not be round numbers, so I copy the source code IntList to DoubleList and I do a “Find-And-Replace” with Int and Double. Maybe later I also do that with string, and then I think, “geez — someone should write a program that you just tell it a type and it generates a list type for it.” Someone does, and then life is good. And then, later, someone comes along with the concept of generics/templates and everyone feels pretty sheepish about their “ListGenerator” programs. Why? Because someone actually solved the core problem instead of coming up with obtuse, brute-force ways to alleviate the symptoms.
And when you pull back and think about the whole idea of code generation, it’s fairly Rube-Goldbergian. Let’s write some code that writes code. It makes me think of some stoner ‘brainstorming’ a money making idea:
I realize that’s a touch of hyperbole, but think of what code generation involves. You’re going to feed code to a compiler and then run the compiled program which will generate code that you feed to the compiler, again, that will output a program. If you were to diagram that out with a flow chart and optimize it, what would you do? Would you get rid of the part where it went to the compiler twice and just write the program in the first place? (I should note that throughout this post I’ve been talking about this circular concept rather than, say, the way ASP or PHP generate HTML or the way Java compiles to bytecode — I’m talking about generating code at the same level of abstraction.)
The most obvious example I can think of is the aforementioned Entity Framework that I use right now. This is a framework utility that uses C# in conjunction with a markup language (T4) to generate text files that happen to be C# code. It does this because you have 100 tables in your database and you don’t want to write data transfer objects for all of them. So EF uses reflection and IQuerable with its EDMX to handle the querying aspect (which saves you from the fate we had for years of writing DAOs) while using code generation to give you OOP objects to represent your data tables. But really, isn’t this just another band-aid? Aren’t we really paying the price for not having a good solution to the Impedance Mismatch Problem?
I feel a whole host of code gen solutions is also born out of the desire to be more performant. We could write something that would look at a database table and generate, on the fly, using reflection, a CRUD form at runtime for that table. The performance would be poor, but we could do it. However, confronted with that performance, people often say, “if only there were a way to automate the stuff we want but to have the details sorted out at compile time rather than runtime.” At that point the battle is already won and the war already lost, because it’s only a matter of time until someone writes a program whose output is source code.
I’m not advocating a move away from code generating, nor am I impugning anyone for using it. This is post more in the same vein as ones that I’ve written before (about not using files for source code and avoiding using casts in object oriented languages). Code generation isn’t going anywhere anytime soon, and I know that I’m not even in a position to quit my reliance on it. I just think it’s time to recognize it as an inherently flawed band-aid rather than to celebrate it as a feat of engineering ingenuity.