Stories about Software


Intro to T4 Templates: Generating Text in a Hurry

Editorial Note: I originally wrote this post for the SubMain blog.  You can check out the original here, at their site.  While you’re there, take a look at GhostDoc.  In addition to comment generation capabilities, you can also effortlessly produce help documentation.

Today, I’d like to tackle a subject that inspires ambivalence in me.  Specifically, I mean the subject of automated text generation (including a common, specific flavor: code generation).

If you haven’t encountered this before, consider a common example.  When you file->new->(console) project, Visual Studio generates a Program.cs file.  This file contains standard includes, a program class, and a public static void method called “Main.”  Conceptually, you just triggered text (and code) generation.

Many schemes exist for doing this.  Really, you just need a templating scheme and some kind of processing engine to make it happen.  Think of ASP MVC, for instance.  You write markup sprinkled with interpreted variables (i.e. Razor), and your controller object processes that and spits out pure HTML to return as the response.  PHP and other server side scripting constructs operate this way and so do code/text generators.

However, I’d like to narrow the focus to a specific case: T4 templates.  You can use this powerful construct to generate all manner of text.  But use discretion, because you can also use this powerful construct to make a huge mess.  I wrote a post about the potential perils some years back, but suffice it to say that you should take care not to automate and speed up copy and paste programming.  Make sure your case for use makes sense.

The Very Basics

With the obligatory disclaimer out of the way, let’s get down to brass tacks.  I’ll offer a lightning fast getting started primer.

Open some kind of playpen project in Visual Studio, and add a new item.  You can find the item in question under the “General” heading as “Text Template.”


Give it a name.  For instance, I called mine “sample” while writing this post.  Once you do that, you will see it show up in the root directory of your project as Sample.tt.  Here is the text that it contains.

Save this file.  When you do so, Visual Studio will prompt you with a message about potentially harming your computer, so something must be happening behind the scenes, right?  Indeed, something has happened.  You have generated the output of the T4 generation process.  And you can see it by expanding the caret next to your Sample.tt file as shown here.


If you open the Sample.txt file, however, you will find it empty.  That’s because we haven’t done anything interesting yet.  Add a new line with the text “hello world” to the bottom of the Sample.tt file and then save.  (And feel free to get rid of that message about harming your computer by opting out, if you want).  You will now see a new Sample.txt file containing the words “hello world.”

Read More


Code Generation Seems Like a Failure of Vision

I think that I’m probably going to take a good bit of flack for this post, but you can’t win ’em all. I’m interested in contrary opinions and arguments because my mind could be changed. Nevertheless, I’ve been unable to shake the feeling for months that code generation is just a basic and fundamental design failure. I’ve tried. I’ve thought about it in the shower and on the drive to work. I’ve thought about it while considering design approaches and even while using it (in the form of Entity Framework). And it just feels fundamentally icky. I can’t shake the feeling.

Let me start out with a small example that everyone can probably agree on. Let’s say that you’re writing some kind of GUI application with a bunch of rather similar windows. And let’s say that mostly what you do is take all of presentation logic for the previous window, copy, paste and adjust to taste for the next window. Oh noes! We’re violating the DRY principle with all of that repetition, right?

What we should be doing instead, obviously, is writing a program that duplicates the code more quickly. That way you can crank out more windows much faster and without the periodic fat-fingering that was happening when you did it manually. Duplication problem solved, right? Er, well, no. Duplication problem automated and made worse. After all, the problem with duplicate code is a problem of maintenance more than initial push. The thing that hurts is later when something about all of that duplicated code has to be changed and you have to go find and do it everywhere. I think most reading would agree that code generation is a poor solution to the problem of copy and paste programming. The good solution is a design that eliminates repetition and duplication of knowledge.

I feel as though a lot of code generation that I see is a prohibitive micro-optimization. The problem is “I have to do a lot of repetitive coding” and code generation solves this problem by saying, “we’ll automate that coding for you.” I’d rather see it solved by saying, “let’s step back and figure out a better approach — one in which repetition is unnecessary.” The automation approach puts a band-aid on the wound and charges ahead, risking infection.

For instance, take the concept of List in C#. List is conceptually similar to an array, but it automatically resizes, thus abstracting away an annoying detail of managing collections in languages from days gone by. I’m writing a program and I think I want an IntList, which is a list of integers. That’s going along swimmingly until I realize that I need to store some averages in there that might not be round numbers, so I copy the source code IntList to DoubleList and I do a “Find-And-Replace” with Int and Double. Maybe later I also do that with string, and then I think, “geez — someone should write a program that you just tell it a type and it generates a list type for it.” Someone does, and then life is good. And then, later, someone comes along with the concept of generics/templates and everyone feels pretty sheepish about their “ListGenerator” programs. Why? Because someone actually solved the core problem instead of coming up with obtuse, brute-force ways to alleviate the symptoms.

And when you pull back and think about the whole idea of code generation, it’s fairly Rube-Goldbergian. Let’s write some code that writes code. It makes me think of some stoner ‘brainstorming’ a money making idea:

Inve ntions

I realize that’s a touch of hyperbole, but think of what code generation involves. You’re going to feed code to a compiler and then run the compiled program which will generate code that you feed to the compiler, again, that will output a program. If you were to diagram that out with a flow chart and optimize it, what would you do? Would you get rid of the part where it went to the compiler twice and just write the program in the first place? (I should note that throughout this post I’ve been talking about this circular concept rather than, say, the way ASP or PHP generate HTML or the way Java compiles to bytecode — I’m talking about generating code at the same level of abstraction.)

The most obvious example I can think of is the aforementioned Entity Framework that I use right now. This is a framework utility that uses C# in conjunction with a markup language (T4) to generate text files that happen to be C# code. It does this because you have 100 tables in your database and you don’t want to write data transfer objects for all of them. So EF uses reflection and IQuerable with its EDMX to handle the querying aspect (which saves you from the fate we had for years of writing DAOs) while using code generation to give you OOP objects to represent your data tables. But really, isn’t this just another band-aid? Aren’t we really paying the price for not having a good solution to the Impedance Mismatch Problem?

I feel a whole host of code gen solutions is also born out of the desire to be more performant. We could write something that would look at a database table and generate, on the fly, using reflection, a CRUD form at runtime for that table. The performance would be poor, but we could do it. However, confronted with that performance, people often say, “if only there were a way to automate the stuff we want but to have the details sorted out at compile time rather than runtime.” At that point the battle is already won and the war already lost, because it’s only a matter of time until someone writes a program whose output is source code.

I’m not advocating a move away from code generating, nor am I impugning anyone for using it. This is post more in the same vein as ones that I’ve written before (about not using files for source code and avoiding using casts in object oriented languages). Code generation isn’t going anywhere anytime soon, and I know that I’m not even in a position to quit my reliance on it. I just think it’s time to recognize it as an inherently flawed band-aid rather than to celebrate it as a feat of engineering ingenuity.

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.