DaedTech

Stories about Software


Fun with ILDASM: Extension vs Static Methods

I recently had a comment on a very old blog post that led to a conversation between the commenter and me. During the conversation, I made the comment that extension methods in C# are just syntactic sugar over plain ol’ static methods. He asked me to do a post exploring this in further detail, which is what I’m doing today. However, there isn’t a ton more to explain in this regard, any more than there is in explaining that (2 + 2) and 4 are representations of the same concept. So, I thought I’d “show, don’t tell” and, in doing so, introduce you to the useful practice of using a disassembler to examine .NET’s IL byte code.

I won’t go into a ton of detail about virtual machines and just-in-time compilation or anything, but I will describe just enough for the purpose of this post. When you write C# code and perform a build, the compiler turns this into what’s called “IL” (intermediate language). Java works in the same way, and its intermediate product is generally referred to as byte code. When an IL executable is executed, the .NET framework is running, and it compiles this intermediate code, on the fly, into machine code. So what you have with both .NET and Java is a two-stage compilation process: source code to intermediate code, and intermediate code to machine code.

Where things get interesting for our purposes is that the disassembled intermediate code is legible and you can work with it, reading it or even writing it directly (which is how AOP IL-weaving tools like PostSharp work their magic). What gets even more interesting is that all of the .NET languages (C#, VB, F#, etc.) compile to the same IL code and are treated the same by the framework when compiled into machine code. This is what is meant by languages “targeting the .NET framework” — there is a compiler that resolves these languages into .NET IL. If you’ve ever wondered how things like “IronPython” can exist in the .NET ecosystem, this is why. Someone has written code that will parse Python source code and generate .NET IL (you’ll also see this idea referred to as “Common Language Infrastructure” or CLI).

Anyway, what better way to look at the differences, or lack thereof, between static methods and extension methods than to write them and see what the IL looks like? But, in order to do that, we need to do a little prep work first. We’re going to need easy access to a tool that can read .NET exe and dll files and produce the IL in a readable, text form. So, here’s what we’re going to do.

  1. In Visual Studio, go to Tools->External Tools.
  2. Click “Add” and you will be prompted to fill out the text boxes below.
  3. Fill them out as shown here (you may have to search for ILDASM.exe, but it should be in Microsoft SDKs under Program Files (x86)):
  4. Click “Apply.”  ILDASM will now appear as a menu option in the Tools menu.

Now, let’s get to work.  I’m going to create a new project that’s as simple as can be. It’s a class library with one class called “Adder.” Here’s the code:
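The code listing embedded in the original didn’t survive, so here’s a minimal sketch of the sort of thing it shows. The class name Adder and the parameter name first come from the post; the method name Add and the body are my assumptions, and I’ve made the class static since the extension-method version later on requires it:

```csharp
public static class Adder
{
    // A plain ol' static method: adds two ints and returns the result.
    public static int Add(int first, int second)
    {
        return first + second;
    }
}
```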

Let no one accuse me of code bloat! That’s it. That’s the only class/method in the solution. So, let’s run ILDASM on it and see what happens. To do that, select “ILDASM” from the Tools menu, and it will launch a window with nothing in it. Go to “File->Open” (or Ctrl-O) and it will launch you in your project’s output directory. (This is why I had you add “$(TargetDir)” in the external tools window.) Click the DLL, and you’ll be treated to a hierarchical makeup of your assembly, as shown here:

[Screenshot: ILDASM showing the hierarchical makeup of the assembly]

So, let’s see what the method looks like in IL Code (just double click it):

Alright… thinking back to my days using assembly language, this looks vaguely familiar. Load the arguments into registers or something, add them, you get the gist.

So, let’s see what happens when we change the source code to use an extension method. Here’s the new code:
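Again, the listing is gone, so here’s the same sketch with the one change the next paragraph calls out:

```csharp
public static class Adder
{
    // The only difference: "this" on the first parameter turns Add into an
    // extension method on int, so callers can write 2.Add(2).
    public static int Add(this int first, int second)
    {
        return first + second;
    }
}
```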

Note, the only difference is the addition of “this” before “int first.” That is what turns this into an extension method and alters the calling semantics (though you can still call extension methods the same way you would normal static methods).

So, let’s see what the IL code looks like for that:

The only difference between this and the plain static version is the presence of one additional line: a custom attribute of type System.Runtime.CompilerServices.ExtensionAttribute on the method.

The “this” keyword results in the generation of this attribute, and its purpose is to allow the compiler to flag it as an extension method. (For more on this, see this old post from Scott Hanselman: How do Extension Methods work and why was a new CLR not required?). The actual substance of the method is completely identical.

So, there you have it. As far as the compiler is concerned, the difference between static and extension methods is “extension methods are static methods with an extension method attribute.” Now, I could go into my opinion on which should be used, when, and how, but it would be just that: my opinion. And your mileage and opinion may vary. The fact of the matter is that, from the compiler’s perspective, they’re the same, so when and how you use one versus the other is really just a matter of your team’s preferences, comfort levels, and ideas about readability.


How To Put Your Favorite Source Code Goodies on Nuget

A while back, I made a post encouraging people to get fed up every now and then and figure out a better way of doing something. Well, tonight I take my own advice. I am sick and tired of rifling through old projects to find code that I copy and paste into literally every non-trivial .NET solution that I create. There’s a thing for this, and it’s called Nuget. I use it all the time to consume other people’s code, libraries and utilities, but not my own. Nope, for my own, I copy and paste stuff from other projects. Not anymore. This ends now.

My mission tonight is to take a simple bit of code that I add to all my unit test projects and to make it publicly available via Nuget. Below is the code. Pretty straightforward and unremarkable. For about 5 versions of MSTest, I’ve hated the “ExpectedException” attribute for testing that something throws an exception. It’s imprecise. All it tests is that somewhere, anywhere, in the course of execution, an exception of the type in question is thrown. Could be on the first line of the method, could be on the last, could happen in the middle from something nested 8 calls deep in the call stack. Who knows? Well, I want to know and be precise, so here’s what I do instead:
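The embedded code didn’t survive the trip to this page, so here’s a sketch of the general shape of such a helper: an assertion that a specific, targeted delegate throws a specific exception type. The real ExtendedAssert surely differs in its details.

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public static class ExtendedAssert
{
    /// <summary>
    /// Asserts that executing the given action throws TException -- no more, no less.
    /// </summary>
    public static void Throws<TException>(Action action) where TException : Exception
    {
        try
        {
            action();
        }
        catch (TException)
        {
            return; // Exactly the exception we wanted, from exactly the code we targeted.
        }
        catch (Exception ex)
        {
            Assert.Fail("Expected {0} but got {1}.", typeof(TException).Name, ex.GetType().Name);
        }

        Assert.Fail("Expected {0} but no exception was thrown.", typeof(TException).Name);
    }
}
```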

Now, let’s put this on Nuget somehow. I found my way to this link, with instructions. Having no idea what I’m doing (though I did play with this once, maybe a year and a half ago), I’m going with the GUI option even though there’s also a command line option. So, I downloaded the installer and installed the Nuget package explorer.

From there, I followed the link’s instructions, more or less. I edited the package meta data to include version info, ID, author info, and a description. Then, I started to play around with the “Framework Assemblies” section, but abandoned that after a moment. Instead, I went up to Content->Add->Existing file and added ExtendedAssert. Once I saw the source code pop up, I was pretty content (sorry about the little Grindstone timer in the screenshot — didn’t notice ’til it was too late):

[Screenshot: NuGet Package Explorer with ExtendedAssert added as content]

Next up, I ran Tools->Analyze Package. No issues found. Not too shabby for someone with no idea what he’s doing! Now, to go for the gusto — let’s publish this sucker. File->Publish and, drumroll please…. ruh roh. I need something called a “Publish Key” to publish it to nuget.org.

[Screenshot: the publish dialog prompting for a publish key]

But, as it turns out, getting an API key is simple. Just sign up at nuget.org and you get one. I used my Microsoft account to sign up. I uploaded my DaedTech logo for the profile picture and tweaked a few settings and got my very own API key (found by clicking on my account name under the “search packages” text box at the top). There was even a little clipboard logo next to it for handy copying, and I copied it into the window shown above, and, voilà! After about 20 seconds, the publish was successful. I’d show you a screenshot, but I’m not sure if I’m supposed to keep the API key a secret. Better safe than sorry. Actually, belay that last thought — you are supposed to keep it a secret. If you click on “More Info” under your API key, it says, and I quote:

Your API key provides you with a token that identifies you to the gallery. Keep this a secret. You can always regenerate your key at any time (invalidating previous keys) if your token is accidentally revealed.

Emphasis mine — turns out my instinct was right. And, sorry for the freewheeling nature of this post, but I’m literally figuring this stuff out as I type, and I thought it might make for an interesting read to see how someone else pokes around at this kind of experimenting.

Okay, now to see if I can actually get that thing. I’m going to create a brand new test project in Visual Studio and see if I can install my beloved ExtendedAssert through Nuget, now.

[Screenshot: the package showing up in the NuGet package manager]

Holy crap, awesome! I’m famous! (Actually, that was so easy that I kind of feel guilty — I thought it’d be some kind of battle, like publishing a phone app or something). But, the moment of truth was a little less exciting. I installed the package, and it really didn’t do anything. My source code file didn’t appear. Hmmm…

After a bit of googling, I found this stack overflow question. Let’s give that a try, optimistically upvoting the question and accepted answer before I forget. I right clicked in the “package contents” window, added a content folder, and then dragged ExtendedAssert into that folder. In order to re-publish, I had to rev the version number, so I revved the patch decimal, since this is a hot patch to cover an embarrassing release if I’ve ever seen one. No time for testing on my machine or a staging environment — let’s slam this baby right into production!

Woohoo! It worked and compiled! Check it out:

[Screenshot: the package installed and compiling in a new test project]

But, there’s still one sort of embarrassing problem — V1.0.1 drops the file in with the namespace of whichever project I originally copied it from, rather than the consuming project’s default namespace. That’s kind of awkward. Let’s go back to google and see about tidying that up. First hit was promising. I’m going to try replacing the namespace with a “source code transformation” as shown here:

[Screenshot: the source code transformation in Package Explorer]
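For reference, the transformation boils down to a token in the packaged content file that NuGet substitutes at install time. Something like this (the class body is just a stand-in):

```csharp
// ExtendedAssert.cs.pp -- when the package is installed, NuGet replaces
// $rootnamespace$ with the consuming project's default namespace and drops
// the .pp extension.
namespace $rootnamespace$
{
    public static class ExtendedAssert
    {
        // ...assertion methods as before...
    }
}
```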

Then, according to the link, I also need to change the filename to ExtendedAssert.cs.pp (figuring that out took me yet another publish, which I won’t bore you with). Let’s rev again and go into production. Jackpot! Don’t believe me? Go grab it yourself.

The Lessons Here

A few things I’ll note at this point. First off, I recall that it’s possible to save these packages locally and try them out before pushing to Nuget. I should definitely have done that, so there’s a meta-lesson here in that I fell into the classic newbie trap of thinking “oh, this is simple and it’ll just work, so I’ll push it to the server.” I’m three patches in and it’s finally working. Glad I don’t have tens of thousands of users for this thing.

But the biggest thing to take away from this is that Nuget is really easy. I had no idea what I was doing and within an hour I had a package up. For the last 5 years or so, every time I started a new project, I’d shuffle around on the machine to find another ExtendedAssert.cs that I could copy into the new project. If it was a new machine, I’d email it to myself. A new job? Have a coworker at the old one email it to me. Sheesh, barbaric. And I put up with it for years, but not anymore. Given how simple this is, I’m going to start making little Nuget packages for all my miscellaneous source code goodies that I transport with me from project to project. I encourage you to do the same.


Creating a Word Document from Code with Spire

I’d like to tell you a harrowing, cautionary tale of my experience with the MS Office Interop libraries and then turn it into a story of redemption. Just to set the stage, these interop libraries are basically a way of programmatically creating and modifying MS Office files such as Word documents and Excel spreadsheets. The intended usage of these libraries is in a desktop environment from a given user account in the user space. The reason for this is that what they actually do is launch MS Word and start piping commands to it, telling it what to do to the current document. This legacy approach works reasonably well, albeit pretty awkwardly from a user account. But what happens when you want to go from a legacy Winforms app to a legacy Webforms app and do this on a web server?

Microsoft has the following to say:

Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment.

Microsoft says, “yikes, don’t do that, and if you do, caveat emptor.” And, that makes sense. It’s not a great idea to allow service processes to communicate directly with Office documents anyway because of the ability to embed executable code in them.

Sometime back, I inherited an ecosystem of legacy Winforms and Webforms applications and one common thread was the use of these Interop libraries in both places. Presumably, the original author wasn’t aware of Microsoft’s stance on this topic and had gone ahead with using Interop on the web server, getting it working for the moment. I didn’t touch this legacy code since it wasn’t causing any issues, but one day a server update came down the pipeline and *poof* no more functioning Interop. This functionality was fairly important to people, so my team was left to do some scrambling to re-implement the functionality using PDF instead of MS Word. It was all good after a few weeks, but it was a stressful few weeks and I developed battle scars around not only those Interop libraries and their clunky API (see below) but automating anything with Office at all. Use SSRS or generate a PDF or something. Anything but Word!

[Screenshot: the clunky Office Interop API]

But recently I was contacted by E-iceblue, who makes document management and conversion software in the .NET space. They asked if I’d take a look at their offering and write up my thoughts on it. I agreed, as I do agree to requests like this from time to time, but always with the caveat that I’ll write about my experience in earnest and not serve as a platform for a print-based commercial. Given my Interop horror story, the first thing I asked was whether the libraries could work on a server or not, and I was told that they could (presumably, although I haven’t verified, this is because they use the Open XML format rather than the legacy Interop paradigm). So, that was already a win.

I put this request in my back pocket for a bit because I’m already pretty back-logged with post requests and other assorted work, but I wound up having a great chance to try it out. I have a project up on Github that I’ve been pushing code to in order to help with my Pluralsight course authorship. Gist of it is that I create a directory and file structure for my courses as I work on them, and then another for submission, and I want to automate the busy-work. And one thing that I do for every module of every course is create a PowerPoint document and a Word document for scripting. So, serendipity — I had a reason to generate Word documents and thus to try out E-iceblue’s product, Spire.

Rather than a long post with screenshots and all of that, I did a video capture. I want to stress that what you’re seeing here is me, having no prior knowledge of the product at all, and armed only with a link to tutorials that my contact there sent to me. Take a look at the video and see what it’s like. I spend about 4-5 minutes getting set up and, at the end of it, I’m using a nice, clean API to successfully generate a Word document.
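To give a rough flavor of what the video shows, the code ends up looking something like the sketch below. I’m going from memory of the tutorials here, so treat the exact member names as approximate rather than authoritative:

```csharp
using Spire.Doc;

class Program
{
    static void Main()
    {
        // Build the document in memory -- no running copy of Word involved.
        var document = new Document();
        var section = document.AddSection();

        var paragraph = section.AddParagraph();
        paragraph.AppendText("Module 1 Script");

        // Write it out as a .docx file (the path is just an example).
        document.SaveToFile("Module1.docx", FileFormat.Docx);
    }
}
```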

I’ll probably have some more posts in the hopper with this as I start doing more things with it (PowerPoint, more complex Word interaction, conversion, etc.). Early returns on this suggest it’s worth checking out, and, as you can see, the barriers to entry are quite low, and I’ve barely scratched the surface of just one line of offerings.


Introduction to Static Analysis (A Teaser for NDepend)

Rather than the traditional lecture approach of providing an official definition and then discussing the subject in more detail, I’m going to show you what static analysis is and then define it. Take a look at the following code and think for a second about what you see. What’s going to happen when we run this code?
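The code sample from the original didn’t make it onto this page, but based on the description that follows, it was to this effect:

```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        int x = 1;

        // It doesn't take running this to see where it's headed.
        if (x == 1)
            throw new InvalidOperationException("x should never be 1!");
    }
}
```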

Well, let’s take a look:

[Screenshot: the unhandled exception]

I bet you saw this coming. In a program that does nothing but set x to 1, and then throw an exception if x is 1, it isn’t hard to figure out that the result of running it will be an unhandled exception. What you just did there was static analysis.

Static analysis comes in many shapes and sizes. When you simply inspect your code and reason about what it will do, you are performing static analysis. When you submit your code to a peer to have her review, she does the same thing. Like you and your peer, compilers perform static analysis, though automated analysis instead of manual. They check the code for syntax errors or linking errors that would guarantee failures, and they will also provide warnings about potential problems such as unreachable code or assignment instead of evaluation. Products also exist that will check your source code for certain characteristics and stylistic guideline conformance rather than worrying about what happens at runtime and, in managed languages, products exist that will analyze your compiled IL or byte code and check for certain characteristics. The common thread here is that all of these examples of static analysis involve analyzing your code without actually executing it.

Analysis vs Reactionary Inspection

People’s interactions with their code tend to gravitate away from analysis. Whether it’s unit tests and TDD, integration tests, or simply running the application, programmers tend to run experiments with their code and see what happens. This is known as a feedback loop, and programmers use the feedback to guide what they’re going to do next. While obviously some thought is given to what impact changes to the code will have, the natural tendency is to adopt an “I’ll believe it when I see it” mentality.

We tend to ask “what happened?” and we tend to orient our code in such ways as to give ourselves answers to that question. With a given code sample, if we want to know what happened, we execute the program and see what prints. This is the opposite of static analysis in that nobody is trying to reason about what will happen ahead of time, but rather the goal is to do it, see what the outcome is, and then react as needed to continue.

Reactionary inspection comes in a variety of forms, such as debugging, examining log files, observing the behavior of a GUI, etc.

Static vs Dynamic Analysis

The conclusions and decisions that arise from the reactionary inspection question of “what happened” are known as dynamic analysis. Dynamic analysis is, more formally, inspection of the behavior of a running system. This means that it is an analysis of characteristics of the program that include things like how much memory it consumes, how reliably it runs, how much data it pulls from the database, and generally whether it correctly satisfies the requirements or not.

Assuming that static analysis of a system is taking place at all, dynamic analysis takes over where static analysis is not sufficient. This includes situations where unpredictable externalities such as user inputs or hardware interrupts are involved. It also involves situations where static analysis is simply not computationally feasible, such as in any system of real complexity.

As a result, the interplay between static analysis and dynamic analysis tends to be that static analysis is a first line of defense designed to catch obvious problems early. Besides that, it also functions as a canary in the mine to detect so-called “code smells.” A code smell is a piece of code that is often, but not necessarily, indicative of a problem. Static analysis can thus be used as an early detection system for obvious or likely problems, and dynamic analysis has to be sufficient for the rest.


Source Code Parsing vs. Compile-Time Analysis

As I alluded to above, not all static analysis is created equal. There are types of static analysis that rely on simple inspection of the source code. These include the manual source code analysis techniques such as reasoning about your own code or doing code review activities. They also include tools such as StyleCop that simply parse the source code and make simple assertions about it to provide feedback. For instance, it might read a code file containing the word “class” and see that the next word after it is not capitalized and return a warning that class names should be capitalized.

This stands in contrast to what I’ll call compile-time analysis. The difference is that this form of analysis requires an encyclopedic understanding of how the compiler behaves or else the ability to analyze the compiled product. This set of options obviously includes the compiler, which will fail on show-stopper problems and generate helpful warning information as well. It also includes enhanced rules engines that understand the rules of the compiler and can use this to infer a larger set of warnings and potential problems than those that come out of the box with the compiler. Beyond that is a set of IDE plugins that perform asynchronous compilation and offer real-time feedback about possible problems. Examples of this in the .NET world include ReSharper and CodeRush. And finally, there are analysis tools that look at the compiled assembly outputs and give feedback based on them. NDepend is an example of this, though it includes other approaches mentioned here as well.

The important compare-contrast point to understand here is that source analysis is easier to understand conceptually and generally faster while compile-time analysis is more resource intensive and generally more thorough.

The Types of Static Analysis

So far I’ve compared static analysis to dynamic and ex post facto analysis and I’ve compared mechanisms for how static analysis is conducted. Let’s now take a look at some different kinds of static analysis from the perspective of their goals. This list is not necessarily exhaustive, but rather a general categorization of the different types of static analysis with which I’ve worked.

  • Style checking is examining source code to see if it conforms to cosmetic code standards
  • Best Practices checking is examining the code to see if it conforms to commonly accepted coding practices. This might include things like not using goto statements or not having empty catch blocks
  • Contract programming is the enforcement of preconditions, invariants and postconditions
  • Issue/Bug alert is static analysis designed to detect likely mistakes or error conditions
  • Verification is an attempt to prove that the program is behaving according to specifications
  • Fact finding is analysis that lets you retrieve statistical information about your application’s code and architecture

There are many tools out there that provide functionality for one or more of these, but NDepend provides perhaps the most comprehensive support across the board for different static analysis goals of any .NET tool out there. You will thus get to see in-depth examples of many of these, particularly the fact finding and issue alerting types of analysis.

A Quick Overview of Some Example Metrics

Up to this point, I’ve talked a lot in generalities, so let’s look at some actual examples of things that you might learn from static analysis about your code base. The actual questions you could ask and answer are pretty much endless, so this is intended just to give you a sample of what you can know.

  • Is every class and method in the code base in Pascal case?
  • Are there any potential null dereferences of parameters in the code?
  • Are there instances of copy and paste programming?
  • What is the average number of lines of code per class? Per method?
  • How loosely or tightly coupled is the architecture?
  • What classes would be the most risky to change?

Believe it or not, it is quite possible to answer all of these questions without compiling or manually inspecting your code in time consuming fashion. There are plenty of tools out there that can offer answers to some questions like this that you might have, but in my experience, none can answer as many, in as much depth, and with as much customizability as NDepend.
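To give a concrete taste of the fact-finding style of analysis, NDepend expresses questions like these as CQLinq queries over the code model. This one is written from memory, so take the exact property names as approximate:

```csharp
// Find the longest methods in my own code, largest first.
from m in JustMyCode.Methods
where m.NbLinesOfCode > 30
orderby m.NbLinesOfCode descending
select new { m, m.NbLinesOfCode }
```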

Why Do This?

So all that being said, is this worth doing? Why should you keep reading if you aren’t convinced that this is something that’s even worth learning? It’s a valid concern, but I assure you that it is most definitely worth doing.

  • The later you find an issue, typically, the more expensive it is to fix. Catching a mistake seconds after you make it, as with a typo, is as cheap as it gets. Having QA catch it a few weeks after the fact means that you have to remember what was going on, find it in the debugger, and then figure out how to fix it, which means more time and cost. Fixing an issue that’s blowing up in production costs time and effort, but also business and reputation. So anything that exposes issues earlier saves the business money, and static analysis is all about helping you find issues, or at least potential issues, as early as possible.
  • But beyond just allowing you to catch mistakes earlier, static analysis actually reduces the number of mistakes that happen in the first place. The reason for this is that static analysis helps developers discover mistakes right after making them, which reinforces cause and effect a lot better. The end result? They learn faster not to make the mistakes they’d been making, causing fewer errors overall.
  • Another important benefit is that maintenance of code becomes easier. By alerting you to the presence of “code smells,” static analysis tools are giving you feedback as to which areas of your code are difficult to maintain, brittle, and generally problematic. With this information laid bare and easily accessible, developers naturally learn to avoid writing code that is hard to maintain.
  • Exploratory static analysis turns out to be a pretty good way to learn about a code base as well. Instead of the typical approach of opening the code base in an IDE and poking around or stepping through it, developers can approach the code base instead by saying “show me the most heavily used classes and which classes use them.” Some tools also provide visual representations of the flow of an application and its dependencies, further reducing the learning curve developers face with a large code base.
  • And a final and important benefit is that static analysis improves developers’ skills and makes them better at their craft. Developers don’t just learn to avoid mistakes, as I mentioned in the mistake reduction bullet point, but they also learn which coding practices are generally considered good ideas by the industry at large and which practices are not. The compiler will tell you that things are illegal and warn you that others are probably errors, but static analysis tools often answer the question “is this a good idea?” Over time, developers start to understand subtle nuances of software engineering.

There are a couple of criticisms of static analysis. The main ones are that the tools can be expensive and that they can create a lot of “noise” or “false positives.” The former is a problem for obvious reasons and the latter can have the effect of counteracting the time savings by forcing developers to weed through non-issues in order to find real ones. However, good static analysis tools mitigate the false positives in various ways, an important one being to allow the shutting off of warnings and the customization of what information you receive. NDepend turns out to mitigate both: it is highly customizable and not very expensive.

Reference

The contents of this post were mostly taken from a Pluralsight course I did on static analysis with NDepend. Here is a link to that course. If you’re not a Pluralsight subscriber but are interested in taking a look at the course or at the library in general, send me an email to erik at daedtech and I can give you a 7 day trial subscription.


Merging Done Right: Semantic Merge

There are few things in software development as surprisingly political as merging your code when there are conflicts. Your first reaction to this is probably to think that I’m crazy, but seriously, think about what happens when your diff tool/source control combo tells you there’s a conflict. You peer at the conflict for a moment and then think, “alright, who did this?” Was it Jim? Well, Jim’s kind of annoying and pretty new, so it’s probably fine just to blow his changes away and send him an email telling him to get the latest code and re-do his stuff. Oh, wait, no, it looks like it was Janet. Uh oh. She’s pretty sharp and a Principal so you’ll probably be both wrong and in trouble if you mess this up — better just revert your changes, get hers and rework your stuff. Oh, on third look, it appears that it was Steve, and since Steve is your buddy, you’ll just go grab him and work through the conflict together.

Notice how none of this has anything to do with what the code should actually look like?

Now, I’ll grant that this isn’t always the case; there are times when you can figure out what should be in the master copy of the source control, but it’s pretty likely that you’ve sat staring at merge conflicts and thinking about people and not code. Why is that? Well, frankly because merge tools aren’t very good at telling you the story of what’s happened, and that’s why you need a human to come tell you the story. But which human, which story, and how interested you are in that interaction are all squarely the stuff of group dynamics and internal politics. Hopefully you get on well with your team and they’re happy to tell you the story.

But what if your tools could tell you that story? What if, instead of saying, “Jim made text different on lines 100, 124, 135-198, 220, 222-228,” your tooling said, “Jim moved a method, and deleted a few references to a field whereas you edited the method that he moved?” Holy crap! You wouldn’t need to get Jim at all because you could just say, “oh, okay, I’ll do a merge where we do all of his stuff and then make my changes to that method he moved.”

I’ve been poking around with Roslyn and reading about it lately, and this led me to Semantic Merge. This is a diff tool that uses Roslyn, which means that it’s parsing your code into a syntax tree and actually reasoning about it as code, rather than text (or text with heuristics). As such, it’s no mirage or trickery that it can say things like “oh, Jim moved the method but left it intact whereas you made some changes to it.” It makes perfect sense that it can do this.

Let’s take a look at this in action. I’m only showing you the tiniest hint of what’s possible, but I’d like to pick out a very simple example of where a traditional merge tool kind of chokes and Semantic Merge shines. It is, after all, a pay to play (although pretty affordable) tool, so a compelling case should be made.

The Old Way

Before you see how cool Semantic Merge is, let’s take a look at a typical diff scenario. I’ll do this using the Visual Studio compare tool that I use on a day to day basis. And I’m calling this “the old way,” in spite of the fact that I fell in love with this as compared to the way it used to be in VS2010 and earlier. It’s actually pretty nice as far as diff tools go. I’m going to take a class and make a series of changes to it. Here’s the before:
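The embedded listing is gone, so here’s a plausible stand-in with the same moving parts the next paragraph describes: a constructor with comments above it, a PrintNumbers() method, and a ChangeTheWord() method. The method names come from the post; the class name and bodies are invented for illustration.

```csharp
using System;

public class WordMangler
{
    private string _word;

    /// <summary>
    /// Initializes a new instance of the class with a starting word.
    /// </summary>
    public WordMangler(string word)
    {
        _word = word;
    }

    public void PrintNumbers()
    {
        for (int i = 0; i < 10; i++)
            Console.WriteLine(i);
    }

    public void ChangeTheWord(string newWord)
    {
        _word = newWord;
    }
}
```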

Now, what I’m going to do is swap the positions of PrintNumbers() and ChangeTheWord(), add some error checking to ChangeTheWord() and delete the comments above the constructor. Here’s the after:
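And the corresponding stand-in for the “after” version, with the two methods swapped, error checking added to ChangeTheWord(), and the constructor comments deleted:

```csharp
using System;

public class WordMangler
{
    private string _word;

    public WordMangler(string word)
    {
        _word = word;
    }

    public void ChangeTheWord(string newWord)
    {
        // The newly added error checking.
        if (string.IsNullOrEmpty(newWord))
            throw new ArgumentException("newWord");

        _word = newWord;
    }

    public void PrintNumbers()
    {
        for (int i = 0; i < 10; i++)
            Console.WriteLine(i);
    }
}
```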

If I now want to compare these two files using the diff tool, here’s what I’m looking at:

[Screenshot: the two versions compared in the standard Visual Studio diff tool]

This is the point where I groan and mutter to myself because it annoys me that the tool is comparing the methods side by side as if I had renamed one and completely altered its contents. I’m sure you can empathize. You’re muttering to yourself too and what you’re saying is, “you idiot tool, it’s obviously a completely different method.” Well, here’s the same thing as summarized by Semantic Merge:

[Screenshot: the same comparison summarized by Semantic Merge]

It shows me that there are two types of differences here: moves and changes. I’ve moved the two methods PrintNumbers() and ChangeTheWord() and I’ve changed the constructor of the class (removing comments) and the ChangeTheWord() method. Pretty awesome, huh? Rather than a bunch of screenshots to show you the rest, however, I’ll show you this quick clip of me playing around with it.

Some very cool stuff in there. First of all, I started where the screenshot left off — with a nice, succinct summary of what’s changed. From there you can see that it’s easy to flip back and forth between the methods, even when moved, to see how they’re different. You can view each version of the source as well as a quick diff only of the relevant, apples-to-apples, changes. It’s also nice, in general, that you can observe the changes according to what kind of change they are (modification, move, etc). And finally, at the end, I played a bit with the more traditional diff view that you’re used to — side by side text comparison. But even with that, helpful UI context shows you that things have moved rather than the screenshot of the VS merge tool above where it looks like you’ve just butchered two different methods.

This is only scratching the surface of Semantic Merge. There are more features I haven’t covered at all, including a killer feature that helps auto-resolve conflicts by taking the base version of the code as well as server and local in order to figure out if there are changes only really made by one person. You can check more of it out in this extended video about the tool. As I’ve said, it’s a pay tool, but the cost isn’t much and there’s a 30 day trial, so I’d definitely advise taking it for a spin if you work in a group and find yourself doing any merging at all.


Getting Started on the Roslyn Journey

It’s not as though it’s new; Roslyn CTP was announced in the fall of 2011, and people have been able to play with it since then. Roslyn is a quietly ground-breaking concept — a set of compilers that exposes compiling, code modeling, refactoring, and analysis APIs. Oh, and it was recently announced that the tool would be open source meaning that all of you Monday morning quarterback language authors out there can take a crack at implementing multiple inheritance or whatever other language horrors you have in mind.

I have to say that I, personally, have little interest in modifying any of the language compilers (unless I went to work on a language team, which would actually be a blast, I think), but I’m very interested in the project itself. This strikes me as such an incredible, ground-breaking concept and I think a lot of people are just kind of looking at this as a curiosity for real language nerds and Microsoft fanboys. The essential value in this offering, to me, is the standardizing of code as data. I’ve written about this once before, and I think that gets lost in the shuffle when there’s talk about emitting IL at runtime and infinite loops of code generation and whatnot. Forget the idea of dispatching a service call to turn blobs of text into executables at runtime and let’s agree later to talk instead about the transformative notion of regarding source code as entity collections rather than instruction sheets, scripts, or recipes.

But first, let’s get going with Roslyn. I’m going to assume you’ve never heard of this before and I’m going to take you from that state of affairs to doing something interesting with it in this post. In later posts, we’ll dive back into what I’m driving at philosophically in the intro to this post about code as data.

Getting Started

(Note — I have VS2013 on all my machines and that is what I’ve used. I don’t know whether any/all of this would work in Studio 2012 or earlier, so buyer beware)

First things first. In order to use the latest Roslyn bits, you actually need a fairly recent version of Nuget. This caught me off guard, so hopefully I’ll save you some research and digging. Go to the “Tools” menu and choose “Extensions and Updates.” Click on the “Updates” section at the left, and then click on “Visual Studio Gallery.”

[Screenshot: updating NuGet via Extensions and Updates]

If you’re like me, your version was 2.7.something and it needs to be 2.8.1 or higher. This update will get you where you need to be. Once you’ve done that, you can simply install the API libraries via the Nuget command line.

With that done, you’re ready to download the necessary installation files from Microsoft. Go to http://aka.ms/roslyn to get started. If you’re not signed in, you’ll be prompted to sign in with your Microsoft ID (you’ll need to create one if you don’t have one) and then fill out a survey. If you get lost along the way, your ultimate destination is to wind up here.

At this point, if you follow the beaten path and click the “Download” button, you’ll get something called download.dlm that, if your environment is like mine, is completely useless. So don’t do that. Click the circled “download” link indicated below to get the actual Roslyn SDK.

[Screenshot: the download link for the Roslyn SDK]

Once that downloads, unpack the zip file and run “Roslyn End User Preview” to install Roslyn language features. Now you can access the APIs and try out the interesting new language features that come with the preview.

That’s all well and good for dog-fooding IDE changes and previewing new language features, but if you want access to the coolness from an API perspective, it’s time to fire up Nuget. Open up a project, and then the Nuget command line and type “Install-Package Microsoft.CodeAnalysis -Pre”

Once that finishes up, make your main entry point consist of the following code:
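The listing itself is missing here, so below is a reconstruction in the spirit of what the post describes (the variable name sourceCodePath comes from the text; the path and everything else are my own stand-ins):

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class Program
{
    static void Main(string[] args)
    {
        // Any C# source file you want to inspect.
        var sourceCodePath = @"C:\Code\SomeProject\SomeClass.cs";

        // Hand the raw source text to the C# compiler's parser.
        var tree = CSharpSyntaxTree.ParseText(File.ReadAllText(sourceCodePath));

        // Walk the resulting syntax tree and pull out every field declaration.
        var fields = tree.GetRoot()
                         .DescendantNodes()
                         .OfType<FieldDeclarationSyntax>();

        foreach (var field in fields)
            Console.WriteLine(field.ToString());

        Console.ReadLine();
    }
}
```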

At this point, if you hit F5, what you’re going to see on the screen is a list of the fields contained in the class that you specify as your “sourceCodePath” variable (at least you will with the happy path — I haven’t tested this extensively to see if I can write classes that break it). Now, could you simply write a text parser (or, God forbid, some kind of horrible regex) to do this? Sure. Are there C# language modeling utilities like a Code DOM that would let you do this? Sure. Are any of these things the C# compiler? Nope. Just this.

So think about what this means. You’re not writing a utility that uses a popular C# source code modeling abstraction; you’re writing a utility that says, “hey, compiler, what are the fields in this source code?” And that’s pretty awesome.

My purpose here was to give you a path from “what’s this Roslyn thing anyway” to “wow, look at that, I can write a query against my own code.” Hopefully you’ve gotten that out of this, and hopefully you’ll go forth, tinker, and then you can come back and show me some cool tricks.


NCrunch and Continuous Testing: The Must-Have Setup

Most of this post was taken from the transcript of my Pluralsight course on NCrunch. If you are interested in watching the course but are not a Pluralsight subscriber, feel free to email me or leave a comment requesting a trial, and I’ll get you a 7 day subscription to check it out.

Understanding the Legitimate, Root-Cause Objection to TDD

In my experience, there are three basic “camps” of reactions to the concept of test driven development (TDD) from those not experienced with it: willing students, healthy skeptics and reactionary curmudgeons. The first group is basically looking for a chance to practice and needs no convincing. The last group will have to be dragged along, kicking and screaming, and so there’s no persuading them without the threat of negative consequences. It is the middle group that tends to have rational objections, some of which are well-founded and others of which aren’t so much. A lot of the negative reaction from this group is the result of reacting to the misconceptions that I mentioned in this post about what TDD is and isn’t. But even once they understand how it works, there are still some fairly common and legitimate objections that are not simply straw man arguments.

  1. The most common and prevalent objection is that coding this way means that you’re doing a lot more work. You’re taking more time and writing more code and people don’t necessarily see the benefit, especially in cases where they already know what code they want to write.
  2. Many of the misconception objections and other inexplicable resistance is really the result of people simply not knowing how to write tests or practice TDD, and perhaps at times being reluctant to admit it. Others may freely admit it. Either way, the objection is that TDD, like any other discipline, would take time to learn and require an investment of effort.
  3. There is also more code that is going into a project since you now have an additional test class for each single class you would otherwise have created. More code means more maintenance time and effort.
  4. Many astute observers also realize that a lot of legacy code, particularly that involving large-work constructors, singletons, and static state is very hard to test, making attempts to do so effort-intensive.
  5. And, along the same lines, they also realize that there would be more effort required than simply learning how to do TDD – it would also mean learning different design techniques such as dependency injection, polymorphism, and inversion of control.

When you consider all of these objections, they all have a common thread. At the core of it, they’re really all variants on the theme of not having enough time. Writing the tests, maintaining the test code, learning new ways of doing things, and applying them to new and old code are all things that take time, and for most developers, time is precious. Someone selling TDD is a lot like someone selling you on a 401K: they’re convincing you that sacrificing now is going to be worth it later and asking you to take this, to some degree, on faith.

Could TDD be better?

Justifying the adoption of TDD to a healthy skeptic hinges largely on demonstrating that it provides a net benefit in terms of time, and thus cost. So how can these objections be reconciled and the concerns addressed?

Well, first up are the learning-curve-oriented objections. And the truth is that there’s no way around this one being a time sink. Learning how to do TDD and learning how to write testable code are going to take time, no matter what. If you do not have the time to learn, this is a perfectly valid objection, but only in the shorter term. After all, we work in an industry where change is the only constant and learning new languages, frameworks, and methodologies is pretty much table stakes for staying relevant.

Regarding development time overall, a very common argument made by TDD proponents is that the practice saves time over the long haul. This is reminiscent of the parable of the tortoise and the hare where the TDD practitioner is a tortoise plodding along, getting everything right and the hare is generating reams of code quickly but with mistakes. The hare will declare himself done more quickly, but he’ll spend a lot more time later troubleshooting, reading log files, debugging, and fixing errors. The tortoise may not finish as quickly, but when he does, he truly is done.

But what about in the short term? Is there anything that can be done to make things go more quickly in the short term for TDD practitioners? Could we strap a rocket pack to the tortoise and make him go faster than the hare while preserving his accuracy?


Speeding up the Feedback Loop

What if I told you a story? What if I told you that you could write code and know whether or not it was working nearly instantaneously? In this world of development, you don’t have to wait while your application starts up, and then navigate through various user interface screens to get to the action that will trigger the bit of code you want to verify. There is no more repetitive clicking and typing and waiting for screens to load. In fact, in this world you don’t even need to build your project or compile your code. All you need to do is type and see, as you’re typing, whether or not the changes you’re making are right. And, you can see a visual metric for how much confidence you can have in your changes by virtue of how much your code is covered by the unit tests.

Does that story sound too good to be true? Well, I’ll admit that it does sound pretty good, but I’ll let you in on a little secret – it is true. There is a name for this paradigm, and it’s called “continuous testing.” And there are various tools out there for different platforms that make it a reality, right now as we speak.

To understand the magic of continuous testing, it’s essential to understand one of the most important, but often overlooked, concepts in computer science. I’m talking about the feedback loop. At its core, programming is a series of experiments. Whenever you approach a programming task, you have a code base that does something, and you have a goal to make it do something different or new. To achieve this goal, you identify intermediate behaviors that you’d like to see to mark progress, and then you make changes that you think will result in those behaviors. Then, you run the application to see if what you thought would happen does, in fact, happen.

For example, perhaps you want to have your application display customer information stored in a database to the screen when the user clicks a certain button. You might first say “forget the database – let’s just get the button click to result in some hard-coded value being displayed,” and then set about altering the code to make that happen. When you’d made your changes to the code, you’d run the program and click that button to see what had happened.

Considered closely, this process is actually a lot like the scientific method. For step (1) you read the code. For step (2) you hypothesize what you’ll need to do to the code. For step (3) you predict the outcome of your changes, and for step (4) you make the changes and observe the results. The amount of time that it takes to perform an iteration of your coding version of the scientific method is what I’m calling the “feedback loop.” How long does it take for you to have an idea, implement it, and verify that it had the desired effect?


In the early days of programming when the use of punch cards was common, feedback times were very lengthy. Programmers would reason carefully about everything that they did because feedback times were extremely slow, meaning mistakes were very costly. While many improvements have been made across the board to feedback times, situations persist to this day when the feedback loop is excruciatingly slow. This includes long running or resource-intensive applications and distributed systems with high latency. With such systems, programmers on projects often devise schemes to try to shorten the feedback loop, such as mocking out bottlenecks to allow fast verification of the rest of the system.

What they’re really trying to do is shorten the feedback loop to allow themselves to be more productive. When a great deal of time elapses between trying something and seeing what happens, attention tends to wander to distractions like twitter or reddit, exacerbating the inefficiency in this already-slow process. Developers innately understand this problem and are frustrated by the long build and run times of behemoth and slow-running applications.

To combat this problem, developers intuitively favor faster schemes. Ask yourself whether you prefer to work on a small project that builds quickly or a large one. How about a slow test suite versus a fast one? By speeding up the feedback loop you trade frustration and wandering attention span for engagement and a feeling of accomplishment. Techniques like relying on fast-running unit tests and keeping modules small and decoupled help a great deal with this, but we can get even faster.

If short feedback is good, immediate is definitely better. Anyone who has done extensive work at a command line or, in general used a Read-Evaluate-Print-Loop (REPL) understands this. Attention does not wander at all during a session like this. Historically, such a thing wasn’t possible in a compiled language, but with the advent of multicore systems and increasingly sophisticated compiler technology, times are changing. It is now possible to have a build running in the background of an IDE like Visual Studio even as you modify the code.

NCrunch

If you’ve been watching my series on building a chess game using TDD you couldn’t help but notice the red and green dots on the left side of the code window, since they catch the eye. What you were seeing was the tool, NCrunch, in action. Now it’s time to get properly acquainted.

NCrunch is a software product by Remco Software and was written by software developer Remco Mulder, who owns the company. It is a tool written specifically to allow developers to practice continuous testing in Visual Studio. NCrunch is a commercial product with a tiered pricing model and full-blown customer support. And it operates as a plugin to Visual Studio so there is no need to integrate or operate any kind of standalone application. It drops right in, comfortably, to a tool with which you are already familiar.

For the first several years of its existence, NCrunch was free, since it was in an extended state of Beta release. During the course of these years, it grew a substantial and loyal user base. In the fall of 2012, Remco decided to issue version 1.0 and release NCrunch as a full, commercial product with a licensing model and production support. It is now on version 2.5 and is most certainly an excellent, commercial-grade product that is worth every penny.

As I write my code using this tool, you may notice things that I rarely or never do. I rarely, if ever, run an application. I rarely, if ever, use the unit test runner. I rarely even compile my code, though I do this sometimes simply because I happen to be quite accustomed to looking at compiler feedback in the errors window. Continuous testing tools like NCrunch may have been a novelty when they came out, but I would argue that they’re rapidly becoming table stakes for efficient development these days.

Before NCrunch, the viability of TDD for me was tied up in the idea that investing extra time up front meant that I wouldn’t later be revisiting my code (debugging, tweaking, fixing) when I was further removed from it and the work would be more time consuming. With NCrunch, I don’t even need to make that case. Now, if you took TDD and NCrunch away, my development process would be substantially slower as I sat there waiting for the application to compile or the test runner to do its thing.

If you don’t have this, get it. You won’t be sorry. Forget clean code, unit testing, TDD, all of that stuff (well, not really — but indulge me here for a second). Just get this setup for the tight feedback loop alone. There is nothing like the feeling of productivity you get from typing a line of code and knowing in less than a second, without doing anything else, whether the change is what you want. That incredible power makes it all worth it — the learning curve of the tool, the cost of the tool, adopting TDD, learning to unit test. It’s like getting a car with 500 horse power and feeling that acceleration; it ruins you for anything less.


Define An API By Consuming It

Lately I’ve been working on a project that uses a layered architecture with DDD principles. There’s a repository layer and, below that, lies the purity of the domain modeling. Above that is an API service layer and then various flavors of presentation, as needed. One of the toughest things to explain to people who are new to some of these concepts is “what should the service layer do?” Go ahead and try it yourself, or even look it up somewhere. Things like data access, domain, repository and presentation layers all have easily explained and understood purposes. The API service layer kind of winds up being the junk drawer of your architecture; “well, it doesn’t really belong in the domain logic and it’s not a presentation concern, so I guess we’ll stick it in the service layer.”

This seems to happen because it’s pretty hard for a lot of people to understand the concept of an API of which their team is both creator and consumer. I addressed this some time back in a post I wrote about building good abstractions and this post is sort of an actual field study instead of a contrived example. How do we know what makes sense to have in the application’s service layer? Well, write all of your presentation layer code assuming that you have magic boxes behind interfaces that cater to your every whim. If you do this, unless you have a pure forms-over-data CRUD app, you’ll find that your presentation layer wants different things than your repositories provide, and this is going to define your service level API.

Take a look at this relatively simple example that I dreamed up, based loosely on a stripped down version of something I’ve been doing:
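The example itself didn’t survive the trip to this page, so here’s a sketch in the same spirit. Every name here (EmployeeController, IEmployeeService, Employee) is a hypothetical stand-in for whatever the real project called these things:

```csharp
using System.Collections.Generic;
using System.Web.Mvc;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int DepartmentId { get; set; }
}

// The service API that the controller's needs are defining.
public interface IEmployeeService
{
    IEnumerable<Employee> GetAll();
    IEnumerable<Employee> GetByDepartment(int departmentId);
    void Create(Employee employee);
    void SendWelcomeEmail(Employee employee);
}

public class EmployeeController : Controller
{
    private readonly IEmployeeService _service;

    public EmployeeController(IEmployeeService service)
    {
        _service = service;
    }

    public ActionResult Index(int? departmentId)
    {
        // Just ask the magic box for what the view needs and hand it over.
        var employees = departmentId.HasValue
            ? _service.GetByDepartment(departmentId.Value)
            : _service.GetAll();

        return View(employees);
    }

    [HttpPost]
    public ActionResult Create(Employee employee, bool sendWelcomeEmail)
    {
        _service.Create(employee);

        if (sendWelcomeEmail)
            _service.SendWelcomeEmail(employee);

        return RedirectToAction("Index");
    }
}
```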

One thing you’ll notice straightaway is that my controller is pretty stripped down. It really just worries about the arguments to the HTTP methods, a terse summary of what to do, and then what to return or where to redirect. The only things that will make this more complex as time goes on are GUI related things — enriching the model, adding more actions, altering the user’s workflow, etc. This makes unit testing/TDD a breeze and it keeps unpleasantness such as “how to send an email” or “where do we get employees” elsewhere. But, most importantly, it also defines an API.

Right here I’m thinking in the purest terms. I want to show my users a list of employees and I want to let them filter by department. I also want to be able to add a new employee. So, let’s see, I’m going to need some means of getting employees and some means of creating a new one. Oh, and the requirement says I need to send a welcome email in most circumstances (times I wouldn’t based on an elided GUI setting), so I’ll need something that does that too.

Now, you’re probably noticing a serious potential flaw with this approach, which is that having a service that sends emails and fetches employees seems as though it will violate the Single Responsibility Principle. You’re completely right. It will (would). But we’re not done here by any means. We’re not defining what a service will do, but rather what our controller won’t do (as well as what it will do). Once we’re done with the controller, we can move on to figuring out appropriate ways to divvy up the behavior of the service or services that this will become.

Here’s another thing I like about this approach. I’m in the habit of defining a “Types” assembly in which I put the interfaces that the layers use to talk to one another. So, the Presentation layer doesn’t actually know about any concrete implementations in the Service layer because it doesn’t even have a reference to that assembly. I use an IoC container, and I get to do this because of it:
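Continuing the hypothetical sketch from above, the dummy implementation looks something like this (an in-memory fake that the IoC container serves up until a real service exists):

```csharp
using System.Collections.Generic;
using System.Linq;

public class DummyEmployeeService : IEmployeeService
{
    private readonly List<Employee> _employees = new List<Employee>
    {
        new Employee { Id = 1, Name = "Jane Placeholder", DepartmentId = 1 },
        new Employee { Id = 2, Name = "John Example", DepartmentId = 2 }
    };

    public IEnumerable<Employee> GetAll()
    {
        return _employees;
    }

    public IEnumerable<Employee> GetByDepartment(int departmentId)
    {
        return _employees.Where(e => e.DepartmentId == departmentId);
    }

    public void Create(Employee employee)
    {
        _employees.Add(employee);
    }

    public void SendWelcomeEmail(Employee employee)
    {
        // Do nothing -- plausible fake behavior is all the GUI work needs.
    }
}
```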

Right there in the controller’s source file, below it, I stick a dummy implementation of the service and I wire up the IoC to use it. This is really handy because it lets me simulate, in memory, the way the application will behave assuming we later define real, actual service implementations. So I can use TDD to define the behavior of the controllers, but then I can also use this dummy service to define the layout, appearance, and flow of the views using actual, plausible data. And all of this without worrying about data access or anything else for the time being. I’m bifurcating the application and separating (to a large degree) the fulfillment of user stories from the nitty gritty technical details. And, perhaps most importantly, I’m defining what the service API should be.

I like having that dummy class there. It creates a warning for me in CodeRush (multiple classes in the same source file) that nags at me not to forget to delete it when I’m done. I can modify the GUI behavior right from the controller. As I add methods to the service while going along with my TDD for the controller, it’s easy enough to add implementers here as needed to compile. I consider this a win on many fronts.

And when I’m done with the presentation layer and satisfied, I can now look at the single service I’ve defined and say to myself things like “should this be more than one class?” or “is some of this functionality already implemented?” I’ve nailed the presentation layer and I’m ready to do the same with the service layer, knowing exactly what kind of API my ‘client’ wants.


Help Yourself To My Handy Extension Methods

There’s nothing like 3 years (or almost 3 years) of elapsed time to change your view on something, such as my take on C# extension methods. At the time I was writing this post, I was surrounded by wanton abuse of the construct, compliments of an Expert Beginner, so my tone there was essentially reminiscent of someone who had just lost a fingertip in a home improvement accident writing a post called “why I don’t like skill saws.” But I’ve since realized that cutting lots of lumber by hand isn’t all it’s cracked up to be. Judicious use of extension methods has definitely aided in the readability of my code.

I’m going to post some examples of ones that I’ve found myself using a lot. I do this almost exclusively in test assemblies, I think because, while code readability is important everywhere, in your unit tests you should literally be explaining to people how your API works using code as prose (my take anyway). So, here are some things I do to further that goal.

ToEnumerable

One of the things that annoyed me for a long time in C# was that initializing lists felt clunkier than I wanted. This isn’t a criticism of the language, per se, as I don’t really know how to make list initialization in the language much more concise. But I wanted it to be on a single line and without all of the scoping noise of curly brackets. I’m picky. I don’t like this line of code:
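(The line in question was along these lines, with Foo and the mocked retrieval service standing in for the real types, arranged with JustMock:)

Mock.Arrange(() => retrievalService.GetAll()).Returns(new List<Foo>() { new Foo() });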

I’m arranging this retrieval service (using Telerik’s JustMock) so that its GetAll() method returns a list of Foos with just one Foo. The two different instances of Foo are redundant and I don’t like those curly braces. Like I said, picky. Another issue is that a lot of these methods that I’m testing deal in enumerables rather than more defined collection types. And so I wrote this:
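(Roughly speaking; the overload I mention is similar and omitted here:)

using System.Collections.Generic;

public static class ExtensionMethods
{
    // Wraps a single instance in an IEnumerable<T> containing just that instance.
    public static IEnumerable<T> ToEnumerable<T>(this T target)
    {
        yield return target;
    }
}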

And writing this method and its overload change the code I don’t like to this:
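(Same stand-in types as before:)

Mock.Arrange(() => retrievalService.GetAll()).Returns(new Foo().ToEnumerable());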

One occurrence of Foo, zero curly brackets. Way more readable, for my money, and while YMMV, I don’t think it’s going to vary that much. Is it worth it? Absolutely, in my book. I’ve eliminated duplication and made the test more readable.

IntOrDefault

Do you know what I hate in C#? Safe parsing of various value types from strings. It’s often the case that I want something like “pull an int out of here, but if anything goes wrong, just set it to zero.” And then you know the drill. The declaration of an int. The weird out-parameter pattern of int.TryParse(). The returning of said int. It’s an ugly three lines of code. So I made this:
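(Roughly like this, at any rate:)

public static class StringExtensions
{
    // If parsing fails for any reason (null, empty, garbage), TryParse leaves
    // the out variable at default(int), which is exactly what gets returned.
    public static int IntOrDefault(this string target)
    {
        int result;
        int.TryParse(target, out result);
        return result;
    }
}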

Now, if I want to take a whack at a parse, but don’t really care if it fails, I have client code that looks like this:
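(The collection and key here are hypothetical; it's the shape of the call that matters:)

var quantity = parameters["quantity"].IntOrDefault();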

What if that indexed value is empty? Zero. What if it has a bunch of non-numeric characters? Zero. What if it is null? Zero. No worries. If there’s actually an int in that string (or whatever it is), then you get the int. Otherwise, 0 (or, technically, default(int)).

I actually created this for a bunch of other primitive types as well, but just showed one here for example’s sake.

GetModel<T>

This is an MVC-specific one for when I’m trying to unit test. I really don’t like that every time I want to unit test a controller method that returns ViewResult I have to get the model as an object and then cast it to whatever I actually want. This syntax is horribly ugly to me:
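(Something like this, with Customer standing in for whatever the model type happens to be and Target as the controller under test:)

var view = Target.Index();
var customer = (Customer)view.Model;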

Now, that may not look like much, but when you start chaining it together and have to add a second set of parentheses like ((Customer)view.Model).SomeCustomerProperty, things get ugly fast. So I did this instead — falling on the ugliness grenade.
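The extension itself is only a few lines (the class name is just what I'd call it):

using System.Web.Mvc;

public static class ViewResultExtensions
{
    // Casts the ViewResult's Model to the requested type; an invalid cast still
    // blows up immediately, which is what I want in a test.
    public static T GetModel<T>(this ViewResult view)
    {
        return (T)view.Model;
    }
}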

It still fails fast with an invalid cast exception, but you don’t need to look at it, and it explains a lot more clearly what you’re doing:
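(Again, Customer and its property are stand-ins:)

var customer = view.GetModel<Customer>();
// or, chained, without the nested parentheses:
var property = view.GetModel<Customer>().SomeCustomerProperty;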

Mocking Function Expression Arguments

This is a little more black belt, but if you’re an architect or senior developer and looking to make unit testing easier on less experienced team members, this may well help you. I have a setup with Entity Framework hidden behind a repository layer, and mocking the repositories gets a little… brutal… for people who haven’t been dealing with lambdas for a long time:
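(In JustMock terms, it's something along these lines, with IRepository<Customer> as a hypothetical stand-in for one of my repository interfaces, and customerRepository/customers as pre-existing test fixtures:)

Mock.Arrange(() => customerRepository.Get(Arg.IsAny<Expression<Func<Customer, bool>>>()))
    .Returns(customers);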

“Don’t worry, that just means you can pass it any Expression of Func of Customer to bool!” Now, you’re thinking that, and I’m thinking that, but a lot of people would be thinking, “Take your unit testing and shove it — I don’t know what that is and I’m never going to know.” Wouldn’t it be easier to do this:
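(ArrangeGet is a name I'm making up for the sake of the example:)

customerRepository.ArrangeGet(customers);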

Intent is still clear — just without all of the noise. Well, you can with this extension method:
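Here's a sketch of it; IRepository<T> stands in for my actual repository interface, so the real signature differs a bit:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using Telerik.JustMock;

public static class RepositoryMockExtensions
{
    // Arranges the mocked repository's Get() to return the supplied items for any
    // predicate, hiding the Expression<Func<T, bool>> argument matcher from test writers.
    public static void ArrangeGet<T>(this IRepository<T> repository, IEnumerable<T> returnValue) where T : class
    {
        Mock.Arrange(() => repository.Get(Arg.IsAny<Expression<Func<T, bool>>>()))
            .Returns(returnValue);
    }
}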

Now, I’ll grant you that this is pretty specific. It’s specific to JustMock and to my implementation of repository methods, but the idea remains. If you’re dealing with Expressions like this, don’t make people trying to write tests type out those nasty argument matcher expressions over and over again. They’re extremely syntactically noisy and add nothing for clarity of purpose.

Edit: Thanks, Daniel and Timothy, for the feedback about Chrome. I’ve had intermittent issues with the syntax highlighter plugin over the last few months in Chrome, and the problem stops whenever I change a setting in the syntax highlighter, making me think it’s fixed, but then it comes back. The good news is that seeing it impact others lit a fire under me to go find a replacement, which I did: Crayon Syntax Highlighter. I already like it better, so this is a “make lemonade out of lemons” situation. Thanks again for the feedback, and please let me know if you have issues with this new plugin.


Faking An Interface Mapping in Entity Framework

Entity Framework Backstory

In the last year or so, I’ve been using Entity Framework in the new web-based .NET solutions for which I am the architect. I’ve had some experience with different ORMs on different technology stacks, including some roll-your-own ones, but I have to say that Entity Framework and its integration with Linq in C# is pretty incredible. Configuration is minimal, and the choice of going code-first, model-first, or database-first is certainly appealing.

One gripe that I’ve had, however, is that Entity Framework wants to bleed its way up past your infrastructure layer and permeate your application. The path of least resistance, by far, is to let Entity Framework generate entities and then use them throughout the application. This is often described, collectively, as “the model,” and it impedes your ability to layer the application. The most common form of this that I’ve seen is simply to have an Entity Framework context and allow it to be accessed directly in controllers, code-behind, or view models. In a way, this is strangely reminiscent of Active Record, except that the in-memory joins and navigation operations are a lot more sophisticated. But the overarching point is still the same — not a lot of separation of concerns, and domain modeling is avoided in favor of letting the database ooze its way into your code.


I’m not a fan of this approach, and I very much prefer applications that are layered or otherwise separated in terms of concerns. I like there to be a presentation layer for taking care of presenting information to the user, a service layer to serve as the application’s API (for flexibility and for acceptance testing), a domain layer to handle business rules, and a data access layer to manage persistence (Entity Framework actually takes care of this layer as-is). I also like the concept of “persistence ignorance,” where the rest of the application doesn’t have to concern itself with where persistent data is stored — it could be SQL Server, Oracle, a file, a web service… whatever. This renders the persistence model an ancillary implementation detail which, in my opinion, is what it should be.

A way to accomplish this is to use the “Repository Pattern,” in which higher layers of the application are aware of a “repository” which is an abstraction that makes entities available. Where they come from to be available isn’t any concern of those layers — they’re just there when they’re needed. But in a lot of ways with Entity Framework, this is sort of pointless. After all, if you hide the EF-generated entities inside of a layer, you don’t get the nice query semantics. If you want the automation of Entity Framework and the goodness of converting Linq expressions to SQL queries, you’re stuck passing the EF entities around everywhere without abstraction. You’re stuck leaking EF throughout your code base… or are you?

Motivations

Here’s what I want. I want a service layer (and, of course, presentation layer components) that is in no way whatsoever aware of Entity Framework. In the project in question, we’re going to have infrastructure for serializing to files and calling out to web services, and we’re likely to do some database migration and make use of NoSQL technologies. It is a given that we need multiple persistence models. I also want at least two different versions of the DTOs: domain objects in the domain layer, hidden under the repositories, and model objects for binding in the presentation layer. In MVC, I’ll decorate the models with validation attributes and do other uniquely presentation layer things. In the WPF world, these things would implement INotifyPropertyChanged.

Now, it’s wildly inappropriate to do this to the things generated by Entity Framework and to have the domain layer (or any layer but the presentation layer) know about these presentation layer concepts: MVC validation, WPF GUI events, etc. So this means that some kind of mapping from EF to models and vice-versa is a given. I also want rich domain objects in the domain layer for modeling business logic. So that means that I have two different representations of any entity in two different places, which is a classic case for polymorphism. The question then becomes “interface implementation or inheritance?” And I choose interface.

My reasoning here is certainly subject to critique, but I’m a fan of creating an assembly that contains nothing but interfaces and having all of my layer assemblies take a dependency on that. So, the service layer, for instance, knows nothing about the presentation layer, but the presentation layer also knows nothing about the service layer. Neither one has an assembly reference to the other. A glaring exception to this is DTOs (and, well, custom exceptions). I can live with the exceptions, but if I can eliminate the passing around of vacuous property bags, then why not? Favoring interfaces also helps with some of the weirdness of having classes in various layers inherit from things generated by Entity Framework, which seems conspicuously fragile. If I want to decorate properties with attributes for validation and binding, I have to use EF to make sure that these things are virtual and then make sure to override them in the derived classes. Interfaces it is.

Get Ur Dun!

So that’s great. I’ve decided that I’m going to use ICustomer instead of Customer throughout my application, auto-generating domain objects that implement an interface. That interface will be generated automatically and used by the rest of the application, including with full-blown query expression support that gets translated into SQL. The only issue with this plan is that every Google search I did seemed to suggest this was impossible, or at least implausible: EF doesn’t support that, Erik, so give it up. Well, I’m nothing if not inappropriately stubborn when it comes to bending projects to my will. So here’s what I did to make this happen.

I have three projects: Poc.Console, Poc.Domain, and Poc.Types. In Domain, I pointed an EDMX at my database and let ’er rip, generating the T4 for the entities and also the context. I then copied the entity T4 template to the Types assembly, where I modified it. In Types, I added an “I” to the name of the class, changed it to be an interface instead of a class, removed all constructor logic, removed all complex properties and navigation properties, and removed all visibility modifiers. In Domain, I modified the entities to get rid of complex/navigation properties and added an implementation of the interface of the same name. So at this point, all Foo entities now implement an identical IFoo interface. I made sure to leave Foo as a partial because these things will become my domain objects.
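To make that concrete, a generated pair ends up looking roughly like this (Customer and its properties are just an illustrative example):

public interface ICustomer
{
    int CustomerId { get; set; }
    string FirstName { get; set; }
    string LastName { get; set; }
}

public partial class Customer : ICustomer
{
    public int CustomerId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}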

With this building, I wrote a quick repository POC. To do this, I installed the NuGet package for System.Linq.Dynamic, which is a really cool utility that lets you turn arbitrary strings into Linq query expressions. Here’s the repository implementation:

public class Repository<TEntity, TInterface> where TEntity : class, new() where TInterface : class
{
    private PlaypenDatabaseEntities _context;

    /// <summary>
    /// Initializes a new instance of the Repository class.
    /// </summary>
    /// <param name="context">The Entity Framework context to query against.</param>
    public Repository(PlaypenDatabaseEntities context)
    {
        _context = context;
    }

    public IEnumerable<TInterface> Get(Expression<Func<TInterface, bool>> predicate = null)
    {
        IQueryable<TEntity> entities = _context.Set<TEntity>();

        if (predicate != null)
        {
            // Turn the interface-typed expression into a string, strip the parameter
            // qualifier, and swap the expression-tree operator names for the ones that
            // Dynamic Linq's ParseLambda expects.
            var predicateAsString = predicate.Body.ToString();
            var parameterName = predicate.Parameters.First().ToString();
            var parameter = Expression.Parameter(typeof(TInterface), parameterName);
            string stringForParseLambda = predicateAsString.Replace(parameterName + ".", string.Empty).Replace("AndAlso", "&&").Replace("OrElse", "||");

            // Re-parse the string as a lambda against the concrete entity type so that
            // Entity Framework can translate it to SQL.
            var newExpression = System.Linq.Dynamic.DynamicExpression.ParseLambda<TEntity, bool>(stringForParseLambda, new[] { parameter });
            entities = entities.Where(newExpression);
        }

        // The entities implement the interface, so hand them back as TInterface.
        foreach (var entity in entities)
            yield return entity as TInterface;
    }
}

Here’s the gist of what’s going on. I take an expression of IFoo and turn it into a string. I then figure out the parameter’s name so that I can strip it out of the string, since this is the form that will make ParseLambda happy. Along the same lines, I also need to replace “AndAlso” and “OrElse” with “&&” and “||” respectively. The former is how the compiled expression tree renders those operators, but ParseLambda looks for the more traditional ones. Once it’s in a pleasing form, I parse it as a lambda, but with type Foo instead of IFoo. That becomes the expression that EF will use. I then query EF and cast the results back to IFoos.
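Client code then gets to query purely in terms of the interface. Here's a usage sketch, assuming a Foo/IFoo pair with a couple of made-up numeric properties:

// context is an instance of PlaypenDatabaseEntities; Foo, IFoo, and their
// properties are hypothetical examples.
var repository = new Repository<Foo, IFoo>(context);
IEnumerable<IFoo> results = repository.Get(f => f.Quantity > 10 && f.Price < 100);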

Now, I’ve previously blogged that casting is a failure of polymorphism. And this is like casting on steroids, with some hallucinogenic drugs thrown in for good measure. I’m not saying, “I have something the compiler thinks is an IFoo but I know is a Foo,” but rather, “I have what the compiler thinks of as a non-compiled code scheme for finding IFoos, but I’m going to mash that into a non-compiled scheme for finding Foos in a database, force it to happen, and hope for the best.” I’d be pretty alarmed if not for the fact that I’m generating the interface and implementation at the same time, and that any other implementation I might define and pass in must have any and all properties that Entity Framework is going to want.

This is a proof of concept, and I haven’t lived with it yet. But I’m certainly going to try it out and possibly follow up with how it goes. If you’re like me and were searching for the Holy Grail of how to have DbSet<IFoo> or how to use interfaces instead of POCOs with EF, hopefully this helps you. If you want to see the T4 templates, drop a comment and I’ll put up a gist on GitHub.

One last thing to note is that I’ve only tried this with a handful of lambda expressions for querying, so it’s entirely possible that I’ll need to do more tweaking for some edge case scenarios. I’ve tried this with a handful of permutations of conjunction, disjunction, negation and numerical expressions, but what I’ve done is hardly exhaustive.

Happy abstracting!

Edit: Below are the gists:

  • Entities.Context.tt for the context. Modified to have DbSets of IWhatever instead of Whatever.
  • Entities.tt for the actual, rich domain objects that reside beneath the repositories. These guys have to implement IWhatever.
  • DTO.tt contains the actual interfaces. I modified this T4 template not to generate navigation properties at all because I don’t want that kind of rich relationship definition to be part of an interface between layers.
