In Defense of Test Setup Methods

This post started out as a response to a comment on this post, but it started to get rather long for that venue, so I’m making it its own post. The comment concerns my habit of pulling instantiation logic for the class under test into a common setup method executed by the test runner prior to each test. Here are the main points that I’m writing this to address:

  • Why do I dislike code duplication in tests?
  • Extracting logic to a common setup method creates performance problems.
  • The indirection in pulling part of the setup out of a test method hurts readability.

Back Story and Explanations

I wrote this post a little under two years ago. It’s germane here because I explain in it my progression from not using the setup method to doing so. For those not explicitly familiar with the concept and perhaps following along with my Chess TDD series, what I’m referring to is a common pattern (or anti-pattern, I suppose, depending on your opinion) in unit testing across many languages and frameworks. Most test runners allow you to specify a unique method that will be called before each test is run and also after each test is run. In the case of my series, using C# and MSTest, the setup happens on a per class basis. Here’s some code to clarify. First, the class under test:

And, now, the plain, “before,” test class:

And finally, what MS Test allows me to do:

Okay, so what’s going on here? Well, the way that MS Test works is that for each test method, it creates a new instance of the test class and executes the method. So, in the case of our original gravity test class, three instances are created. There is no instance state whatsoever, so this really doesn’t matter, but nonetheless, it’s what happens. In the second version, this actually does matter. Three instances get created, and for each of those instances, the _gravity instance is initialized. The “BeforeEachTest” method, by virtue of its “TestInitialize” attribute, is invoked prior to the test method that will be executed as part of that test.

Conceptually, both pieces of code have the same effect. Three instances are created, three Gravity are instantiated, three invocations of GetVelocityAfter occur, and three asserts are executed. The difference here is that there are three gravity variables of method level scope in the first case, and one instance variable of class scope in the second case.

Okay, so why do I choose the second over the first? Well, as I explained in my old post, I didn’t, initially. For a long, long time, I preferred the first option with no test setup, but I eventually came to prefer the second. I mention this strictly to say that I’ve seen merits of both approaches and that this is actually something to which I’ve given a lot of thought. It’s not to engage the subtle but infuriating logical fallacy in which an opponent in an argument says something like “I used to think like you, but I came to realize I was wrong,” thus in one fell swoop establishing himself as more of an authority and invalidating your position without ever making so much as a single cogent point. And, in the end, it’s really a matter of personal preference.

Why I Don’t Like Duplication

So, before I move on to a bit more rigor with an explanation of how I philosophically approach structuring my unit tests, let me address the first point from the comment about why I don’t like duplication. I think perhaps the most powerful thing I can do is provide an example using this code I already have here. And this example isn’t just a “here’s something annoying that could happen,” but rather the actual reason I eventually got away from the “complete setup and teardown in each test” approach. Let’s say that I like my Gravity class, but what I don’t like is the fact that I’ve hard-coded Earth’s gravity into the GetVelocityAfter method (and, for physics buffs out there, I realize that I should, technically, use the mass of the falling object, gravitational constant, and the distance from center of mass, but that’s the benefit of being a professional programmer and not a physicist — I can fudge it). So, let’s change it to this, instead:

Now, I have some test refactoring to do. With the first approach, I have six things to do. I have to declare a new Planet instance with Earth’s specifications, and then I have to pass that to the Gravity constructor. And then, I have to do that exact same thing twice more. With the second approach, I declare the new Planet instance and add it to the Gravity constructor once and I’m done. For three tests, no big deal. But how about thirty? Ugh.

I suppose you could do some find and replace magic to make it less of a chore, but that can get surprisingly onerous as well. After all, perhaps you’ve named the instance variable different things in different places. Perhaps you’ve placed the instantiation logic in a different order or interleaving in some of your tests. These things and more can happen, and you have have an error prone chore on your hands in your tests the same way that you do in production code when you have duplicate/copy-paste logic. And, I don’t think you’d find too many people that would stump for duplication in production code as a good thing. I guess by my way of thinking, test code shouldn’t be a second class citizen.

But, again, this is a matter of taste and preference. Here’s an interesting stack overflow question that addresses the tradeoff I’m discussing here. In the accepted answer, spiv says:

Duplicated code is a smell in unit test code just as much as in other code. If you have duplicated code in tests, it makes it harder to refactor the implementation code because you have a disproportionate number of tests to update. Tests should help you refactor with confidence, rather than be a large burden that impedes your work on the code being tested.

If the duplication is in fixture set up, consider making more use of the setUp method or providing more (or more flexible) Creation Methods.

He goes on to stress that readability is important, but he takes a stance more like mine, which is a hard-line one against duplication. The highest voted answer, however, says this:

Readability is more important for tests. If a test fails, you want the problem to be obvious. The developer shouldn’t have to wade through a lot of heavily factored test code to determine exactly what failed. You don’t want your test code to become so complex that you need to write unit-test-tests.

However, eliminating duplication is usually a good thing, as long as it doesn’t obscure anything. Just make sure you don’t go past the point of diminishing returns.

In reading this, it seems that the two answers agree on the idea that readability is important and duplication is sub-optimal, but where they differ is that one seems to lean toward, “avoiding duplication is most important even if readability must suffer” and the other leans toward, “readability is most important, so if the only way to get there is duplication, then so be it.” (I’m not trying to put words in either poster’s mouth — just offering my take on their attitudes)

But I think, “why choose?” I’m of the opinion that if redundancy is aiding in readability then some kind of local maximum has been hit and its time to revisit some broader assumptions or at least some conventions. I mean why would redundant information ever be clearer? At best, I think that redundancy is an instructional device used for emphasis. I mean, think of reading a pamphlet on driving safety. It probably tells you to wear your seatbelt 20 or 30 times. This is to beat you over the head with it — not make it a better read.

So, in the end, my approach is one designed to avoid duplication without sacrificing readability. But, of course, readability is somewhat subjective, so it’s really your decision whether or not I succeed as much as it is mine. But, here’s what I do.

My Test Structure

Let’s amend the Gravity class slightly to have it use an IPlanet interface instead of a Planet and also to barf if passed a null planet (more on that shortly):

Let’s then take a look at how I would structure my test class:

Alright, there’s a lot for me to comment on here, so I’ll highlight some things to pay attention to:

  • Instance of class under test is named “Target” to be clear what is being tested at all times.
  • Planet instance is just named “Planet” (rather than “MockPlanet”) because this seems to read better to me.
  • Nested class has name of method being tested (eliminates duplication and brings focus only to what it does).
  • Test initialize method does only the minimum required to create an instance that meets instance preconditions.
  • There’s not much going on in the setup method — just instantiating the class fields that any method can modify.
  • The second two test methods still have some duplication (duplicate setup).

Now that you’ve processed these, let me extrapolate a bit to my general approach to test readability:

  • The class nesting convention helps me keep test names as short as possible while retaining descriptiveness.
  • Only precondition-satisfying logic goes in the initialize method (e.g. instantiating the class under test (CUT) and passing a constructor parameter that won’t crash it)
  • Setting up state in the class under test and arranging the mock objects is left to the test methods in the context of “Given” for those asserts (e.g. in the second test method, the “Given” is “On_Earth” so it’s up to that test method to arrange the mock planet to behave like Earth).
  • I use dependency injection extensively and avoid global state like the plague, so class under test and its depdenencies are all you’ll need and all you’ll see.
  • Once you’re used to the Target/Mocks convention, it’s (in my opinion) as readable, if not more so, than a method variable. As a plus, you can always identify the CUT at a glance in my test code. To a lesser degree, this is true of mocks and constants (in C#, these have Pascal Casing)
  • I suppose (with a sigh) that I’m not quite 100% on the maturity model of eliminating duplication since I don’t currently see a duplication-eliminating alternative to setting up Earth Gravity in two of the test methods. I think it’s not appropriate to pull that into the test initialize since it isn’t universal and adding a method call would just make the duplication more terse and add needless indirection. To be clear, I think I’m failing rather than my methodology is failing — there’s probably a good approach that I simply have not yet figured out. Another important point here is that sometimes the duplication smell is less about how you structure your tests and more about which tests you write. What I mean is… is that “Returns_NineyEight_After_10_Seconds_On_Earth” test even necessary? The duplicate setup and similar outcome is, perhaps, telling me that it’s not… But, I digress.

Addressing the Original Concerns

So, after a meandering tour through my approach to unit testing, I’ll address the second and third original points, having already addressed the question of why I don’t like duplication. Regarding performance, the additional lines of code that are executed with my approach are generally quite minimal and sometimes none (such as a case where there are no constructor-injected dependencies or when every unit test needs a mock). I suspect that if you examined code bases in which I’ve written extensive tests and did a time trial of the test suite my way versus with the instantiation logic inlined into each test, the difference would not be significant. Anecdotally, I have not worked in a code base in a long, long time where test suite execution time was a problem and, even thinking back to code bases where test suite performance was a problem, this was caused by other people writing sketchy unit tests involving file I/O, web service calls, and other no-nos. It’s not to say that you couldn’t shave some time off, but I think the pickings would be pretty lean, at least with my approach of minimal setup overhead.

Regarding readability, as I’ve outlined above, I certainly take steps to address readability. Quite frankly, readability is probably the thing most frequently on my mind when I’m programming. When it comes to test readability, there’s going to be inevitable disagreement according to personal tastes. Is my “Target” and initialization convention more readable than an inline method variable also with the convention of being named “Target”? Wow — eye of the beholder. I think so because I think of the instantiation line as distracting noise. You may not, valuing the clarity of seeing the instantiation right there in the method. Truth is with something like that, what’s readable to you is probably mainly a question of what you’re used to.

But one thing that I think quite strongly about test readability is that it is most heavily tied in with compactness of the test methods. When I pick up a book about TDD or see some kind of “how to” instructional about unit tests, the tests always seem to be about three lines long or so. Arrange, act, assert. Setup, poke, verify (as I said in that old post of mine). Whatever — point is, the pattern is clear. Establish a precondition, run an experiment, measure the outcome. As the setup grows, the test’s readability diminishes very quickly. I try to create designs that are minimal, decoupled, compatible with the Single Responsibility Principle, and intuitive. When I do this, test methods tend to remain compact. Eliminating the instantiation line from every test method, to me, is another way of ruthlessly throttling the non-essential logic inside those test methods, so that what’s actually being tested holds center stage.

So, in the end, I don’t know that I’ve made a persuasive case — that’s also “eye of the beholder.” But I do feel as though I’ve done a relatively thorough job of explaining my rationale. And, my mind is always open to being changed. There was a time when I had no test setup method at all, so it’s not like it would be unfamiliar ground to return there.

  • Matt Keen

    I have recently been experimenting a lot with test readability. I’ve always been a fan of test setups like you are defending here, but i’ve taken it a step further recently.
    One of the things that bothered me was when refactoring other people’s code is i would end up spending a long time understanding the context behind setups and verifications in the tests i caused to fail.

    For example a test for a class involved in communication may be verifying that a connection is closed under certain conditions, which would involve clearing up resources and notifying other layers in the protocol stack. If you break this functionality whilst refactoring or fixing an unrelated bug then you are presented with a bunch of failing verifications and have to search through the class logic to figure out what they mean. By splitting out these verifications into a “VerifyConnectionClosed” method then you immediately give some context. The refactorer can then see they have accidently deleted the line that calls the close method within the code without having to waste time understanding the class in more depth than they need.

    This has other added advantages; future tests are easier to write (especially for people who are just fixing little bugs), and any refactoring to the signatures of those verifications only have to be changed in one place.

    I will say that complicated setups and verifications are normally a sign of bloated classes, but in the 0.5% of cases where they aren’t its very useful. This approach has also really helped when writing integration tests where the context is wider and you don’t necessarily want every coder who comes along to worry themselves about understanding the system under test to finite detail.

  • Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1675()

  • samy

    I am really interested by your points: in fact I’m so interested that I once played with peg.js (a grammar parser in javascript) to write a test skeleton setup for .Net following your and Phil Haack recommended organization for tests (http://vrittis.github.io/Skeletest/). Realistically in VS I have a collection of snippets (tSetup, tShould, tThrows…) that do the same work but this was more of an exercice when I started TDD-ing heavily and wanted to add tests to already existing classes.

    I don’t know what you think about it but I would not hesitate to push setting up the Earth gravity in a private method of the test with a very expressive name, or alternatively I would use the ITestAction interface (in NUnit) that you can apply on a custom test attribute to add behavior to a test _but_ this decreases readability even more. I don’t use it any more at the moment

    Anyway, I really appreciate reading your thoughts; I haven’t watched your chess videos yet but intend to do so asap. Maybe they will answer the current question that bothers me with TDD: how can a test suite for a complicated feature can be rendered readable to a human being?

    Cheers,
    Samy

  • ardalis

    I very much agree with your points here and try to stress that test code needs to be kept clean every bit as much as production code if they’re to remain in a usable state. Regarding naming and structuring the tests, I might try the nested class approach sometime, but I think you might benefit from naming your original test class better. I like this naming structure: http://ardalis.com/unit-test-naming-convention because it lets each test be read like a sentence in the test runner, making it very clear what is being tested.

  • dave falkner

    I agree with your points as well. It has always seemed to me like arguing for redundant test setup within each test method seems to be eschewing Structured Programming for some reason, I suppose simply because you’re in the context of a unit test, a reason which doesn’t make any sense to me.

    Had one coworker argue for setups in test methods because “you can see it right there,” but imagine if the entirety of the production code were designed under a similar philosophy!

    Also, I’ve read plenty of people argue for a invoking common CreateTarget() factory method from within each test and this seems reasonable, and definitely far better than actual duplicated setup code.

    But it seems to me as though, in a programming industry which currently leans heavily on convention-based frameworks, the test initialize/setup method is just one convention and is an exceedingly simple one at that.

  • http://www.daedtech.com/blog Erik Dietrich

    Do you have a gist or a link to an example I could see on github? I’m not sure I’m picturing this correctly. Sounds to me as though you’re talking about having a series of asserts in a method that you extract that verifies preconditions of setup…?

  • http://www.daedtech.com/blog Erik Dietrich

    I think the “very expressive” descriptor captures what my criteria for such a thing would be as well. I’ve sort of moved away from having auxiliary methods in test classes, but that wasn’t a hard/fast rule I made for myself. It’s just happened organically. So, it’s not as though I’d be opposed to the practice if it were readable. It certainly wouldn’t faze me if I were reviewing someone’s code and saw this, as long as the intention was clear.

  • http://www.daedtech.com/blog Erik Dietrich

    It looks like a *very* appealing way to structure tests off the cuff. I think I’ve probably never considered something like this before out of the herd mentality of “Foo’s test class is named FooTest and that’s that.” Interestingly, before my .NET heyday when I did a lot of Java, I don’t think I used to sweat the 1 to 1 ratio of FooTest to Foo. But I guess that’s sort of been ingrained in me over the last several years.

    However, when I actually think about it, the only real reasoning for this that I can cough up is “discoverability with the convention allowing other devs to know where to find tests.” But with the tooling I’m using, like NCrunch, even this actually seems like a flimsy piece of reasoning.

    I see that post is dated from over three years ago. Has that naming scheme stood the test of time for you? Do other developers tend to find this intuitive when onboarding on the project? One last question, too: do you then do something like make namespaces/folders out of the class under test, so that all of the PricingCalculator behavior classes are in a PricingCalculator folder/namespace?

    I think I’m definitely going to play with this the next time I’m doing an example code base or a code kata. I’m very intrigued — thanks for the suggestion!

  • http://www.daedtech.com/blog Erik Dietrich

    I think I can put myself in the position of the commenter fairly easily if he’s had a code base inflicted on him with a ton of setup, and especially *conditional* setup in test initialize methods (if we’re going to be testing X, then do this, otherwise, do this). I’ve actually seen people do some pretty ugly stuff in there, which I think colored my original hesitation to adopt them.

    But what you’re saying resonates with me on every point. I hesitate to conclude “well, this is test code, so the rules are different.” I’ve also heard people argue “I like to see it all in front of me,” but in production code (talked about that here: http://www.daedtech.com/i-love-debugger ). And I also fully agree on making use of a pretty common convention to standardize the testing approach.

  • ardalis

    I’ve been using it and recommending it for years (even before the blog post). I haven’t seen a downside (the FooTests convention is pretty worthless, IMO). Yes, I will use folders/namespaces to organize as needed. Usually I will start by making my test project folders mirror my SUT’s folders. So for a DDD Core project’s tests I might have folders in both the Core project and the Test project for Model and Services, for instance. In an MVC test project, I might have a folder for Controllers, etc. If a particular class has so many tests-per-method that it is creating a large number of files (1 file per method), then I’ll use folders for that as well, but usually I find this to be a smell (class probably violates SRP or at least exhibits the Large Class smell), so I refactor into smaller classes.

  • Matt Keen

    This looks cool, but do you find yourself duplicating your setups a lot? And does that cause issues when refactoring?

  • Matt Keen

    In my very rambling way i was just saying i prefer to split down my tests further into other methods to try and give more description to the test for people who are perhaps unfamiliar with the class and its context.

    I took the approach with your example above here:
    https://gist.github.com/MattKeen/361124ab4a2c101edcd1

    Apologies if it doesn’t compile! It is only a simple example and the benefit would be greater if there were more than 1 verify for the behaviour, i.e. the overall behaviour of getting a velocity wasn’t just described by 1 method call, but hopefully you get the gist!

  • ardalis

    Not that I’ve found. I promote following PDD: Pain Driven Development. If it’s causing you pain, do something about it. If duplication in setup between classes (which I assume is what you’re talking about) was causing me pain during refactoring of the SUT, I would simply refactor the tests (perhaps to share a base class or a common helper class or utility method) to remove the duplication.

  • Mark IJbema

    Thanks for your extended reply! I like this approach, and I enjoyed reading your thoughts about it. The amount of extraction seems a bit saner than some of the stuff I’ve seen ;)

    In a sense it’s not so much different from how I used to do it in Ruby with RSpec. Target is called subject then. The main difference is that in this solution everything is always setup, whereas in RSpec it normally is lazy (so equivalent to having a method create the object). When doing proper unit tests this indeed does not matter much, but I’ve seen a lot of tests using the database, and having an extra 1 or 2 queries before each test can make a lot of speed difference.

    Do you also integration test, and if so, do you follow the same structure there, or do you keep everything together there?

  • Matt Keen

    Of course! I should probably avoid writing comments first thing in the morning…

    Couldn’t agree more about using pain to drive development. I prefer the naming with your way of doing things, it makes everything much clearer.

    One of the things i get out of keeping a 1:1 ratio is it becomes quite apparent when a class has too many responsibilities, either through a horrid setup or just sheer number of tests. This can be very useful for getting a grasp of any potential design issues when reviewing code.

    I suppose your technique could also highlight quite nicely when a class has too many responsibilities, as the number of test sub-classes would get large? Perhaps this would give a quicker feel than “there are lots of tests here, is that because of design issues”?

  • http://www.daedtech.com/blog Erik Dietrich

    I’ve been dabbling a bit with Ruby on Rails of late, but not really enough that I can speak intelligently about test setups and idiomatic approaches in that language and framework, unfortunately. I’ll have to check back when my Ruby cred is higher.

    I do write integration tests, and I’ll split these loosely into two categories: tests that involve multiple components together in the application and tests that include externalities (GUI, database, files, etc). For the former, I’ll follow a similar pattern to the one outlined here, but for the latter, I’ll typically make use of an actual console application or some kind of heavier-weight tooling (ideally a setup that can be invoked from the command line for build purposes).

  • http://www.daedtech.com/blog Erik Dietrich

    Everything you’re describing here makes a lot of sense to me. I’m having the same aha! feeling that I’ve had in the past when I’ve first been exposed to something that I came to value a lot. Definitely going to have to give this approach a try, particularly since I can’t think of anything I gain out of the Foo/FooTest convention at all besides it being what I’m used to.

  • http://www.daedtech.com/blog Erik Dietrich

    Perfectly clear, and that makes sense. I’ve experimented with approaches like that too, trying to hit on the right combination of readability and discoverability for others. What you did there looks pretty readable to me (though I’ve heard people in the past grumble about asserts failing when they aren’t in the test method, that’s not really a complaint that I have, personally).

  • Rob H

    My concern with this approach is that you are lumping together Act & Assert.

    The person reading this test now needs to go look at AssertVelocityAfterNumberOfSeconds to know *how* it’s getting the velocity after N seconds.

    I commented on the gist with my suggestion. I’m a big fan of more readable assertion libraries.

    However, I think you have a great point about complex assertions. If you need to verify a bunch of things to ensure the connection was closed, then it’s great to have a method that *only* asserts those things. I would just be very careful not to conflate the action on the SUT with the assertion.

    Also, in complicated scenarios, a tool like https://github.com/approvals/ApprovalTests.Net can make it much easier than having a long set of asserts.

  • Matt Keen

    Thanks for the link. On a quick glance it looks promising!

    I also agree the more readable asserts are nice.

    I’ve given some thought to your comment and i must admit i’m unsure as to which approach i prefer. On the one hand i can see the benefit of having the action on the SUT visible in the actual test method. I would say though that once you have looked into the AssertVelocityAfterNumberOfSeconds method once, it could make reading multiple similar tests easier.

    I guess the main reason i lumped together the act and assert was to remove duplication and ease future refactoring, both in the SUT and the test class itself.

    Writing new tests would be easier with the duplication removed as well. If there were some complicated logic within the target method which required lots of edge case tests to be written, i would quickly become irritated writing the same 2 lines over and over again.

    In general i prefer to treat test classes the same as i would production code and put less preference on the “arrange, act, assert”. Perhaps that isn’t the right approach though. As i said, i’m a bit on the fence!

  • Pingback: Chess TDD 13: Moving On | DaedTech()