Getting Too Cute with C# Yield Return

By Erik Dietrich · January 18, 2013 · November 6, 2017

I ran across a method that returned an IEnumerable<T> recently, and I implicitly typed its return value. During the course of a series of method extractions, code movement, and general refactoring, I wound up with some code that passed the various unit tests in place but failed curiously at runtime. After peering at it for a few minutes and going through once in the debugger, I traced it to a problem that you don’t see every day, and one that probably would have had me tearing my hair out if I didn’t have a good working understanding of what the “yield” keyword in C# does. So today, I’ll present the essence of this problem in the hopes that, if you weren’t aware of it, you are now.

Here is an entire class that contains a nested type and a couple of methods, for illustration purposes. At the bottom is a unit test that will, if you copy this into your scratchpad, fail.

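using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MiscTest
{
    public class Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    // Yields points (1, 2) through (19, 38).
    private IEnumerable<Point> GetPoints()
    {
        for (int index = 1; index < 20; index++)
            yield return new Point() { X = index, Y = index * 2 };
    }

    // Doubles the X coordinate of every point it is handed.
    private void DoubleXValue(IEnumerable<Point> points)
    {
        foreach (var point in points)
            point.X *= 2;
    }

    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void Asdf()
    {
        var points = GetPoints();
        DoubleXValue(points);

        // This assertion fails: the actual value is 1.
        Assert.AreEqual(2, points.ElementAt(0).X);
    }
}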

It seems pretty straightforward. You have some method that returns a bunch of points, and then you take those points and pass them to a method that iterates through them, performing an operation on each one. So what gives? Why does this fail? Everything looks pretty simple (unlike my situation, where the problem was buried under a few layers of indirection), and yet we get back 1 when we’re expecting 2.

To understand this, it’s important to understand what yield actually does. At its core, the yield keyword is syntactic sugar that tells the compiler to generate a state machine under the hood. Let that sink in for a moment, because it’s actually kind of a wild concept. You’re used to methods that return references to object instances or primitives or collections, but this is something fundamentally different. A method that returns an IEnumerable and does so using yield return isn’t defining a return value–it’s defining a protocol for interacting with client code.
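To make the state machine idea concrete, here is a rough hand-written stand-in for what the compiler builds out of a yield return method like GetPoints(). It is a simplified sketch, not the actual generated code: the names PointsMachine and Enumerator are made up for illustration, and the real compiler output uses explicit state numbers and handles threading and disposal details that are omitted here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

// Hypothetical, simplified stand-in for the compiler-generated class.
public sealed class PointsMachine : IEnumerable<Point>
{
    // Every call hands out a fresh enumerator, i.e. a fresh run of the loop.
    public IEnumerator<Point> GetEnumerator() => new Enumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<Point>
    {
        private int _index = 0;  // the loop variable, hoisted into a field
        private Point _current;  // the value most recently "yielded"

        public Point Current => _current;
        object IEnumerator.Current => _current;

        // Advances the loop by one step, then pauses until asked again.
        public bool MoveNext()
        {
            _index++;
            if (_index >= 20)
                return false;

            _current = new Point { X = _index, Y = _index * 2 };
            return true;
        }

        public void Reset() => throw new NotSupportedException();

        public void Dispose() { }
    }
}
```

A foreach over a PointsMachine calls GetEnumerator() once and then MoveNext() repeatedly, which is the protocol the method is really defining. Nothing in this class stores the points it hands out.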

Consider the code example above. The obvious (and, as it turns out, wrong) way to understand the GetPoints() method is, “it generates a collection of points from (1, 2) to (19, 38) and returns it.” But GetPoints() doesn’t return any such thing. In fact, it doesn’t return anything but a promise–a promise to generate points later if asked. So when we say “var points = GetPoints();” what we’re actually saying is, “the points variable references some kind of points machine that will generate points when I ask for them.”
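One way to see the promise in action is to count how often the method body actually starts running. The sketch below uses made-up names (DeferredDemo, BodyStarts, GetNumbers) and a shorter loop than the article's example, purely for illustration.

```csharp
using System.Collections.Generic;

public static class DeferredDemo
{
    // Counts how many times the iterator body has started executing.
    public static int BodyStarts;

    public static IEnumerable<int> GetNumbers()
    {
        BodyStarts++; // does not run when GetNumbers() is called, only on enumeration
        for (int index = 1; index < 4; index++)
            yield return index;
    }
}
```

Calling DeferredDemo.GetNumbers() on its own leaves BodyStarts at 0. Each separate enumeration after that (a foreach, a First(), an ElementAt(0)) starts the body again from the top and bumps the counter by one.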

If we think of it this way, we start to get to the bottom of what’s going wrong here. On the next line, we pass this oracle into the DoubleXValue() method. The DoubleXValue() method iterates through all of the states of the points (state) machine, retrieving points as per the promise. Once it retrieves the point, it does something to the X coordinate and then promptly discards the point. Why? Because nothing else refers to it. When you change one of the points that the points machine spits out, you’re not changing anything about the points machine–you’re not feeding it some kind of new mechanism for point generation. You could think of this as being similar to a method that takes a class factory, requests a bunch of instances from it, modifies them, and then returns. Nothing about the factory is different, and you wouldn’t expect the factory to behave differently if the caller subsequently passed it to another method.
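The factory analogy can be checked directly. In this sketch (FreshObjectsDemo and its two helper methods are hypothetical names, not part of the original code), two separate enumerations of the same variable hand back two different objects, and a mutation made during one enumeration is invisible to the next.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class FreshObjectsDemo
{
    public static IEnumerable<Point> GetPoints()
    {
        for (int index = 1; index < 20; index++)
            yield return new Point { X = index, Y = index * 2 };
    }

    // Asks the same "points machine" for its first point twice
    // and reports whether both requests returned the same object.
    public static bool SameInstanceTwice()
    {
        IEnumerable<Point> points = GetPoints();
        Point a = points.First();
        Point b = points.First();
        return ReferenceEquals(a, b); // false: each enumeration builds new points
    }

    // Doubles X on every point, then asks the machine for the first point again.
    public static int FirstXAfterMutation()
    {
        IEnumerable<Point> points = GetPoints();
        foreach (var point in points)
            point.X *= 2; // changes objects that nothing else refers to

        return points.First().X; // 1: a freshly constructed point
    }
}
```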

So once the DoubleXValue() method gets done doing, well, nothing of significance, the ElementAt(0) call inside the assertion starts a brand-new enumeration and requests the first sequential element–the first state–from the points machine. The points machine dutifully runs its loop from the top, constructs a fresh point, and spits out its first state, (1, 2), and the unit test fails. So how do we get it to pass? Well, here’s one way:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Asdf()
{
    var points = GetPoints().ToList();
    DoubleXValue(points);
            
    Assert.AreEqual(2, points.ElementAt(0).X);
}

Notice the added ToList() call. This is very important because it means that we’re no longer storing a reference to some kind of points machine but rather to a list of points. This line now says, “Go get me a points machine, iterate through all the states of it, and store those states locally in a list.” Now, the rest of the code behaves in a way that you’re used to because you’re storing an actual, tangible collection instead of a promise to generate a sequence.
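For contrast, here is a small sketch of the same mutation against a materialized list. MaterializeDemo and FirstXAfterDoubling are illustrative names only. The key assumption being shown is that ToList() enumerates the sequence exactly once and keeps references to the objects it produced.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class MaterializeDemo
{
    public static IEnumerable<Point> GetPoints()
    {
        for (int index = 1; index < 20; index++)
            yield return new Point { X = index, Y = index * 2 };
    }

    public static int FirstXAfterDoubling()
    {
        // Run the points machine once and hold on to the points it produced.
        List<Point> points = GetPoints().ToList();

        // This loop now walks the stored objects rather than generating new ones.
        foreach (var point in points)
            point.X *= 2;

        return points[0].X; // 2: the same object that was just modified
    }
}
```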

There is no shortage of posts, documents, and articles explaining the yield return state machine concept or the idea of deferred execution. I encourage you to read those to get a better understanding of the inner mechanics and usage scenarios, respectively. But hopefully this gives you a bit of practical insight that’s easy to wrap your head around into (1) why the code behaves this way and (2) why you have to be careful of providing and consuming IEnumerables. It can be tempting to get too cute with how you provide IEnumerables or too careless with how you consume them, particularly when usage and implementation are separated by inversion of control. So be aware when using IEnumerables that you may not have a list/collection, and be aware when providing them that you’re leaving it up to your clients to decide when to get and store sequence members.
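On the consuming side, one defensive option is to materialize a sequence once at the boundary and work with the result from then on. The helper below is a sketch of that idea rather than an established API: SequenceGuard and Materialize are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceGuard
{
    // Returns the input unchanged if it is already an in-memory collection;
    // otherwise enumerates it exactly once and copies the items into a list.
    public static IReadOnlyCollection<T> Materialize<T>(IEnumerable<T> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        return source as IReadOnlyCollection<T> ?? source.ToList();
    }
}
```

A method that needs to walk its input more than once, or to modify the items and see those changes later, could call SequenceGuard.Materialize(points) first. The trade-off is that a very large or unbounded sequence gets pulled into memory, so this is a judgment call rather than a blanket rule.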


26 Comments

Liens de la semaine – #15 | frenchcoding

13 years ago

[…] C# developers, a little reminder of the dangers of deferred execution and the yield keyword. […]

Steve Gilham

13 years ago

It’s not just yield return that does this — anything out of a LINQ expression is similarly lazily evaluated. And if your enumeration had been something stateful, like reading bytes from a stream, or a random number generator, the second evaluation would not give the same results as the first.

In general, data qualified as just IEnumerable, regardless of source, should be regarded as a read-once data structure — so transform it through LINQ to your heart’s content, but reify it as an array or a list before handing it on.

Erik Dietrich

13 years ago

Reply to Steve Gilham

Good point about the broader based applicability and yield returning something beyond the creation scope of the method. The inspiration for this particular post started out as “this is something specific that happened and here’s why,” but there are certainly more far-reaching complexities with the deferred evaluation paradigm.

dasjestyr

9 years ago

Reply to Steve Gilham

It’s an implementation of the iterator pattern. LINQ is literally just a builder pattern that builds decorators which gate the result set. The IEnumerable that is returned from Where(…) is a custom iterator that wraps itself around the GetEnumerator() method of the source collection. So in the end, the logic used to deliver the next element of the collection is still defined within the source collection. I think what a method that yields actually does is return an anonymous iterator; basically the heart of an IEnumerable implementation, so as you iterate over that iterator, it just continues to deliver elements as…

Timothy Boyce

13 years ago

Deferred execution can certainly cause some problems if you aren’t careful. ReSharper is great at warning you about most cases where there could be a problem. When I pasted in your code, it warned me about possible multiple enumerations of an IEnumerable.

Erik Dietrich

13 years ago

Reply to Timothy Boyce

That’s really cool. Another piece of feature envy that I have for R#. Fingers crossed that it makes the Code Rush issues list in an upcoming release.

Michael Paterson

13 years ago

Reply to Erik Dietrich

What is the Code Rush issue?

James Curran

13 years ago

Reply to Michael Paterson

It’s the “issue list” (bug reports and feature requests) for Code Rush (Developers’ Express’s alternative to Resharper)

Toni Petrina

13 years ago

Reply to Timothy Boyce

R# pointed out immediately that the enumeration is enumerated multiple times, a general no-no 🙂

James Curran

13 years ago

The ToList() is merely a band-aid. The problem is with DoubleXValue(), which modifies the values and then throws them away. The “correct” solution would be:

    var points = GetPoints();
    points = DoubleXValue(points);
    // :
    // :
    private IEnumerable<Point> DoubleXValue(IEnumerable<Point> points)
    {
        foreach (var point in points)
        {
            point.X *= 2;
            yield return point;
        }
    }

Alternately:

    private IEnumerable<Point> DoubleXValue(IEnumerable<Point> points)
    {
        return points.Select(p => new Point { X = p.X * 2, Y = p.Y });
    }

or we could componentize it:

    private Point DoubleXValue(Point p) { return new Point { X = p.X * 2, Y = p.Y }; }
    // :
    // :
    var points…

Erik Dietrich

13 years ago

Reply to James Curran

The ToList() call was purely instructional — to highlight the difference between storing a deferred execution enumerable as a local and storing the list resulting from walking the enumeration (I thought that would be the best way to contrast them). I definitely like your solution with the return enumeration that also uses yield return — that’s what I wound up doing in the actual code that inspired this post 🙂

Jonathan C Dickinson

13 years ago

This does have quite a bit to do with `yield`, agreed – but I think it’s also about understanding pointers correctly (pointers in C# you exclaim? Yes guys, reference types are pointers).

James Curran

13 years ago

Reply to Jonathan C Dickinson

Reference types are IMPLEMENTED AS pointers (but as is the case with all of OO design — Implementation Is Irrelevant)

Jonathan C Dickinson

13 years ago

Reply to James Curran

Actually implementation is not irrelevant, hence the reason for this blog post. A developer needs to understand that passing reference values around is passing the same piece of memory around. Making a toy OO system in plain ol’ C is a must for any developer (even if it lands up being bad, leaky and whatnot). You need to **understand** the systems that lie underneath your abstraction level, so that you don’t get bitten by issues like this one (and potentially waste time with them).

Carsten König

13 years ago

Well, this is what you get if you mix “side effects” with stuff from functional programming … you see: just don’t mess with this stuff (use immutable data and pure functions) and you would not run into trouble …

Erik Dietrich

13 years ago

Reply to Carsten König

Agreed. That’s the approach I take and prefer to take in reality here, myself. Unfortunately, we don’t always have complete control over the APIs and libraries that we use…. 🙁

Justin

13 years ago

Part of the problem is use of the ‘var’ keyword masking types. We are so comfortable with ‘Lists are IEnumerables’ and treating them interchangeably as such, but if you actually had to write IEnumerable as the declared type of a variable, that should immediately give you pause to think very carefully about what you’re doing.

Erik Dietrich

13 years ago

Reply to Justin

I can’t speak for anyone else, but I’m not sure if the act of typing the type (as opposed to using CodeRush to flip between explicit/implicit or hovering the mouse over var) would really have an effect on my thinking. Typing the first “Foo” in “Foo foo = GetFoo()” doesn’t really engage my brain to think of the ramifications of the type — it’s just noise. That said, if I’m reading someone else’s code (or leaving this code for someone else I suppose), I see your point — you have a better piece of self-documenting code for someone who understands…

Firehawk70

10 years ago

Reply to Justin

I agree with Justin. I know this is old, but if anyone else comes across this article, refer to Microsoft’s coding conventions regarding “var” – https://msdn.microsoft.com/en-us/library/ff926074.aspx. Your usage is not compliant with “Do not use var when the type is not apparent from the right side of the assignment.”.

I work with someone who lazily uses “var” for everything now and it’s truly annoying. It makes code harder to read because I can’t figure out what type I’m dealing with, or be able to evaluate what methods or properties might be more appropriate per the code written.

Erik Dietrich

10 years ago

Reply to Firehawk70

I won’t argue about personal readability preferences, since I’m not really in a position to do that, obviously. But I will offer a devil’s advocate argument as food for thought, using the MS coding standards you linked to. Their “don’t use var” examples are “int var4 = ExampleClass.ResultSoFar();” and “var inputInt = Console.ReadLine();” Neither of those lines is anything I would write. What if, instead, these read: var countOfCustomerRecords = ExampleClass.CustomerRecordsSoFar(); and var lineReadFromConsole = Console.ReadLine(); When writing code, I always strive to make the member names as clear as possible. Personally, I’d argue that both are easier to read…

Erik Dietrich

10 years ago

Reply to Firehawk70

As an aside, the Microsoft code example just gave me an interesting idea for a new blog post. So, thanks 🙂

Detecting IEnumerable “State Machines” | Click & Find Answer !

12 years ago

[…] I just read an interesting article called Getting too cute with c# yield return […]

Michiel Staessen

12 years ago

Working with IEnumerable and yield return can be tricky and one should indeed understand the mechanics of deferred execution. I experienced this yesterday. I started with .NET only a couple of months ago. I come from Java, so for me, yield return is quite “magical” in the awesome kind of way. I started playing around with it and used it in a performance test where I need to do a nested iteration of 15M and 80 entities. Running the test took very, very long (I started with a smaller number of entities) and I had no clue what was going…

Erik Dietrich

12 years ago

Reply to Michiel Staessen

Hi Michael, Thanks for reading. Like you, I came to C# from Java (and C/C++ before that), but some years ago now, back when the current version of C# was 2.0. My personal impression over these years has been to fall in love with C# since it seems to be identical to Java but time-warped about 2 years in the future. I believe Java just recently introduced lambdas and closures with Java 1.7, whereas C# has had these built in since 3.0 a few years ago, IIRC. Your tale does seem to serve as a good cautionary tale for transplants from other…

Michiel Staessen

12 years ago

Reply to Erik Dietrich

Using other return types than IEnumerable is indeed the best solution. It is also a more specific contract for your code. In Java, I would have never used the Collection interface (Java’s equivalent for IEnumerable) as a return type but rather used the List or Set interface. Seems like I should correct myself and start using IList and ISet instead… 🙂

What To Return: IEnumerable or IList? | DaedTech

11 years ago

[…] blogged about IEnumerable in the past and talked about how this is really a unique concept. Tl;dr version is that IEnumerable […]
