DaedTech

Stories about Software

By

Getting Too Cute with C# Yield Return

I ran across a method that returned an IEnumerable<T> recently, and I implicitly typed its return value. During the course of a series of method extractions, code movement, and general refactoring, I wound up with some code that passed the various unit tests in place but failed curiously at runtime. After peering at it for a few minutes and going through once in the debugger, I traced it to a problem that you don’t see every day, and one that probably would have had me tearing my hair out if I didn’t have a good working understanding of what the “yield” keyword in C# does. So today, I’ll present the essence of this problem in the hopes that, if you weren’t aware of it, you are now.


Here is an entire class that contains a nested type and a couple of methods, for illustration purposes. At the bottom is a unit test that will, if you copy this into your scratchpad, fail.

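using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MiscTest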
{
    public class Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

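    // Iterator method: the compiler turns this into a state machine that
    // creates new Point objects every time the sequence is enumerated.
    private IEnumerable<Point> GetPoints()
    {
        for (int index = 1; index < 20; index++)
            yield return new Point() { X = index, Y = index * 2 };
    }

    // Mutates whatever points this particular enumeration produces.
    private void DoubleXValue(IEnumerable<Point> points)
    {
        foreach (var point in points)
            point.X *= 2;
    }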

    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void Asdf()
    {
        var points = GetPoints();
        DoubleXValue(points);
            
        Assert.AreEqual(2, points.ElementAt(0).X);
    }
}

It seems pretty straightforward. You have some method that returns a bunch of points, and then you take those points and pass them to a method that iterates through them, performing an operation on each one. So what gives? Why does this fail? Everything looks pretty simple (unlike my situation, where this was buried under a few layers of indirection), and yet we get back 1 when we’re expecting 2.

To understand this, it’s important to understand what yield actually does. At its core, the yield keyword is syntactic sugar that tells the compiler to generate a state machine under the hood. Let that sink in for a moment, because it’s actually kind of a wild concept. You’re used to methods that return references to object instances or primitives or collections, but this is something fundamentally different. A method that returns an IEnumerable and does so using yield return isn’t defining a return value–it’s defining a protocol for interacting with client code.

Consider the code example above. The obvious (and, as it turns out, wrong) way to understand the GetPoints() method is, “it generates a collection of points from (1, 2) to (19, 38) and returns it.” But GetPoints() doesn’t return any such thing. In fact, it doesn’t return anything but a promise–a promise to generate points later if asked. So when we say “var points = GetPoints();” what we’re actually saying is, “the points variable references some kind of points machine that will generate points when I ask for them.”
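C# does the heavy lifting for you with yield, but the idea isn't specific to C#. Here's a minimal sketch of a hand-rolled "points machine" in Java, using nothing beyond the standard library. The names (LazyPointsDemo, pointsCreated, getPoints) are mine, invented for illustration, and a hand-written Iterator is only a rough stand-in for the state machine that the C# compiler generates. The counter is there to show that calling getPoints() creates no points at all; they only come into existence when somebody iterates.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a hand-rolled Java stand-in for a C# iterator method.
class LazyPointsDemo {
    // Counts how many Point objects have actually been constructed.
    static final AtomicInteger pointsCreated = new AtomicInteger();

    static class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
            pointsCreated.incrementAndGet();
        }
    }

    // Returns a "points machine", not a collection: each call to iterator()
    // starts a fresh pass that builds points one at a time, on demand.
    static Iterable<Point> getPoints() {
        return () -> new Iterator<Point>() {
            private int index = 1;

            @Override
            public boolean hasNext() {
                return index < 20;
            }

            @Override
            public Point next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Point point = new Point(index, index * 2);
                index++;
                return point;
            }
        };
    }

    public static void main(String[] args) {
        Iterable<Point> points = getPoints();
        System.out.println("After getPoints(): " + pointsCreated.get());

        for (Point point : points) {
            // Just walk the sequence; each step constructs one point.
        }
        System.out.println("After one pass: " + pointsCreated.get());
    }
}
```

Run as is, this reports 0 points after the call to getPoints() and 19 after the loop, which is the "promise to generate points later" in action.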

If we think of it this way, we start to get to the bottom of what’s going wrong here. On the next line, we pass this oracle into the DoubleXValue() method. The DoubleXValue() method iterates through all of the states of the points (state) machine, retrieving points as per the promise. Once it retrieves the point, it does something to the X coordinate and then promptly discards the point. Why? Because nothing else refers to it. When you change one of the points that the points machine spits out, you’re not changing anything about the points machine–you’re not feeding it some kind of new mechanism for point generation. You could think of this as being similar to a method that takes a class factory, requests a bunch of instances from it, modifies them, and then returns. Nothing about the factory is different, and you wouldn’t expect the factory to behave differently if the caller subsequently passed it to another method.

So once the DoubleXValue() method gets done doing, well, nothing of significance, the Assert.AreEqual() call requests the first sequential element–the first state–from the points machine. The points machine starts over and dutifully spits out a fresh first state, (1, 2), and the unit test fails. So how do we get it to pass? Well, here’s one way:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Asdf()
{
    var points = GetPoints().ToList();
    DoubleXValue(points);
            
    Assert.AreEqual(2, points.ElementAt(0).X);
}

Notice the added ToList() call. This is very important because it means that we’re no longer storing a reference to some kind of points machine but rather to a list of points. This line now says, “Go get me a points machine, iterate through all the states of it, and store those states locally in a list.” Now, the rest of the code behaves in a way that you’re used to because you’re storing an actual, tangible collection instead of a promise to generate a sequence.
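If you'd like to see both behaviors side by side outside of C#, here's a rough Java sketch of the same situation. Everything in it is hypothetical and named by me for illustration: getPoints() is a hand-written lazy Iterable standing in for the yield return method, and the toList() helper plays the role of ToList(). It's an analogy under those assumptions, not a translation of what the C# compiler emits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative only: lazy sequence versus materialized list.
class MaterializeDemo {
    static class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // A lazy sequence: every call to iterator() produces brand-new points.
    static Iterable<Point> getPoints() {
        return () -> new Iterator<Point>() {
            private int index = 1;

            @Override
            public boolean hasNext() {
                return index < 20;
            }

            @Override
            public Point next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Point point = new Point(index, index * 2);
                index++;
                return point;
            }
        };
    }

    static void doubleXValue(Iterable<Point> points) {
        for (Point point : points) {
            point.x *= 2;
        }
    }

    // Walks the sequence once and stores the resulting points.
    static List<Point> toList(Iterable<Point> points) {
        List<Point> list = new ArrayList<>();
        for (Point point : points) {
            list.add(point);
        }
        return list;
    }

    static int firstX(Iterable<Point> points) {
        return points.iterator().next().x;
    }

    // The doubled points are thrown away; firstX() triggers a fresh pass.
    static int firstXAfterDoublingLazy() {
        Iterable<Point> points = getPoints();
        doubleXValue(points);
        return firstX(points);
    }

    // The list holds on to the same Point objects that get doubled.
    static int firstXAfterDoublingMaterialized() {
        List<Point> points = toList(getPoints());
        doubleXValue(points);
        return firstX(points);
    }

    public static void main(String[] args) {
        System.out.println("Lazy: " + firstXAfterDoublingLazy());
        System.out.println("Materialized: " + firstXAfterDoublingMaterialized());
    }
}
```

The lazy version reports 1 and the materialized version reports 2, mirroring the failing and passing versions of the unit test.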

There is no shortage of posts, documents, and articles explaining the yield return state machine concept or the idea of deferred execution. I encourage you to read those to get a better understanding of the inner mechanics and usage scenarios, respectively. But hopefully this gives you a bit of practical insight that’s easy to wrap your head around into (1) why the code behaves this way and (2) why you have to be careful of providing and consuming IEnumerables. It can be tempting to get too cute with how you provide IEnumerables or too careless with how you consume them, particularly when usage and implementation are separated by inversion of control. So be aware when using IEnumerables that you may not have a list/collection, and be aware when providing them that you’re leaving it up to your clients to decide when to get and store sequence members.

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.


Complain-Bragging Your Way to the Top

First, I’d like to make a brief aside to re-introduce Amanda Muledy (@eekitsabug) who, in addition to editing and writing, is also an artist.  Lest you think that I have a single iota of drawing talent, any sketches that you see as post illustrations are her work, done specifically for the Daedtech blog.  Her drawings are certainly better than me scrounging images from Wikimedia Commons public domain, so I’m going to be using those as graphics whenever possible.

Mo Money, Mo Problems

You know what I hate? I hate it when I get caviar stains all over the leather in my brand-new, fully-loaded Porsche SUV. It’s even worse when it happens because I was distracted by all of the people emailing me to ask if I would review their code or be part of their startup or maybe just for my autograph. That really sucks.

Okay, so maybe none of those things applies to me, but you know what does? Being party to people couching shameless bragging and self promotion as complaints and impositions. I hate being rich and popular because blah-blah-blah-whatever-this-part-doesn’t-matter. In case you missed it, I’m rich and I’m popular.

Surely you know someone that engages in this sort of thing. It’d be nearly impossible not to because I think just about everyone does it from time to time and to some degree or another. Most people have the interpersonal acumen to realize that simply blurting out flattering facts about oneself is somewhat distasteful. However, finding some other context in which to mention those same facts is likely to alleviate some of that distaste.  And complaining somehow seems like an easy and immediate pretext for this alternate context. (The best way would probably be to get some sympathetic plant to lob softballs at you, or else to simply wait for someone else to mention your achievements, but those are elaborate and unreliable, respectively.)

When people complain-brag, it’s most naturally in regard to some subject with which both speaker and audience are familiar. In the weightroom, a weightlifter may complain-brag about how sore he is from bench-pressing 320 pounds yesterday. In the college world, a kid may complain-brag about how badly hungover he is from pounding twenty-eight beers last night. Some subjects, such as money, popularity, and achievement, are near universal, while others, like the examples in this paragraph, are domain specific. Another domain-specific and curious example of complain-bragging that I observe quite frequently is corporate complain-bragging (or, its more innocuous cousin, excuse-bragging).

Mo Meetings, Mo Problems

Think about the following phrases you probably hear (or utter) with regularity:

  1. “Man, it’s so hard to keep up with all of my email” == “A lot of people want or need to talk to me.”
  2. “Ugh, I’ve had so many meetings today that I haven’t been able to get a thing done.” == “I’m important enough to get included on a lot of meeting rosters.”
  3. “Sorry I’m late, but my 10:00 ran way over” == “I’m not really sorry that I’m late and I want you to know that I’m important enough to have back-to-back meetings and to keep you waiting.”
  4. “I have so many code review requests that I won’t get to write code for weeks!” == “I’m the gatekeeper, Plebe, and don’t you forget it.”

There’s an interesting power dynamic in the corporate world that came of age during the heyday of command-and-control style management via intimidation. That power dynamic is one in which managers sit in the seats of power and makers work at their behest. (For a primer on the manager/maker terminology and background on how they interact, see this wonderful post by Paul Graham.) Makers are the peons who work all day, contributing to the bottom line: factory workers, engineers, programmers, accountants… even salesmen, to an extent. Managers are the overhead personnel that supervise them. Makers spend their days making things and managers spend their days walking from one meeting room to the next, greasing the skids of communication, managing egos, adjudicating personnel decisions, and navigating company realpolitik minefields. Managers also make money, get offices, and hear “how high” when they tell the peons to jump.

It’s traditionally an enviable position, being manager, which leads many people to envy it.  And, in true fake-it-till-you-make-it style, this envy leads those same people to emulate it. So aspiring mid-level managers begin to act the part: dressing more sharply; saying things like, “synergy,” and, “proactive”; bossing other people around; and making themselves seem more manager-like, in general. And what do managers do throughout the day? Well, they shuffle around from meeting to meeting, constantly running late and having way too many emails to read. So what does someone wanting to look like a manager without looking like they’re trying to look like a manager do? Why, they complain-brag (and excuse-brag) in such a way as to mimic mid-level management. In effect, this complain-bragging, tardiness, and unresponsiveness become a kind of status currency within the organization, with actual power capital being exchanged in games of chicken. Who can show up a few minutes late to meetings or blow off an email here and there? (Think I’m being cynical? Take a look around when your next meeting is starting late, and I’ll wager dollars to donuts that people show up in rough order of tenure/power/seniority, from most peon to most important.)

Rethinking the Goals

So what’s my point with all this? Believe it or not, this isn’t a tone-deaf or bitter diatribe against “pointy haired bosses” or the necessity of office politics (which actually fascinates me as a subject). A lot of managers, executives, and people in general aren’t complain-bragging at all when they assess situations, offer excuses, or make apologies. But intentional or not, managers and corporate power brokers over the course of decades have created a culture in which complain-bragging about busy-ness, multi-tasking, and “firefighting” is as common and culturally expected as inane conversations about the weather around the water cooler. Rather than complaining about this reality, I’m encouraging myself and anyone reading to avoid getting caught up in it quite so automatically.

Why avoid it? Well, because I’d argue that obligatory complain-bragging creates a mild culture of failure. Let’s revisit the bullet points from the previous section and consider them under the harsh light of process optimization:

  1. “Man, it’s so hard to keep up with all of my email” — If you’re getting that many emails, you’re probably failing to delegate and thus failing as a manager.
  2. “Ugh, I’ve had so many meetings today that I haven’t been able to get a thing done.” — You’re not getting anything done and business continues anyway, “so what is it you’d say… ya do here?”
  3. “Sorry I’m late, but my 10:00 ran way over” — You either don’t have control over your meetings or don’t have a clear agenda for them, so your 10:00 was probably a waste of time.
  4. “I have so many code review requests that I won’t get to write code for weeks!” — You’re a bottleneck and costing the company money. Fix this.

You see, the problem with corporate complain-bragging or complain-bragging in general is that it necessarily involves complaining, which means ceding control over a situation. After all, you don’t typically complain about situations for which you called all the shots and got the desired outcome. And sometimes these are situations that you ought to control or situations that you’d at least look better for controlling. So I’ll leave you with the following thought: next time you find yourself about to complain-brag according to the standard corporate script, make it a simple apology, a suggestion for improvement or, perhaps best of all, a silent vow to fix something. Do these, and other people might do your bragging for you.


JUnit Revisited

Just as a warning, in this short post, I’m going to be writing unit tests that verify that primitives in Java do what they should and basically that gravity is still turned on. The reason is that I’d like to showcase some Java unit testing goodies I’ve discovered since dipping back into the Java fold lately. I firmly believe that the more conversationally readable the contents of unit tests are, the more effective they will be at defining functional and internal requirements as well as showcasing the behavior of the system.

@Test
public void two_ints_are_equal() {
    int x = 4;
    int y = 4;
    assertThat(x, is(y));
}

Coming from the .NET world and using MSTest, I’m used to semantics of Assert.AreEqual<int>(x, y) where, by convention, the “control” or expected value goes on the left and the actual value goes on the right. This is a compelling alternative in that it reads like a sentence, which is always good. The MSTest version reads “Are equal x and y” whereas this reads “x is y.” The less it reads like Yoda is talking, the better. So what enables this goodness?

import static org.junit.Assert.assertThat;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.CoreMatchers.not;
import static org.junit.matchers.JUnitMatchers.*;

The first import gives you assertThat(), obviously, and the next two give you the is() and not() matchers. assertThat() as shown above takes two parameters (there is an overload that takes a string as an additional parameter to let you specify a failure message): a value of generic type for the first parameter, and a “matcher” for the second parameter. Matchers perform evaluations on values and can be nested in fluent fashion to allow construction of sentences that flow. For instance, you can combine the is() matcher and the not() matcher to get the following test:

@Test
public void two_ints_are_not_equal() {
    int x = 4;
    int y = 5;
    assertThat(x, is(not(y)));
}

This really just scratches the surface, and there are lots of additional matchers from Hamcrest as well. You can even extend the functionality by defining your own matchers to cater to the ubiquitous language of the domain that you’re working in. If you’re a Java developer and haven’t given these a look, I’d suggest doing so. If you’re a .NET developer, it’s worth taking a peek at what’s going on elsewhere and perhaps defining your own such constructs if you’re feeling ambitious or looking for existing ones. In fact, if you know of good ones, please post ’em — I always like seeing what’s out there.
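To make the "matchers nest into sentences" idea a bit more concrete, here's a toy sketch of how such a thing can be put together using only the standard library. To be clear, this is not Hamcrest's API or implementation: the ToyMatchers class, its Matcher interface, and the even() matcher are all made up by me for illustration. Real custom matchers would extend Hamcrest's own base classes instead.

```java
import java.util.Objects;

// Illustrative only: a hand-rolled miniature of the matcher idea.
class ToyMatchers {
    // A matcher evaluates a value and can describe itself in words.
    interface Matcher<T> {
        boolean matches(T actual);
        String describe();
    }

    static <T> Matcher<T> equalTo(T expected) {
        return new Matcher<T>() {
            @Override
            public boolean matches(T actual) {
                return Objects.equals(expected, actual);
            }

            @Override
            public String describe() {
                return String.valueOf(expected);
            }
        };
    }

    // Pure decoration: delegates to the inner matcher so tests read better.
    static <T> Matcher<T> is(Matcher<T> inner) {
        return new Matcher<T>() {
            @Override
            public boolean matches(T actual) {
                return inner.matches(actual);
            }

            @Override
            public String describe() {
                return "is " + inner.describe();
            }
        };
    }

    // Inverts the result of the inner matcher.
    static <T> Matcher<T> not(Matcher<T> inner) {
        return new Matcher<T>() {
            @Override
            public boolean matches(T actual) {
                return !inner.matches(actual);
            }

            @Override
            public String describe() {
                return "not " + inner.describe();
            }
        };
    }

    // A "domain" matcher of the kind you might define yourself.
    static Matcher<Integer> even() {
        return new Matcher<Integer>() {
            @Override
            public boolean matches(Integer actual) {
                return actual != null && actual % 2 == 0;
            }

            @Override
            public String describe() {
                return "an even number";
            }
        };
    }

    static <T> void assertThat(T actual, Matcher<T> matcher) {
        if (!matcher.matches(actual)) {
            throw new AssertionError(
                "Expected: " + matcher.describe() + " but was: " + actual);
        }
    }

    public static void main(String[] args) {
        assertThat(4, is(equalTo(4)));
        assertThat(5, is(not(even())));
        System.out.println(is(not(even())).describe());
    }
}
```

The design choice worth noticing is that each matcher carries its own description, so a nested matcher can assemble a readable failure message out of the same pieces that make the assertion read like a sentence.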


Programmatically Filling Out PDFs in Java

I just got done dealing with an interesting problem. I had one of those PDFs that’s a form you can fill out and was tasked with programmatically filling it out. So, I busted out my google-fu and came across PDFBox. It’s a handy and fairly no-nonsense little utility not just for filling out forms, but for manipulating PDFs in general. I had no idea something like this existed (mainly because I’d never really thought about it).

I downloaded the jar for PDFBox and wrote a simple class to test out my theory. In setting up the class and poking around randomly in the documentation, I saw that the main object of interest was a PDDocument. So, I set about instantiating one and discovered that you needed to use something called a COSDocument, which took something called a RandomAccess (not the standard version of the file, but a special version from PDFBox), and then my eyes started to cross and I pulled back and discovered that this is really what I want:

private static PDDocument _pdfDocument;

private static void populateAndCopy(String originalPdf, String targetPdf) throws IOException, COSVisitorException {
	_pdfDocument = PDDocument.load(originalPdf);
	
	_pdfDocument.save(targetPdf);
	_pdfDocument.close();
}

Much easier. Now, as I got down to the business of trying this out, I discovered via runtime exception that I needed two external dependencies. One was Apache Commons Logging and the other was something called FontBox, which was right there alongside the PDFBox download but which I had ignored in the beginning. With this code alone you wouldn’t necessarily hit both of those problems, but you will eventually, so it’s better to add those jars right up front.

So far I was able to open a PDF and save it as another file, which isn’t exactly a new capability for any programming language with file I/O, so I added something a little more concrete to the mix:

private static void populateAndCopy(String originalPdf, String targetPdf) throws IOException, COSVisitorException {
	_pdfDocument = PDDocument.load(originalPdf);
	
	_pdfDocument.getNumberOfPages();
	
	setField("SomeFieldName", "SomeFieldValue");
	_pdfDocument.save(targetPdf);
	_pdfDocument.close();
}

Lo and behold, liftoff. I actually got the right number of pages in the document. Now I was getting somewhere. Time to get down to the real business. I did this by going to the “Cookbook” section of the project and seeing what was under form generation. Seeing that this just took me to the javadoc for examples, I went and downloaded the example code and pasted it into my project (modifying it to conform to Egyptian-style braces). In this fashion, I had a method that would print out all of the fields in the PDF as well as a method that would let me set fields by name. When I ran the one that printed out all of the fields, I got a runtime exception about some deprecated method and I discovered that in the source code for that method, it just threw an exception. Presumably, the written examples predated some change that had deprecated that method — deprecated it with extreme prejudice!

Well, I’d like to say that I fought the good fight, but I didn’t. I just deleted the offending call since it was just writing to console. So here is the end result of that effort:

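// Imports assume the PDFBox 1.x package layout that this code was written against.
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class Populater {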

	private static PDDocument _pdfDocument;
	
	public static void main(String[] args) {
		
		String originalPdf = "C:\\blah\\blah\\input.PDF";
		String targetPdf = "C:\\blah\\blah\\output.PDF"; // save to a different file than the input
		
		try {
			populateAndCopy(originalPdf, targetPdf);
		} catch (IOException | COSVisitorException e) {
			e.printStackTrace();
		}
		
		System.out.println("Complete");
	}

	private static void populateAndCopy(String originalPdf, String targetPdf) throws IOException, COSVisitorException {
		_pdfDocument = PDDocument.load(originalPdf);
		
		_pdfDocument.getNumberOfPages();
		//printFields();  //Uncomment to see the fields in this document in console
		
		setField("SomeFieldName", "SomeFieldValue");
		_pdfDocument.save(targetPdf);
		_pdfDocument.close();
	}
	
    public static void setField(String name, String value ) throws IOException {
        PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
        PDAcroForm acroForm = docCatalog.getAcroForm();
        PDField field = acroForm.getField( name );
        if( field != null ) {
            field.setValue(value);
        }
        else {
            System.err.println( "No field found with name:" + name );
        }
    }

    @SuppressWarnings("rawtypes")
	public static void printFields() throws IOException {
        PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
        PDAcroForm acroForm = docCatalog.getAcroForm();
        List fields = acroForm.getFields();
        Iterator fieldsIter = fields.iterator();

        System.out.println(fields.size() + " top-level fields were found on the form");

        while( fieldsIter.hasNext()) {
            PDField field = (PDField)fieldsIter.next();
            processField(field, "|--", field.getPartialName());
        }
    }
    
    @SuppressWarnings("rawtypes")
	private static void processField(PDField field, String sLevel, String sParent) throws IOException
    {
        List kids = field.getKids();
        if(kids != null) {
            Iterator kidsIter = kids.iterator();
            if(!sParent.equals(field.getPartialName())) {
               sParent = sParent + "." + field.getPartialName();
            }
            
            System.out.println(sLevel + sParent);
            
            while(kidsIter.hasNext()) {
               Object pdfObj = kidsIter.next();
               if(pdfObj instanceof PDField) {
                   PDField kid = (PDField)pdfObj;
                   processField(kid, "|  " + sLevel, sParent);
               }
            }
         }
         else {
             String outputString = sLevel + sParent + "." + field.getPartialName() + ",  type=" + field.getClass().getName();
             System.out.println(outputString);
         }
    }
}

Going forward, I’ll certainly factor this into a new class and probably extract some methods and improve warning avoidance, but that’s the gist of it. This didn’t exactly take a long time, but it probably could have gone quicker if I’d known a little more up-front and had all example code in one place. Hopefully it helps you in that capacity.



Betrayed by Your Test Runner

The Halcyon Days of Yore

I was writing some code for Apex this morning and I had a strange sense of deja vu. Without going into painful details, Salesforce is a cloud-based CRM solution and Apex is its proprietary programming language that developers can use to customize their applications. The language is sort of reminiscent of a stripped-down hybrid of Java and C#. The IDE you use to develop this code is Eclipse, armed with an Apex plugin.

The deja vu that I experienced transported me back to my college days, working in a 200-level computer systems course where the projects assigned to us had the profs/TAs writing 95% of the code while we filled in the other 5%. I am always grateful to my alma mater for this since one of the things most lacking in university CS education is often concepts like integration and working on large systems. In this particular class, I was writing C code in pico and using a makefile to handle compiling and linking the code on a remote server. This generally took a while because of network latency, server load, it being 13 years ago, a lot of files to link, etc. The end result was that I would generally write a lot of code, run make, and then get up and stretch my legs or get a drink or something, returning later to see what had happened.

This is what developing in Apex reminds me of. But there’s an interesting characteristic of Apex, which is that you have to write unit tests, they have to pass, and they have to cover something like 70% of your code before you’re allowed to run it on their hardware in production. How awesome is that? Don’t you sometimes wish C# or Java enforced that on your coworkers that steal in like ninjas and break half the things in your code base with their checkins? I was pumped when I got this assignment and set about doing TDD, which I’ve done the whole time. I don’t actually know what the minimum coverage is because I’ve been at 100% the entire time.

A Mixed Blessing?

One of the first things that I thought while spacing out and waiting for compile was how much it reminded me of my undergrad days. The second thing I thought of, ruefully, was how much better served I would have been back then to know about unit tests or TDD. I bet that could have saved me some maddening debugging sessions. But then again, would I have been better off doing TDD then? And, more interestingly, am I better off doing it now?

Anyone who follows this blog will probably think I’ve flipped my lid and done a sudden 180 on the subject, but that’s not really the case. Consider the following properties of Apex development:

  1. Sometimes when you save, the IDE hangs because files have to go back to the server.
  2. Depending on server load, compile may take a fraction of a second or up to a minute.
  3. It is possible for the source you’re looking at to get out of sync with the feedback from the compiling/testing.
  4. Tests in a class often take minutes to run.
  5. Your whole test suite often takes many, many minutes to run.
  6. Presumably due to server load balancing, network latency and other such factors, feedback time appears entirely non-deterministic.
  7. It’s normal for me to need to close Eclipse via task manager and try again later.

Effective TDD has a goal of producing clean code that clearly meets requirements at the unit level, but it demands certain things of the developer and the development environment.  It is not effective when the feedback loop is extremely slow (or worse, inaccurate) since TDD, by its nature, requires near constant execution of unit tests and for those unit tests to be dependable.

Absent that basic requirement, the TDD practitioner is faced with a conundrum.  Do you stick to the practice where you have red (wait 2 minutes), green (what was I doing again, oh yeah, wait 3 minutes), refactor (oops, I was reading reddit and forgot what I was doing)?  Or do you give yourself larger chunks of time without feedback so that you aren’t interrupted and thrown out of the flow as often?

My advice would be to add “none of the above” to the survey and figure out how to make the feedback loop tighter.  Perhaps, in this case, one might investigate a way to compile/test offline, alter the design, or optimize somehow.  Perhaps one might even consider a different technology.  I’d rather switch techs than switch away from TDD, myself.  But in the end, if none of these things proves tenable, you might be stuck taking an approach more like one from 20+ years ago: spend a big chunk of time writing code, run it, write down everything that went wrong, and try again.  I’ll call this RDD — restriction driven development.  I’d say it’s to be avoided at all costs.

I give force.com an A for effort and concept in demanding quality from developers, but I’d definitely have to dock them for the implementation since they create a feedback loop that actively discourages the same.  I’ve got my fingers crossed that as they expand and improve the platform, this will be fixed.