The Code that Cried Wolf: Proving that Comments are Unneeded

I spent a lot of years diligently commenting my code. I wrote what the code did next to the statements themselves and I wrote method headers and sometimes I even kept a journal of edits in that header as if there were no such thing as source control. As I mentioned in an earlier post, I’ve since drifted further and further from this approach to the point of almost never commenting my code if left to my own devices. The reason for this gradual but near-complete reversal is that I’ve come to view comments as a crutch that prevents code from standing on its own from a readability perspective.

For instance, consider the following class (and before we go any further, I would just like to mention that the relationship between the “Prepper” and “Employee” is not actually how I would design this — this is just an example for demonstrating a point):

public class EmployeePrepper
{
    private readonly IEnumerable<Employee> _employees;

    public EmployeePrepper(IEnumerable<Employee> employees)
    {
        _employees = employees;
    }

    public void SetupEmployees()
    {
        foreach (var employee in _employees)
        {
            if (string.IsNullOrEmpty(employee.FirstName))
                throw new Exception();
            if (string.IsNullOrEmpty(employee.LastName))
                throw new Exception();

            if (employee.LastName.Length < 3)
                employee.LoginName = employee.FirstName + employee.LastName;
            else
                employee.LoginName = employee.FirstName[0] + employee.LastName;
        }
    }
}

Imagine that you’ve asked for a code review and someone says “well, it’s a little hard to read and follow — you’ve got some magic numbers in there and it’s not all that readable.” What suggestion is coming next? Or, if none is forthcoming, what do you do to address this? Something like the next code snippet?

/// <summary>
/// This class takes a bunch of employees that the user entered and sets up derived properties on them
/// </summary>
public class EmployeePrepper
{
    /// <summary>
    /// This is a collection of employees that the constructor will pass in
    /// </summary>
    private readonly IEnumerable<Employee> _employees;

    /// <summary>
    /// This constructor takes the class's dependency and initializes it
    /// </summary>
    /// <param name="employees">The employees this class will work on</param>
    public EmployeePrepper(IEnumerable<Employee> employees)
    {
        _employees = employees;
    }

    /// <summary>
    /// Sets up the employees by setting derived properties
    /// </summary>
    public void SetupEmployees()
    {
        //Cycle through the employees
        foreach (var employee in _employees)
        {
            //Make sure a valid first name and last name are entered
            if (string.IsNullOrEmpty(employee.FirstName))
                throw new Exception();
            if (string.IsNullOrEmpty(employee.LastName))
                throw new Exception();

            //If the employee has a very short last name, then use first concatenated to last
            if (employee.LastName.Length < 3)
                employee.LoginName = employee.FirstName + employee.LastName;
            else //Otherwise, use first initial concatenated to last name
                employee.LoginName = employee.FirstName[0] + employee.LastName;
        }
    }
}

Ah, much better, right? From the outside looking in, with IntelliSense, I learn that (1) the class takes a collection of employees and sets their derived properties, (2) the constructor takes a dependency and initializes itself, and (3) the setup employee method sets up the employees with their derived properties. Of course, I could learn that by (1) inspecting the constructor, (2) common sense, and (3) see item (1).

Okay, so maybe I don’t learn all that much of interest from IntelliSense, but there’s a lot of clarity in the method itself. I learn that (1) I’m going to cycle through records, (2) I’m going to check properties for validity and (3) a descriptive business rule for setting up login names. Of course, I could learn (1) by knowing what foreach() is and (2) by basic inspection. (3) is interesting in that it’s the first comment that isn’t just pointless filler. It actually explains something that I need to know.

So 1 out of 6 comments is helpful, but isn’t that still a net gain? Well, it is if we assume that there is no downside to having commented code. But there is a downside. Specifically, the addition of comments to the file means extra artifact that must be maintained or else the comments drift from noise to lies. For example, what if we add a constructor parameter and don’t pay attention to the doc comments (which pretty much no one does when editing classes)? Suddenly that comment about taking a single employee dependency is nonsensical. So you have 5 liabilities and 1 asset in terms of comments here, which seems like a poor tradeoff (although I’m not going to attempt to place an actual measurable value on the concept of misleading comments versus helpful, explanatory comments — even if I could, it wouldn’t matter much since it’s pretty unlikely I’ll read your comments anyway :) ).

Okay, I’ve sold you on my premise, but what do you say to the architects/leads/coworkers/project managers/etc who aren’t buying what I’m selling? How do you convince them that this activity is somewhere between pointless and detrimental? Well, prove to them that the comments are pointless. Take away the crutch and make your class names, method names, variable names, and logic flow so clean and so self documenting that the comments look preposterous. Take the original code and turn it into this:

public class EmployeeDerivedPropertySetter
{
    private const int LastNameMinimumLength = 3;

    public class FirstNameEmptyException : Exception { }

    public class LastNameEmptyException : Exception { }

    private readonly IEnumerable<Employee> _employeesPassedInFromConstructor;

    public EmployeeDerivedPropertySetter(IEnumerable<Employee> employees)
    {
        _employeesPassedInFromConstructor = employees;
    }

    public void SetupDerivedProperties()
    {
        foreach (var employee in _employeesPassedInFromConstructor)
            SetupDerivedProperties(employee);
    }

    private static void SetupDerivedProperties(Employee employee)
    {
        ValidateFirstAndLastName(employee);
        SetLoginName(employee);
    }

    private static void ValidateFirstAndLastName(Employee employee)
    {
        if (string.IsNullOrEmpty(employee.FirstName))
            throw new FirstNameEmptyException();
        if (string.IsNullOrEmpty(employee.LastName))
            throw new LastNameEmptyException();
    }

    private static void SetLoginName(Employee employee)
    {
        if (IsEmployeeLastNameVeryShort(employee))
            employee.LoginName = ConcatenateFirstAndLastNames(employee);
        else
            employee.LoginName = ConcatenateFirstInitialAndLast(employee);
    }

    private static bool IsEmployeeLastNameVeryShort(Employee employee)
    {
        return employee.LastName.Length < LastNameMinimumLength;
    }

    private static string ConcatenateFirstAndLastNames(Employee employee)
    {
        return employee.FirstName + employee.LastName;
    }

    private static string ConcatenateFirstInitialAndLast(Employee employee)
    {
        return employee.FirstName.ElementAt(0) + employee.LastName;
    }
}

Now, small factored methods might not be your thing, particularly if you love debugger (or just love big methods and classes for some reason). But I think you’d be hard pressed to argue that this isn’t quite readable and self explanatory. There’s no method here that you’d look at, then squint at, then rub your head and say, “what the…?” You have things like “If the last name is very short, set login name to first concatenated to last.” This class reads like an instruction manual or… kinda like the version with a bunch of comments. Except unlike that version, your code is actually the comments, so it’s impossible for it to become obsolete and misleading.

But I’m not trying to tell you how to code — I’m trying to tell you how to sell this. I’m trying to tell you how to prove that comments are unnecessary. “How,” you ask? Well, put the comments back in and see how completely obtuse they now look:

public class EmployeeDerivedPropertySetter
{
    /// <summary>
    /// The minimum length of the last name
    /// </summary>
    private const int LastNameMinimumLength = 3;

    /// <summary>
    /// Thrown when the first name is empty
    /// </summary>
    public class FirstNameEmptyException : Exception { }

    /// <summary>
    /// Thrown when the last name is empty
    /// </summary>
    public class LastNameEmptyException : Exception { }
    /// <summary>
    /// This is a collection of employees that the constructor will pass in
    /// </summary>
    private readonly IEnumerable<Employee> _employeesPassedInFromConstructor;

    /// <summary>
    /// This constructor takes the class's dependency and initializes it
    /// </summary>
    /// <param name="employees">The employees this class will work on</param>
    public EmployeeDerivedPropertySetter(IEnumerable<Employee> employees)
    {
        //Set the employee variable to the passed in value
        _employeesPassedInFromConstructor = employees;
    }

    /// <summary>
    /// Sets up the employees by setting derived properties
    /// </summary>
    public void SetupDerivedProperties()
    {
        //Cycle through the employees
        foreach (var employee in _employeesPassedInFromConstructor)
            SetupDerivedProperties(employee);
    }

    /// <summary>
    /// Setup the derived properties
    /// </summary>
    /// <param name="employee">Employee to set them up for</param>
    private static void SetupDerivedProperties(Employee employee)
    {
        //Make sure a valid first name and last name are entered
        ValidateFirstAndLastName(employee);
        //Set the employee's login name
        SetLoginName(employee);
    }

    /// <summary>
    /// Validate the first and last name of the employee
    /// </summary>
    /// <param name="employee">The employee to validate</param>
    private static void ValidateFirstAndLastName(Employee employee)
    {
        //If first name is null or empty throw an exception
        if (string.IsNullOrEmpty(employee.FirstName))
            throw new FirstNameEmptyException();
        //If last name is null or empty, throw an exception
        if (string.IsNullOrEmpty(employee.LastName))
            throw new LastNameEmptyException();
    }

    /// <summary>
    /// Set the employee's login name
    /// </summary>
    /// <param name="employee">Employee whose login name to set</param>
    private static void SetLoginName(Employee employee)
    {
        //If the employee has a very short last name, then user first concatenated to last
        if (IsEmployeeLastNameVeryShort(employee))
            employee.LoginName = ConcatenateFirstAndLastNames(employee);
        else //Otherwise, use first initial concatenated to last name
            employee.LoginName = ConcatenateFirstInitialAndLast(employee);
    }

    /// <summary>
    /// Returns true if the employee last name is very short
    /// </summary>
    /// <param name="employee">Employee to check for a short last name</param>
    /// <returns>True if the last name is very short, false if not</returns>
    private static bool IsEmployeeLastNameVeryShort(Employee employee)
    {
        //Returns whether or not the last name's length is shorter than the minimum length of the last name
        return employee.LastName.Length < LastNameMinimumLength;
    }

    /// <summary>
    /// Concatenate an employee's first and last name
    /// </summary>
    /// <param name="employee">Employee whose first and last name to calculate</param>
    /// <returns>The concatenated first and last name</returns>
    private static string ConcatenateFirstAndLastNames(Employee employee)
    {
        //Concatenate the first and last names of the employee
        return employee.FirstName + employee.LastName;
    }

    /// <summary>
    /// Concatenates the first initial and last name of the employee
    /// </summary>
    /// <param name="employee">Employee whose first initial and last name to concatenate</param>
    /// <returns>The concatenated first initial and last name</returns>
    private static string ConcatenateFirstInitialAndLast(Employee employee)
    {
        //Concatenate the first initial and last name of the employee
        return employee.FirstName.ElementAt(0) + employee.LastName;
    }
}

I think you’d be hard pressed to find someone who thought that this level of commenting on this self-describing code made any sense. That’s the beauty of this demonstration — more even than just the self documenting code, adding comments next to it demonstrate that there’s really nothing left to say. And hey, if this gambit backfires and you find someone who thinks this is a good idea, you’ve learned a valuable piece of information about someone to bear in mind when considering which projects to work on and where to work in the future. In all seriousness, though, I think this will do a fairly good job of demonstrating that you can express the vast majority of what you would express in comments with the code itself.

There is another very practical benefit to proving this and the resulting Spartan approach to comments beyond the fact that the comments and code tend not to get out of sync; when comments are a rarity, you’re more likely to read them. Think of heavily commented code as “The Code that Cried Wolf.” If you see a lot of code that looks like the code in this snippet immediately above, don’t you start to just kind of tune out the comments? They appear in some other color than the main code, and rather than information, they become comforting white noise, separating your methods and fields from one another the way a little bit of margin/padding separates a bunch of text boxes on the form from one another. You don’t study the buffering space — you just take it for granted. This is how it works with over-commented code bases. If you find yourself in code bases like this, do an exercise and ask yourself when the last time you paid attention to a comment was. Yesterday? Last week? Last month? Now ask yourself how many times you’ve looked at the code since then? A hundred? A thousand? Ten thousand? That’s a pretty low percentage of time that you pay attention to comments.

But imagine a code base with almost no comments. You’re flipping between classes and looking at methods, and there’s no extras — just code. Now, you open some class and there’s a big, fat, 10 line comment. Betcha notice it immediately and, what’s more, betcha read it. And, I’m also willing to wager it’s worth reading. If a code base has an absence of noise comments, the developer that wrote this one probably has something important to say: why he chose to do it this way even though it seems obtuse or where she found a template/inspiration for this code via google and why she chose to use it.

Method, parameter, class, and variable names can tell you a lot about what the code elements are. Source control checkin comments can tell you a lot about why various changes were made. Unit tests/BDD Tests/Integration tests can tell you a lot about business rules and requirements. None of these things require adding redundancy to the code itself. Comments in your code should be the exception rather than the rule. And you can prove it.

  • Pingback: How to Make Your Code More Readable | DaedTech

  • Vince

    Comments avoid the need to inspect code and save time. Comments also let you browse the API via intellisense.

    Take a look at the. Net framework as an example. I betcha spent lots of time reading the intelisense method and parameter comments no? That’s quality software.

    If you think comments are a waste of time then you need to invest more time in to your typing speed. Dvelopers who can type quickly don’t see comments as a chore because for them it’s a fraction of a second to type it up. To me It sounds like a cop-out for being lazy.

  • http://www.daedtech.com/blog Erik Dietrich

    If the reason a developer adds XDoc comments is because he or she can type really fast so they don’t take much time, I have a hard time imaging that those are high quality comments that make the difference between understanding and lack thereof. And if you’re going to go that route, here’s a tool you’ll probably really like: http://submain.com/products/ghostdoc.aspx As you can see, that’s instant effortless XDoc comments for your entire API, including a help file.

    My motivation for not commenting is thus neither laziness nor lack of typing speed since XDoc comments require no typing or time. It really, honestly is that I have come to view comments (especially the kind you crank out in seconds because you type fast) as a failure to be clear in your API and a crutch. I explain my take on this more here: http://www.daedtech.com/comments-heres-my-code-and-im-sorry

    Long and short of it is this — I read .NET framework comments only when the API doesn’t make itself clear. I’ve never read the comments for Console.WriteLine() or other good abstractions. Comments are where you look when the API fails to be clear on its own.

  • Vince

    String.Trim() as obvious as the method name is, is still commented. Why do you think that is?

  • Vince

    The last comment I’ll make on this is that you’ve been working with the wrong developers. Some people write mind numbing comments which don’t serve any purpose and others who dont update the comments when they change something. They are amateurs in my opinion.

    Take a look at any professional framework out there, hibernate, ninject, log4j, log4net, etc… they are all commented and consistent and provide useful info when browsing the API.

  • http://www.daedtech.com/blog Erik Dietrich

    My guess is that there was a policy in place mandating XDoc comments ;)

    In all seriousness, the case where this gets murky is the on that you allude to where you’re writing actual frameworks where people reading your comments are your customers. If customers want documentation and are willing to pay for it (or use it, in the case of open source) then it’s less an issue of good coding practice and more an issue of delivering a requested feature/service to a paying customer. And I can’t possibly argue with doing that — if you’re paying me to write you a library and you want that library documented, you’ll get XDoc comments and a help file with coding samples in it from me, if that’s what you want and want to pay for.

    I’m talking about the far more common case (I suspect) in the development world of writing internal applications, public facing websites, line of business apps, etc. These are scenarios where you have no public facing API, and the comments become a manner of communicating with maintenance programmers or your teammates. And that’s where I’ve seen that, time and again, they become a crutch and eventually they become inaccurate and misleading. And when companies like this double down on a mandatory comments policy, the comments seem to rot even faster as the developers resent them in the first place.

  • Vince

    I see. Point taken :)

    By the way I’m having a great time going through all your blog posts, it’s a great resource challenging my practices.

  • http://www.daedtech.com/blog Erik Dietrich

    Glad if you’re enjoying them — and I appreciate the comments. Getting other people’s take on what I have to say provides the same service you’re talking about — challenging my own beliefs.