DaedTech

Stories about Software

By

Your Code Is Data

This is a post that I originally wrote for the NDepend blog. If you haven’t already, go check it out! We’re building out some good content over there around static analysis, with lots more to follow.

A lot of programmers have some idea of what static analysis is, as least superficially.  If I mention the term, what pops into your head?  Automatic enforcement of coding standards?  StyleCop or FXCop?  Cyclomatic complexity and Visual Studio’s “maintainability index?”  Maybe you’re deeply familiar with all of the subtleties and nuances of the technique.

Whatever your level of familiarity, I’d like to throw what might be a bit of a curve ball at you.  Static analysis is the idea of analyzing source code and byte code for various properties and reporting on those properties, but it’s also, philosophically, the idea of treating code as data.  This is deeply weird to us as application developers, since we’re very much used to thinking of source code as instructions, procedures, and algorithms.  But it’s also deeply powerful.

ComputerInACage

When you think of source code this way, typical static analysis use cases make sense.  FXCop asks questions along the lines of “How many private fields not prepended with underscores,” or, perhaps, “SELECT COUNT(class_field) FROM classes WHERE class_field NOT LIKE ‘_*’”  More design-focused source code analysis tools ask questions like “What is the cyclomatic complexity of my methods,” or, perhaps, “SELECT cyclomatic_complexity FROM Methods.”

But if code is data, and static analysis tools are sets of queries against that data, doesn’t it seem strange that we can’t put together and execute ad-hoc queries the way that you would with a relational (or other) database?  I mean, imagine if you built out some persistence store using SQL Server, and the only queries you were allowed were SELECT * from the various tables and a handful of others.  Anything beyond that, and you would have to inspect the data manually and make notes by hand.  That would seem arbitrarily and even criminally restrictive.  So why doesn’t it seem that way with our source code?  Why are we content not having the ability to execute arbitrary queries?

I say “we” but the reality is that I can’t include myself in that question, since I have that ability and I would consider having it taken away from me to be crippling.  My background is that of a software architect, but beyond that, I’m also a software craftsmanship coach, teacher, and frequent analyzer of codebases in a professional capacity, auditing a wide variety of them for various properties, characteristics, and trends.  If I couldn’t perform ad-hoc, situation-dependent queries against the source code, I would be far less effective in these roles.

My tools of choice for doing this are NDepend and its cousin JArchitect (for Java code bases).  Out of the box, they’re standard static analysis and architecture tools, but they also offer this incredibly powerful concept called CQLinq that is, for all intents and purposes, SQL for the ‘schema’ of source code.  In reality, CQLinq is actually a Linq provider for writing declarative code queries, but anyone that knows SQL (or functional programming or lamba expressions) will feel quite at home creating queries.

Let’s say, for instance, that you’re the architect for a C# code base and you notice a disturbing trend wherein the developers have taken to communicating between classes using global variables.  What course of action would you take to nip this in the bud?  I bet it would be something annoying for both you and them.  Perhaps you’d set a policy for a while where you audited literally every commit and read through to make sure they weren’t doing it.  Maybe you’d be too pressed for time and you’d appoint designated globals cops.  Or, perhaps you’d just send out a lot of angry, threatening emails?

Do you know what I would do?  I’d just write a single CQLinq query and add it to a step in my automated team build that executed static analysis code rules against all commits.  If the count of global variable invocations in the code base was greater after the commit than before it, the build would fail.  No need for anger, emails or time wasted checking over people’s shoulders, metaphorically or literally.

Want to see how easy a query like this would be to write?  Why don’t I show you…

That’s it. I write that query, set the build to run NDepend’s static analysis, and fail if there are warnings. No more sending out emails, pleading, nagging, threatening, wheedling, coaxing, or bottleneck code reviewing. And, most important of all, no more doing all of that and having problems anyway. One simple little piece of code, and you can totally automate preventing badness. And best of all, the developers get quick feedback and learn on their own.

As I’ve said, code is data at its core.  This is especially true if you’re an architect, responsible for the long term health of the code base.  You need to be able to assess characteristics and properties of that code, make decisions about it, and set precedent.  To accomplish this, you need powerful tooling for querying your code, and NDepend, with its CQLinq, provides exactly that.

By

Introduction to Static Analysis (A Teaser for NDepend)

Rather than the traditional lecture approach of providing an official definition and then discussing the subject in more detail, I’m going to show you what static analysis is and then define it. Take a look at the following code and think for a second about what you see. What’s going to happen when we run this code?

Well, let’s take a look:

Exception

I bet you saw this coming. In a program that does nothing but set x to 1, and then throw an exception if x is 1, it isn’t hard to figure out that the result of running it will be an unhandled exception. What you just did there was static analysis.

Static analysis comes in many shapes and sizes. When you simply inspect your code and reason about what it will do, you are performing static analysis. When you submit your code to a peer to have her review, she does the same thing. Like you and your peer, compilers perform static analysis, though automated analysis instead of manual. They check the code for syntax errors or linking errors that would guarantee failures, and they will also provide warnings about potential problems such as unreachable code or assignment instead of evaluation. Products also exist that will check your source code for certain characteristics and stylistic guideline conformance rather than worrying about what happens at runtime and, in managed languages, products exist that will analyze your compiled IL or byte code and check for certain characteristics. The common thread here is that all of these examples of static analysis involve analyzing your code without actually executing it.

Analysis vs Reactionary Inspection

People’s interactions with their code tend to gravitate away from analysis. Whether it’s unit tests and TDD, integration tests, or simply running the application to see what happens, programmers tend to run experiments with their code and then to see what happens. This is known as a feedback loop, and programmers use the feedback to guide what they’re going to do next. While obviously some thought is given to what impact changes to the code will have, the natural tendency is to adopt an “I’ll believe it when I see it” mentality.

We tend to ask “what happened?” and we tend to orient our code in such ways as to give ourselves answers to that question. In this code sample, if we want to know what happened, we execute the program and see what prints. This is the opposite of static analysis in that nobody is trying to reason about what will happen ahead of time, but rather the goal is to do it, see what the outcome is, and then react as needed to continue.

Reactionary inspection comes in a variety of forms, such as debugging, examining log files, observing the behavior of a GUI, etc.

Static vs Dynamic Analysis

The conclusions and decisions that arise from the reactionary inspection question of “what happened” are known as dynamic analysis. Dynamic analysis is, more formally, inspection of the behavior of a running system. This means that it is an analysis of characteristics of the program that include things like how much memory it consumes, how reliably it runs, how much data it pulls from the database, and generally whether it correctly satisfies the requirements are not.

Assuming that static analysis of a system is taking place at all, dynamic analysis takes over where static analysis is not sufficient. This includes situations where unpredictable externalities such as user inputs or hardware interrupts are involved. It also involves situations where static analysis is simply not computationally feasible, such as in any system of real complexity.

As a result, the interplay between static analysis and dynamic analysis tends to be that static analysis is a first line of defense designed to catch obvious problems early. Besides that, it also functions as a canary in the mine to detect so-called “code smells.” A code smell is a piece of code that is often, but not necessarily, indicative of a problem. Static analysis can thus be used as an early detection system for obvious or likely problems, and dynamic analysis has to be sufficient for the rest.

Canary

Source Code Parsing vs. Compile-Time Analysis

As I alluded to in the “static analysis in broad terms” section, not all static analysis is created equal. There are types of static analysis that rely on simple inspection of the source code. These include the manual source code analysis techniques such as reasoning about your own code or doing code review activities. They also include tools such as StyleCop that simply parse the source code and make simple assertions about it to provide feedback. For instance, it might read a code file containing the word “class” and see that the next word after it is not capitalized and return a warning that class names should be capitalized.

This stands in contrast to what I’ll call compile time analysis. The difference is that this form of analysis requires an encyclopedic understanding of how the compiler behaves or else the ability to analyze the compiled product. This set of options obviously includes the compiler which will fail on show stopper problems and generate helpful warning information as well. It also includes enhanced rules engines that understand the rules of the compiler and can use this to infer a larger set of warnings and potential problems than those that come out of the box with the compiler. Beyond that is a set of IDE plugins that perform asynchronous compilation and offer realtime feedback about possible problems. Examples of this in the .NET world include Resharper and CodeRush. And finally, there are analysis tools that look at the compiled assembly outputs and give feedback based on them. NDepend is an example of this, though it includes other approaches mentioned here as well.

The important compare-contrast point to understand here is that source analysis is easier to understand conceptually and generally faster while compile-time analysis is more resource intensive and generally more thorough.

The Types of Static Analysis

So far I’ve compared static analysis to dynamic and ex post facto analysis and I’ve compared mechanisms for how static analysis is conducted. Let’s now take a look at some different kinds of static analysis from the perspective of their goals. This list is not necessarily exhaustive, but rather a general categorization of the different types of static analysis with which I’ve worked.

  • Style checking is examining source code to see if it conforms to cosmetic code standards
  • Best Practices checking is examining the code to see if it conforms to commonly accepted coding practices. This might include things like not using goto statements or not having empty catch blocks
  • Contract programming is the enforcement of preconditions, invariants and postconditions
  • Issue/Bug alert is static analysis designed to detect likely mistakes or error conditions
  • Verification is an attempt to prove that the program is behaving according to specifications
  • Fact finding is analysis that lets you retrieve statistical information about your application’s code and architecture

There are many tools out there that provide functionality for one or more of these, but NDepend provides perhaps the most comprehensive support across the board for different static analysis goals of any .NET tool out there. You will thus get to see in-depth examples of many of these, particularly the fact finding and issue alerting types of analysis.

A Quick Overview of Some Example Metrics

Up to this point, I’ve talked a lot in generalities, so let’s look at some actual examples of things that you might learn from static analysis about your code base. The actual questions you could ask and answer are pretty much endless, so this is intended just to give you a sample of what you can know.

  • Is every class and method in the code base in Pascal case?
  • Are there any potential null dereferences of parameters in the code?
  • Are there instances of copy and paste programming?
  • What is the average number of lines of code per class? Per method?
  • How loosely or tightly coupled is the architecture?
  • What classes would be the most risky to change?

Believe it or not, it is quite possible to answer all of these questions without compiling or manually inspecting your code in time consuming fashion. There are plenty of tools out there that can offer answers to some questions like this that you might have, but in my experience, none can answer as many, in as much depth, and with as much customizability as NDepend.

Why Do This?

So all that being said, is this worth doing? Why should you watch the subsequent modules if you aren’t convinced that this is something that’s even worth learning. It’s a valid concern, but I assure you that it is most definitely worth doing.

  • The later you find an issue, typically, the more expensive it is to fix. Catching a mistake seconds after you make it, as with a typo, is as cheap as it gets. Having QA catch it a few weeks after the fact means that you have to remember what was going on, find it in the debugger, and then figure out how to fix it, which means more time and cost. Fixing an issue that’s blowing up in production costs time and effort, but also business and reputation. So anything that exposes issues earlier saves the business money, and static analysis is all about helping you find issues, or at least potential issues, as early as possible.
  • But beyond just allowing you to catch mistakes earlier, static analysis actually reduces the number of mistakes that happen in the first place. The reason for this is that static analysis helps developers discover mistakes right after making them, which reinforces cause and effect a lot better. The end result? They learn faster not to make the mistakes they’d been making, causing fewer errors overall.
  • Another important benefit is that maintenance of code becomes easier. By alerting you to the presence of “code smells,” static analysis tools are giving you feedback as to which areas of your code are difficult to maintain, brittle, and generally problematic. With this information laid bare and easily accessible, developers naturally learn to avoid writing code that is hard to maintain.
  • Exploratory static analysis turns out to be a pretty good way to learn about a code base as well. Instead of the typical approach of opening the code base in an IDE and poking around or stepping through it, developers can approach the code base instead by saying “show me the most heavily used classes and which classes use them.” Some tools also provide visual representations of the flow of an application and its dependencies, further reducing the learning curve developers face with a large code base.
  • And a final and important benefit is that static analysis improves developers’ skills and makes them better at their craft. Developers don’t just learn to avoid mistakes, as I mentioned in the mistake reduction bullet point, but they also learn which coding practices are generally considered good ideas by the industry at large and which practices are not. The compiler will tell you that things are illegal and warn you that others are probably errors, but static analysis tools often answer the question “is this a good idea.” Over time, developers start to understand subtle nuances of software engineering.

There are a couple of criticisms of static analysis. The main ones are that the tools can be expensive and that they can create a lot of “noise” or “false positives.” The former is a problem for obvious reasons and the latter can have the effect of counteracting the time savings by forcing developers to weed through non-issues in order to find real ones. However, good static analysis tools mitigate the false positives in various ways, an important one being to allow the shutting off of warnings and the customization of what information you receive. NDepend turns out to mitigate both: it is highly customizable and not very expensive.

Reference

The contents of this post were mostly taken from a Pluralsight course I did on static analysis with NDepend. Here is a link to that course. If you’re not a Pluralsight subscriber but are interested in taking a look at the course or at the library in general, send me an email to erik at daedtech and I can give you a 7 day trial subscription.

By

Static Analysis, NDepend, and a Pluralsight Course

I absolutely love statistics. Not statistics as in the school subject — I don’t particularly love that branch of mathematics with its binomial distributions and standard deviations and whatnot. I once remarked to a friend in college that statistics-the-subject seemed like the ‘science’ of taking a guess and then rigorously figuring out how wrong you were. Flippant as that assessment may have been, statistics-the subject has hardly the elegant smoothness of calculus or the relentlessly logical pursuit of discrete math. Not that it isn’t interesting at all — to a math geek like me, it’s all good — but it just isn’t really tops on my list.

But what is fascinating to me is tabulating outcomes and gamification. I love watching various sporting events on television and keep track of odd things. When watching a basketball game, I always the amount of a “run” the teams are on before the announcers think to say something like “Chicago is on a 15-4 run over the last 6:33 this quarter.” I could have told you that. In football, if the quarterback is approaching a fist half passing record, I’m calculating the tally mentally after every play and keeping track. Heck, I regularly watch poker on television not because of the scintillating personalities at the tables but because I just like seeing what cards come out, what hands win, and whether the game is statistically normal or aberrant. This extends all the way back to my childhood when things like my standardized test scores and my class rank were dramatically altered by me learning that someone was keeping score and ranking them.

I’m not sure what it is that drives this personality quirk of mine, but you can imagine what happened some years back when I discovered static analysis and then NDepend. I was hooked. Before I understood what the Henderson Sellers Lack of Cohesion in Methods score was, I knew that I wanted mine to be lower than other people’s. For those of you not familiar, static analysis is a way to examine your code without actually executing it and seeing what happens retroactively. Static analysis, (over) simplified, is an activity that examines your source code and makes educated guesses about how it will behave at runtime and beyond (i.e. maintenance). NDepend is a tool that performs static analysis at a level and with an amount of detail that makes it, in my opinion, the best game in town.

After overcoming an initial pointless gamification impulse, I learned to harness it instead. I read up on every metric under the sun and started to understand what high and low scores correlated with in code bases. In other words, I studied properties of good code bases and bad code bases, as described by these metrics, and started to rely on my own extreme gamification tendencies in order to drive my work toward better code. It wasn’t just a matter of getting in the habit of limiting my methods to the absolute minimum in size or really thinking through the coupling in my code base. I started to learn when optimizing to improve one metric led to a decline in another — I learned lessons about design tradeoffs.

It was this behavior of seeking to prove myself via objective metrics that got me started, but it was the ability to ask and answer lots of questions about my code base that kept me coming back. I think that this is the real difference maker when it comes NDepend, at least for me. I can ask questions, and then I can visualize, chart and track the answer in just about every conceivable way. I have a “Moneyball” approach to code, and NDepend is like my version of the Jonah Hill character in that movie.

Because of my high opinion of this tool and its importance in the lives of developers, I made a Pluralsight course about it. If you have a subscription and have any interest in this subject at all, I invite you to check it out. If you’re not familiar with the subject, I’d say that if your interest in programming breaks toward architecture — if you’re an architect or an aspiring architect — you should also check it out. Static analysis will give you a huge leg up on your competition for architect roles, and my course will provide an introduction for getting started. If you don’t have a Pluralsight subscription, I highly recommend trying one out and/or getting one. This isn’t just a plug for me to sell a course I’ve made, either. I was a Pluralsight subscriber and fan before I ever became an author.

If you get a chance to check it out, I hope you enjoy.

By

Static Analysis: Why You Should Care

I don’t want to go into a ton of detail on this just yet, but in broad terms, my next Pluralsight course covers the subject of static analysis. I get the sense that most people’s reaction to static analysis lies somewhere between “what’s that?” and “oh yeah, we use FX Cop sometimes.” To be sure, it’s not everyone’s reaction, but I’d say the majority falls into this category. And frankly, I think that’s a shame.

To bring things into perspective a bit, what would you do if you wanted to know how many public static methods were in a given namespace or project? I’m guessing that you’d probably hit “ctrl-shift-f” or whatever “find all in files” happens to be in your IDE, and then you’d start counting up the results, excluding spurious matches for public static classes and properties. Maybe you’d find some way to dump the results to Excel and filter or something a little more clever, but it’s still kludgy.

And what if you wanted to answer a question like “how many 20+ line methods are there in my code base?” My guess is that you basically wouldn’t do that at all. Perhaps you have an IDE plugin that offers some static analysis and LOC is a common one, but absent that, you’d probably just take a guess. And what if you wanted to know how many such methods in your code base also took a dependency on three or more framework classes? You’d probably just live with not knowing.

And living with not knowing leads to talking about code in vague generalities where loudness tends to make right. You might describe the whole reporting module as “tricky” or “crappy” or “buggy,” but what do those things really mean, aside from conveying that you more or less don’t trust that code? But what if you could run some qualitative and quantitative analysis on it and say things like “more than 80% of the methods in that module depend on that flaky third party library” or “there are several classes in there that are used by at least 40 other classes, making them extremely risky to change.” Now you have tangible, quantifiable problems for which you can find measurable solutions that can be validated. And that ability is solid gold in a profession often dominated by so-called religious arguments.

CodeReview

Static analysis of the variety that gives you detailed information about your code and warns you about potential problems combines two incredibly useful software development techniques: code review and fast feedback. Code reviews involve peer inspection of code, but it is conceptually possible to get a lot of the benefit of this activity by having the reviewers codify and store common rulesets that they would apply when doing actual reviews: no methods longer than X lines, no more code added to class Y, etc. Done this way, fast feedback becomes possible because the reviewee doesn’t actually need to find time with reviewers but can instead keep running the analysis on the code as he writes it until he gets it right.

There are plenty more benefits that I could list here. I even could talk about how static code analysis is just flat out fascinating (though that’s something of an editorial opinion). But, for my money, it makes the discussion of code quality scientific, and it dramatically speeds up the review/quality feedback loop. I think pretty much any software group could stand to have a bit of that magic dust sprinkled on it.

By

In Search of the Perfect Code Review

Code Reviews

One thing that I tend to contemplate from time to time and have yet to post about is the idea of code reviews and what constitutes a good one. I’ve worked on projects where there was no code review process, a half-hearted review process, an annoying or counter-productive code review process, and a well organized and efficient code review process. It’s really run the gamut. So, I’d like to put pen to paper (finger to keyboard) and flesh out my ideas for what an optimal code review would entail. Before I can do that, however, I think I need to define some terms up front, and then identify some things about code reviews that I view as pitfalls, counter-productive activities or non-useful activities.

What is a Code Review?

According to the wikipedia entry for a code review, there are three basic types: the formal review, pair programming, and the lightweight review. I’ll add a fourth to this for the sake of pure definition, and call that the “automated review”. The automated review would be use of one or more static analysis tools like FX Cop or Style Cop (the wiki article includes someone using tools like this on the code of another developer, but I’m talking strictly automated steps like “you fail the build if you generate Style Cop warnings”). Pair programming is self explanatory for anyone familiar with the concept. Lightweight reviews are more relaxed and informal in the sense that they tend to be asynchronous and probably more of a courtesy. An example of this is where you email someone a source code file and ask for a sanity check.

The formal review is the one with which people are probably most familiar if code review is officially part of the SDLC on the project. This is a review where the developer sits in a room with one or more other people and presents written code. The reviewers go through in detail, looking for potential bugs, mistakes, failures to comply with convention, opportunities for improvement, design issues, etc. In a nutshell, copy-editing of code while the author is present.

What I’ll be discussing here mainly pertains to the formal review.

What I Don’t Like

Here are some things that I think ought to be avoided in a code review process.

1. Failing to have code reviews

This is important in the same way that QC is important. We’re all human and we could all use fresh eyes and sanity checks.

2. Procedural Code Review: The Nitpick

“You made that camel case instead of Pascal case.” “You could be dereferencing null there.” In general, anything that a static analysis tool could tell someone a lot faster and more accurately than a room full of people inspecting code manually. Suresh addresses this idea in a blog post:

I happen to see only “superficial” reviews happening. By superficial, I mean the types where you get review comments like, “You know the documentation for that method doesn’t have the version number”, or “this variable is unused”, etc.

“Superficial” is an excellent description as these things are generally trivial to identify and correct. It is not an effective use of the time of the developer or reviewers anymore than turning off spell check and doing it manually is an effective use of authors’ and editors’ time. There are tools for this — use them!

3. Paranoid Code Review

This is what happens when reviewer(s) go into the review with the notion that the only thing standing between a functioning application and disaster is their keen eye for the mistakes of others. This leads to a drawn out activity in which reviewers try to identify any possible, conceivable mistake that might exist in the code. It’s often easily identified by reviewers scrunching their noses, concentrating heavily, pointing, or tracing code through control flow statements with their fingers on the big screen while the developer twiddles his thumbs.

Again, there’s a tool for this. It’s called the unit test. It’s not flawless and it assumes a decent amount of coverage and competence from the unit tests, but if executed properly, the unit tests will express and prove corner case behavior far better than people on too little sleep staring at a screen and trying to step through the call stack mentally. This mental execution is probably the least reliable possible way of examining code.

4. Pure Gatekeeper Code Review

This is less of an individual property of code review, but more a property of the review process. It’s where you have a person or committee in the department that acts as the Caesar of code, giving thumbs up or thumbs down to anything that anyone submits. Don’t get me wrong — the aim of this makes sense. You want somebody that’s doing a sanity check on the code and someone who more or less has his or her finger on the pulse of everything that’s happening.

The issue that occurs here is a subtle one that results from having the same person or people reviewing all code. Specifically, those people become the gatekeepers and the submitters become less concerned about writing good code and innovating and more principally concerned with figuring out how to please the reviewer(s). Now, if the reviewers are sharp and savvy, this may not be a big deal, though discouraging new ideas and personal growth is always going to have some negative impact. However, if the reviewers are not as sophisticated, this is downright problematic.

This is relatively easily addressed by rotating who performs code reviews or else distinguishing between “suggestion” and “order”. I’ve participated in review processes where both of these mitigating actions were applied, and it’s a help.

5. Tyrant Gatekeeper

In the previous example, I mentioned a hypothetical gatekeeper reviewer or committee with ultimate authority, and this is a subset of that. In this case, the reviewer(s) have ultimate yes/no authority and are very heavy (and possibly even combative or derisive) with the “no” option. Where the previous example might stifle innovation or developer growth, this creates bottlenecks. Not only is it hard to get things approved, but developers will naturally start asking the reviewer(s) what to do at every turn rather than attempting to think for themselves.

In essence, this creates a state of learned helplessness. Developers are more concerned with avoiding negative feedback at the code reviews than learning, doing a good job, or becoming able to make good decisions based on their own experience. As a result, the developers don’t really make any decisions and ask the reviewer(s) what to do at every step, waiting until they have time, if necessary. The review(s) become a bottleneck in the process.

I have not personally witnessed or been subject to this form of code reviews, but I have heard of such a thing and it isn’t difficult to imagine this happening.

6. Discussion Drift

This occurs during a code review when a discussion of the specific implementation gets sidetracked by a more general discussion of the way things ought to be. Perhaps the code is instantiating an object in the constructor, and a reviewer recommends using dependency injection instead. From here, the participants in the review being to discuss how nice it would be if the architecture relied on a IOC framework instead of whatever it is at the moment.

That’s both a valid and an interesting discussion, but it has nothing to do with some developer checking in implementation code within the framework of the existing architecture. Discussions can have merit and still not be germane to the task at hand.

7. Religious Wars

This occurs during a code review in the following context:

Reviewer 1: You didn’t put curly brackets around that one liner following your if statement. Please add them.
Reviewer 2: No, don’t do that. I hate that.
Reviewer 1: No, it’s better. That way if someone changes the code and adds another statement…. etc.
Reviewer 2: Studies have shown….
Review-ee: Uh, guys….

And so on and so forth. Code reviews can easily devolve into this sort of thing over purely or nearly-purely subjective matters. People get very entrenched about their own subjective preferences and feel the need to defend them. We see this from political affiliation to rooting for sports teams. Subjective matters of preference in code are no different. Neither side is likely to convince the other during the scope of the review, but what is quite likely, and probably certain, is that time will be wasted.

If matters like that are part of the coding standards policy on the project, than it’s an open and shut case. If they’re not, they’re better left alone.

So, How Does a Good Code Review Go?

Having explained what I don’t like in a code review, I’ve provided some context for what I do find helpful. I’m going to outline a procedure that is simply an idea. This idea is subject to suggestions for improvement by others, and ongoing refinement from me. Also, this idea is geared toward the gatekeeper scenario. I may make another post on what I perceive to be the most effective method for voluntary(ish), peer-conducted code reviews where the reviewer(s) are not approval gate-keepers.

  1. Developer finishes the pre-defined unit of code and is ready to have it reviewed for promote/commit.
  2. Developer runs static analysis tools (e.g. StyleCop, FXCop, Code Contracts, NDepend, etc) with configured rule-sets, correcting any errors they uncover.
  3. Once no static-check violations are present, developer notifies reviewer(s) of desire for code review.
  4. Reviewers run static analysis tools asynchronously and reject the request if any rules are violated.
  5. Reviewers examine the code for obvious mistakes not caught by static analysis and/or developer unit tests, and write unit tests that expose the deficiency. (Alternatively, they can run something like Pex)
  6. Developer makes reviewer unit tests pass or else convinces reviewer(s) why they don’t need to.
  7. With all unit tests passing, and all parties familiar with the code, a review meeting is setup (meeting can be skipped for smaller/less crucial code deliveries).
  8. Meeting proceeds as follows:
    1. A mediator who is neither developer nor reviewer is asked to attend to keep the meeting focused and on track.
    2. Reviewers point out something praiseworthy about the submitted code (cheesy, perhaps, but important for starting off in the spirit of cooperation)
    3. Reviewers examine code for redundancy (is anything copy/pasted, defined in many places, etc)
    4. Reviewers examine the code for usable API, perhaps by implementing classes in a sandbox, to highlight shortcomings, unintuitive interactions, weird couplings, etc
    5. Reviewers check for architectural consistency — does the class implement a base class that it should, duplicate the function of some other class in the suite, seem to be in the wrong location, etc.
    6. Reviewers perform a dependency analysis — what new dependencies does this introduce? Cyclical, global, temporal, etc. Are these dependencies acceptable?
    7. Reviewers analyze for code smells and anti-patterns.
    8. Reviewers compile a list of suggested changes for the developer.
    9. Meeting adjourned.
  9. Developer makes suggested changes that he agrees with.
  10. Developer appeals changes with which he doesn’t agree. This could involve the reviewer(s) providing “proof”, for example if they say that the developer shouldn’t do something because of X negative consequence, they should demonstrate that consequence somehow. This appeal could be resolved via proof/demonstration or it could go to a third party where the developers and reviewers each state their cases.
  11. Any suggested changes and the results of any appeals are then promoted/committed.

This process naturally addresses (1) and (2) of the things that I don’t like in that you’re having a code review and getting the procedural, easy stuff out of the way offline, prior to meeting. (3) is made more difficult by the fact that the reviewer(s) are given the opportunity to write unit tests that expose the badness about which they might be paranoid. (4) and (5) are addressed by the appeal process and the general concept that changes are suggestions rather than decrees. (6) and (7) are addressed by the mediator who has no skin in the game and will probably have a natural tendency to want to keep things short and sweet.

One drawback I can see to what I’m proposing here is that you could potentially undercut the authority of the reviewer if the person doing the reviews is, say, the most senior or high ranking person. Perhaps people would want that role a bit less if they could be officially second guessed by anyone. However, I think that creates a nice environment where good ideas are valued above all else. If I were in that role myself, I’d welcome the challenge of having to demonstrate/prove ideas that I think are good ones before telling others to adopt them. In the long run, being steered toward that kind of rigor makes you better at your craft. I also think that it might be something of a non-issue, given that people who wind up in these types of leadership and senior roles are often there based on the merit of their work and on successfully “proving” things over and over throughout the course of a career.

By

A Better Metric than Code Coverage

My Chase of Code Coverage

Perhaps it’s because fall is upon us and this is the first year in a while that I haven’t been enrolled in a Master’s of CS program (I graduated in May), I’m feeling a little academic. As I mentioned in my last post, I’ve been plowing through following TDD by the letter, and if nothing else, I’m pleased that my code coverage is more effortlessly at 100%. I try to keep my code coverage around 100% whether or not I do TDD, so the main difference I’ve noticed is that TDD versus retrofitted tests seems to hit my use cases a lot harder, instead of just going through the code at least once.

Now, it’s important to me to get close to or hit that 100% mark, because I know that I’m at least touching everything going into production, meaning that I don’t have anything that would blow up if the stack pointer ever got to it, and I’m saved only by another bug preventing it from executing. But, there is a difference between covering code and exercising it.

More than 100% Code Coverage?

As I was contemplating this last night, I realized that some lines of my TDD code, especially control flow statements, were really getting pounded. There are lines in there that are covered by dozens of tests. So, the first flicker of an idea popped into my head — what if there were two factors at play when contemplating coverage: LOC Covered/Total LOC (i.e. our current code coverage metric) and Covering tests/LOC (I’ll call this coverage density).

High coverage is a breadth-oriented thing, while high density is depth — casting a wide net versus a narrow one deeply. And so, the ultimate solution would be to cast a wide net, deeply (assuming unlimited development time and lack of design constraints).

Are We Just Shifting the Goalposts?

So, Code Density sounded like sort of a heady concept, and I thought I might be onto something until I realized that this suffered the same potential for false positive feedback as code coverage. Specifically, I could achieve an extremely high density by making 50 copies of all of my unit tests. All of my LOC would get hit a lot more but my test suite would be no better (in fact, it’d be worse since it’s now clearly less efficient). So code coverage is weaker as a metric when you cheat by having weak asserts, and density is weaker when you cheat by hitting the same code with identical (or near identical) asserts.

Is there a way to use these two metrics in combination without the potential for cheating? It’s an interesting question and it’s easy enough to see that “higher is better” for both is generally, but not always true, and can be perverted by developers working under some kind of management edict demanding X coverage or, now, Y density.

Stepping Back a Bit

Well, it seems that Density is really no better than Code Coverage, and it’s arguably more obtuse, or at least it has the potential to be more obtuse, so maybe that’s not the route to go. After all, what we’re really after here is how many times a line of code is hit in a different scenario. For instance, hitting the line double result = x/y is only interesting when y is zero. If I hit it 45,000 times and achieve high density, I might as well just hit it once unless I try y at zero.

Now, we have something interesting. This isn’t a control flow statement, so code coverage doesn’t tell the whole story. You can cover that line easily without generating the problematic condition. Density is a slightly (but not much) better metric. We’re really driving after program correctness here, but since that’s a bit of a difficult problem, what we’ll generally settle for is notable, or interesting scenarios.

A Look at Pex

Microsoft Research made a utility called Pex (which I’ve blogged about here). Pex is an automated test generation utility that “finds interesting input-output values of your methods”. What this means, in practice, is that Pex pokes through your code looking for edge cases and anything that might be considered ‘interesting’. Often, this means conditions that causes control flow branching, but it also means things like finding our “y” div by zero exception from earlier.

What Pex does when it finds these interesting paths is it auto-generates unit tests that you can add to your suite. Since it finds hard-to-find edge cases and specializes in branching through your code, it boasts a high degree of coverage. But, what I’d really be interested in seeing is the stats on how many interesting paths your test suite cover versus how many there are or may be (we’d likely need a good approximation as this problem quickly becomes computationally unfeasible to know for certain).

I’m thinking that this has the makings of an excellent metric. Forget code coverage or my erstwhile “Density” metric. At this point, you’re no longer hoping that your metric reflects something good — you’re relatively confident that it must. While this isn’t as good as some kind of formal method that proves your code, you can at least be confident that critical things are being exercised by your test suite – manual, automated or both. And, while you can achieve this to some degree by regularly using Pex, I don’t know that you can quantify it other than to say, “well, I ran Pex a whole bunch of times and it stopped finding new issues, so I think we’re good.” I’d like a real, numerical metric.

Anyway, perhaps that’s in the offing at some point. It’d certainly be nice to see, and I think it would be an advancement in the field of static analysis.

By

Static Analysis — Spell Check for Code

A lot of people have caught onto certain programming trends: some agility in the process generally makes things better, unit testing a code base tends to make it more reliable, etc. One thing that, in my experience, seems to lag behind in popularity is the use of static checking tools. If these are used at all, it’s usually for some reason such as enforcing capitalization schemes of variables or some other such thing that guarantees code that is uniform in appearance.

I think this non-use or under-use of such tools is a shame. I recently gave a presentation on using some tools in C# and Visual Studio 2010 for static analysis, and thought I’d share my experience with some of the tools and the benefits I perceive here. In this development environment, there are six tools that I use, all to varying degrees and for different purposes. They are:

Before I get into that, I’ll offer a little background on the idea of static analysis. The age-old and time-tested way to write code is that you write some code, you compile it, and then you run it. If it does what you expected when you run it, then you declare yourself victorious and move on. If it doesn’t, then you write some more code and repeat.

This is all fine and good until the program starts getting complicated — interacting with users, performing file I/O, making network requests, etc. At that point, you get a lot of scenarios. In fact, you get more scenarios than you could have anticipated in one sitting. You might run it and everything looks fine, but then you hand it to a user who runs it, unplugs the computer, and wonders why his data wasn’t saved.

At some point, you need to be able to reason about how components of your code would behave in various scenarios, even if you might not easily be able to recreate these scenarios. Unit testing is helpful with this, but unit testing is just an automated way of saying, “run the code.” Static analysis automates the process of reasoning about the code without running it. It’s like you looking at the code, but exponentially more efficient and much less likely to make mistakes.

Doing this static analysis is adding an extra step to your development process. Make no mistake about that. It’s like unit testing in that the largest objection is going to be the ‘extra’ time that it takes. But it’s also like unit testing in that it saves you time downstream because it makes defects less likely to come back to bite you later. These two tasks are also complimentary and not stand-ins for one another. Unit testing clarifies and solidifies requirements and forces you to reason about your code. Static analysis lets you know if that clarification and reasoning has caused you to do something that isn’t good.

As I said in the title, it’s like a spell checker for your code. It prevents you from making silly and embarrassing mistakes (and often costly ones). To continue the metaphor, unit testing is more like getting someone bright to read your document. He’ll catch some mistakes and give you important feedback for how to improve the document, but he isn’t a spell checker.

So, that said, I’ll describe briefly each one and why I use and endorse it.

MS Analysis

MS Analysis encapsulates FX Cop for the weightier version of Visual Studio 2010 (Premium and up, I think). It runs a series of checks, such as whether or not parameters are validated by methods, whether or not you’re throwing Exception instead of SomeSpecificException, and whether your classes have excessive coupling. There are probably a few hundred checks in all. When you do a build with this enabled, it populates the error list with violations in the form of warnings.

On the plus side, this integrates seamlessly with Visual Studio since it’s a Microsoft product, and it catches a lot of stuff. On the down side, it can be noisy, and customizing it isn’t particularly straightforward. You can turn rules on and off, but if you want to tweak existing ones or create your own, things get a little more complicated. It also isn’t especially granular. You configure it per project and can’t get more fine grained feedback than that (i.e. per namespace, class, or method).

My general use of it is to run it periodically to see if my code is violating any of the rules that I care about. I usually turn off a lot of rules, and I have a few different rulesets that I plug in and out so that I can do more directed searches.

Style Cop

StyleCop is designed to be run between writing and building. Instead of using the VM/Framework to reflect on your code, it just parses the source code file looking for stylistic concerns (are all of your fields camel cased and documented and are you still using Hungarian notation and, if so, stop) and very basic mistakes (like, do you have an empty method). It’s lightning fast, and it runs on a per-class basis, which is cool.

On the downside, it can be a little annoying and invasive, but the designers are obviously aware of this. I recall reading some kind of caveat stating that the nature of these types of rules tends to be arbitrary and get opinionated developers into shouting matches.

I find it useful for letting me know if I’ve forgotten to comment things, if I’ve left fields as anything other than private, and if I have extra parentheses somewhere. I run Style Cop occasionally, but not as often as others. Swapping between the rule sets is a little annoying.

CodeRush

CodeRush is awesome for a lot of things, and its static analysis is really an ancillary benefit. It maintains an “issues list” for each file and highlights these issues in real time, right in the IDE. A few of them are a little bizarre (suggesting to always use “var” keyword if it is possible), but most of them are actually really helpful and not suggested by the MS Tools or anything else I use. It does occasionally false flag dead code and get a few things wrong, but it’s fairly easy to configure it to ignore issues on a per file, per namespace, per solution basis.

The only real downside here is that CodeRush has a seat licensing cost and that and the other overhead of CodeRush make it a little overkill-ish if you’re just interested in Static Analysis. I fully endorse getting CodeRush in general, however, for all of its features.

Code Contracts

Like CodeRush, this tool is really intended for something else, and it provides static analysis as a side effect. Code Contracts is an academically developed tool that facilitates design by contract. Pre- and post-conditions as well as class invariants can be enforced at build time. Probably because of the nature of doing this, it also just so happens to offer a feature wherein you can have warning squigglies pop up anywhere you might be dereferencing null, violating array bounds, or making invalid arithmetic assumptions.

To me, this is awesome, and I don’t know of other tools that do this. The only downside is that, on larger projects, this takes a long time to run. However, getting an automatic check for null dereferences is worth the wait!

I use it explicitly for the three things I mentioned, though, if I get a chance and enough time, I’d like to explore its design by contract properties as well.

NDepend

NDepend is really kind of an architectural tool. It lets you make assessments about different dependencies in your code, and it provides you with all kinds of neat graphs that report on them. But my favorite feature of NDepend is the static analysis in the form of Code Querying. It exposes SQL-like semantics that let you write queries against your code base, such as “SELECT ALL Methods WHERE CyclomaticComplexity > 25″ (paraphrase). You can tweak these things, write your own, or go with the ones out of the box. They’re all commented, too, in ways that are helpful for understanding and modifying.

There is really no downside to NDepend aside from the fact that it costs some money. But if you have the money to spare, I highly recommend it. I use this all the time for querying my code bases in whatever manner strikes my fancy.

Nitriq

I think that Nitriq and NDepend are probably competitors, but I won’t do any kind of comparison evaluation because I only have the free version of Nitriq. Nitriq has the same kind of querying paradigm as NDepend, except that it uses LINQ semantics instead of SQL. That’s probably a little more C# developer friendly, as I suppose not everyone that writes C# code knows SQL (although it strikes me that all programmers ought to have at least passing familiarity with SQL).

In the free version you can only assess one assembly at a time, though I don’t consider that a weakness. I use Nitriq a lot less than NDepend, but when I do fire it up, the interface is a little less cluttered and it’s perhaps a bit more intuitive. Though, for all I know, the paid version may get complicated.

Conclusion

So, that’s my pitch for static analysis. The tools are out there, and I suspect that they’re only going to become more and more common. If this is new to you, check these tools out and try them! If you’re familiar with static analysis, hopefully there’s something here that’s new and worth investigating.

Acknowledgements | Contact | About | Social Media