Introduction to Static Analysis (A Teaser for NDepend)

Rather than the traditional lecture approach of providing an official definition and then discussing the subject in more detail, I’m going to show you what static analysis is and then define it. Take a look at the following code and think for a second about what you see. What’s going to happen when we run this code?

Well, let’s take a look:

Exception

I bet you saw this coming. In a program that does nothing but set x to 1, and then throw an exception if x is 1, it isn’t hard to figure out that the result of running it will be an unhandled exception. What you just did there was static analysis.

Static analysis comes in many shapes and sizes. When you simply inspect your code and reason about what it will do, you are performing static analysis. When you submit your code to a peer to have her review, she does the same thing. Like you and your peer, compilers perform static analysis, though automated analysis instead of manual. They check the code for syntax errors or linking errors that would guarantee failures, and they will also provide warnings about potential problems such as unreachable code or assignment instead of evaluation. Products also exist that will check your source code for certain characteristics and stylistic guideline conformance rather than worrying about what happens at runtime and, in managed languages, products exist that will analyze your compiled IL or byte code and check for certain characteristics. The common thread here is that all of these examples of static analysis involve analyzing your code without actually executing it.

Analysis vs Reactionary Inspection

People’s interactions with their code tend to gravitate away from analysis. Whether it’s unit tests and TDD, integration tests, or simply running the application to see what happens, programmers tend to run experiments with their code and then to see what happens. This is known as a feedback loop, and programmers use the feedback to guide what they’re going to do next. While obviously some thought is given to what impact changes to the code will have, the natural tendency is to adopt an “I’ll believe it when I see it” mentality.

We tend to ask “what happened?” and we tend to orient our code in such ways as to give ourselves answers to that question. In this code sample, if we want to know what happened, we execute the program and see what prints. This is the opposite of static analysis in that nobody is trying to reason about what will happen ahead of time, but rather the goal is to do it, see what the outcome is, and then react as needed to continue.

Reactionary inspection comes in a variety of forms, such as debugging, examining log files, observing the behavior of a GUI, etc.

Static vs Dynamic Analysis

The conclusions and decisions that arise from the reactionary inspection question of “what happened” are known as dynamic analysis. Dynamic analysis is, more formally, inspection of the behavior of a running system. This means that it is an analysis of characteristics of the program that include things like how much memory it consumes, how reliably it runs, how much data it pulls from the database, and generally whether it correctly satisfies the requirements are not.

Assuming that static analysis of a system is taking place at all, dynamic analysis takes over where static analysis is not sufficient. This includes situations where unpredictable externalities such as user inputs or hardware interrupts are involved. It also involves situations where static analysis is simply not computationally feasible, such as in any system of real complexity.

As a result, the interplay between static analysis and dynamic analysis tends to be that static analysis is a first line of defense designed to catch obvious problems early. Besides that, it also functions as a canary in the mine to detect so-called “code smells.” A code smell is a piece of code that is often, but not necessarily, indicative of a problem. Static analysis can thus be used as an early detection system for obvious or likely problems, and dynamic analysis has to be sufficient for the rest.

Canary

Source Code Parsing vs. Compile-Time Analysis

As I alluded to in the “static analysis in broad terms” section, not all static analysis is created equal. There are types of static analysis that rely on simple inspection of the source code. These include the manual source code analysis techniques such as reasoning about your own code or doing code review activities. They also include tools such as StyleCop that simply parse the source code and make simple assertions about it to provide feedback. For instance, it might read a code file containing the word “class” and see that the next word after it is not capitalized and return a warning that class names should be capitalized.

This stands in contrast to what I’ll call compile time analysis. The difference is that this form of analysis requires an encyclopedic understanding of how the compiler behaves or else the ability to analyze the compiled product. This set of options obviously includes the compiler which will fail on show stopper problems and generate helpful warning information as well. It also includes enhanced rules engines that understand the rules of the compiler and can use this to infer a larger set of warnings and potential problems than those that come out of the box with the compiler. Beyond that is a set of IDE plugins that perform asynchronous compilation and offer realtime feedback about possible problems. Examples of this in the .NET world include Resharper and CodeRush. And finally, there are analysis tools that look at the compiled assembly outputs and give feedback based on them. NDepend is an example of this, though it includes other approaches mentioned here as well.

The important compare-contrast point to understand here is that source analysis is easier to understand conceptually and generally faster while compile-time analysis is more resource intensive and generally more thorough.

The Types of Static Analysis

So far I’ve compared static analysis to dynamic and ex post facto analysis and I’ve compared mechanisms for how static analysis is conducted. Let’s now take a look at some different kinds of static analysis from the perspective of their goals. This list is not necessarily exhaustive, but rather a general categorization of the different types of static analysis with which I’ve worked.

  • Style checking is examining source code to see if it conforms to cosmetic code standards
  • Best Practices checking is examining the code to see if it conforms to commonly accepted coding practices. This might include things like not using goto statements or not having empty catch blocks
  • Contract programming is the enforcement of preconditions, invariants and postconditions
  • Issue/Bug alert is static analysis designed to detect likely mistakes or error conditions
  • Verification is an attempt to prove that the program is behaving according to specifications
  • Fact finding is analysis that lets you retrieve statistical information about your application’s code and architecture

There are many tools out there that provide functionality for one or more of these, but NDepend provides perhaps the most comprehensive support across the board for different static analysis goals of any .NET tool out there. You will thus get to see in-depth examples of many of these, particularly the fact finding and issue alerting types of analysis.

A Quick Overview of Some Example Metrics

Up to this point, I’ve talked a lot in generalities, so let’s look at some actual examples of things that you might learn from static analysis about your code base. The actual questions you could ask and answer are pretty much endless, so this is intended just to give you a sample of what you can know.

  • Is every class and method in the code base in Pascal case?
  • Are there any potential null dereferences of parameters in the code?
  • Are there instances of copy and paste programming?
  • What is the average number of lines of code per class? Per method?
  • How loosely or tightly coupled is the architecture?
  • What classes would be the most risky to change?

Believe it or not, it is quite possible to answer all of these questions without compiling or manually inspecting your code in time consuming fashion. There are plenty of tools out there that can offer answers to some questions like this that you might have, but in my experience, none can answer as many, in as much depth, and with as much customizability as NDepend.

Why Do This?

So all that being said, is this worth doing? Why should you watch the subsequent modules if you aren’t convinced that this is something that’s even worth learning. It’s a valid concern, but I assure you that it is most definitely worth doing.

  • The later you find an issue, typically, the more expensive it is to fix. Catching a mistake seconds after you make it, as with a typo, is as cheap as it gets. Having QA catch it a few weeks after the fact means that you have to remember what was going on, find it in the debugger, and then figure out how to fix it, which means more time and cost. Fixing an issue that’s blowing up in production costs time and effort, but also business and reputation. So anything that exposes issues earlier saves the business money, and static analysis is all about helping you find issues, or at least potential issues, as early as possible.
  • But beyond just allowing you to catch mistakes earlier, static analysis actually reduces the number of mistakes that happen in the first place. The reason for this is that static analysis helps developers discover mistakes right after making them, which reinforces cause and effect a lot better. The end result? They learn faster not to make the mistakes they’d been making, causing fewer errors overall.
  • Another important benefit is that maintenance of code becomes easier. By alerting you to the presence of “code smells,” static analysis tools are giving you feedback as to which areas of your code are difficult to maintain, brittle, and generally problematic. With this information laid bare and easily accessible, developers naturally learn to avoid writing code that is hard to maintain.
  • Exploratory static analysis turns out to be a pretty good way to learn about a code base as well. Instead of the typical approach of opening the code base in an IDE and poking around or stepping through it, developers can approach the code base instead by saying “show me the most heavily used classes and which classes use them.” Some tools also provide visual representations of the flow of an application and its dependencies, further reducing the learning curve developers face with a large code base.
  • And a final and important benefit is that static analysis improves developers’ skills and makes them better at their craft. Developers don’t just learn to avoid mistakes, as I mentioned in the mistake reduction bullet point, but they also learn which coding practices are generally considered good ideas by the industry at large and which practices are not. The compiler will tell you that things are illegal and warn you that others are probably errors, but static analysis tools often answer the question “is this a good idea.” Over time, developers start to understand subtle nuances of software engineering.

There are a couple of criticisms of static analysis. The main ones are that the tools can be expensive and that they can create a lot of “noise” or “false positives.” The former is a problem for obvious reasons and the latter can have the effect of counteracting the time savings by forcing developers to weed through non-issues in order to find real ones. However, good static analysis tools mitigate the false positives in various ways, an important one being to allow the shutting off of warnings and the customization of what information you receive. NDepend turns out to mitigate both: it is highly customizable and not very expensive.

Reference

The contents of this post were mostly taken from a Pluralsight course I did on static analysis with NDepend. Here is a link to that course. If you’re not a Pluralsight subscriber but are interested in taking a look at the course or at the library in general, send me an email to erik at daedtech and I can give you a 7 day trial subscription.