How to Analyze a Static Analyzer

Editorial Note: I originally wrote this post for the NDepend blog. You can check out the original here, at their site. While you’re there, take a look around at some of the other posts, and sign up for the RSS feed, if you’re so inclined.

First things first. I really wanted to call this post, “who will analyze the analyzer,” because I fancy myself clever. This title would have mirrored the relatively famous Latin question from Satires, “who will guard the guards themselves?” But I suspect that the confusion I’d cause with that title would outweigh any appreciation of my cleverness.

So, without any literary references whatsoever, I’ll talk about static analyzers. More specifically, I’ll talk about how you should analyze them to determine fitness for your purpose.

Before I dive into that, however, let’s do a quick refresher on the definition of a static analyzer. This Stack Overflow question nails it pretty well, right at the beginning of the accepted answer.

Analyzing code without executing it. Generally used to find bugs or ensure conformance to coding guidelines.

Succinctly put, Aaron, and just so. Most of what we do with code tends to be dynamic analysis. Whether through automated tests or manual running of the program, we fire it up and see what happens. Static analyzers, on the other hand, look at the code and use it to make deductions. These include both deductions about runtime behavior and about the codebase itself.

What’s Your Goal?

Why rehash the definition? Well, because I want to underscore the point that you can do many different things with static analyzers. Even if you just think of them as “that thing that complains at me about the Microsoft guidelines,” they cover a whole lot more ground.

As such, your first step in sizing up the field involves setting your own goals. What do you want out of the tool? Some of them focus exclusively on code quality. Others target specific concerns, such as behavioral correctness or security. Still others simply offer so-called “linting.” Some do a mix of many things.

Lay out your goals and expectations. Once you’ve done that, you will find that you’ve narrowed the field considerably. From there, you can proceed with a more apples to apples comparison.

You’re Probably Overrating “Errors Found”

Another thing worth stating up front: you’re probably placing too much importance on errors found. In concept, you have a good way to compare analyzers. Turn them loose on the prospective codebase and see which one tells you more. Over the years, I’ve seen and heard tell of a number of evaluations along these lines.

And this does have some value. But probably not as much as you think, given both the complexity of the prospect of static analysis and the nature of the role it occupies in your development process.

False Positives and Negatives

In the first refinement of the idea of “errors found,” let’s consider the idea of both false positives and negatives. The idea of a tool finding false positives immediately surfaces a problem with the “errors found” metric. Quantity does not equal quality, as it were. If the tool finding the most errors is wasting your time with a bunch of non-problems, you can hardly construe this as an advantage.

But also beware false negatives. You similarly don’t want a tool that misses things.

Generally speaking, you want to feel justified faith in your analyzer. So when comparing them, pay attention both to the rate at which they reportedly flag non-issues and to the rate at which they miss actual issues. Consider overall accuracy.
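One way to put numbers on this is to take a sample of code whose real issues you already know, run each candidate against it, and compute precision (how much of what the tool reports is real) and recall (how much of what is real the tool reports). The sketch below is only an illustration: the tool names and all of the counts are made up, and the function names are mine rather than anything a particular analyzer provides.

```python
def precision(true_positives, false_positives):
    """Share of reported issues that are real (penalizes false positives)."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


def recall(true_positives, false_negatives):
    """Share of real issues that were reported (penalizes false negatives)."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# Hypothetical results on a sample with 50 known, real issues.
# Tool A reports 200 warnings, 40 of them real; Tool B reports 50, 35 real.
tools = {
    "Tool A": {"tp": 40, "fp": 160, "fn": 10},
    "Tool B": {"tp": 35, "fp": 15, "fn": 15},
}

for name, r in tools.items():
    p = precision(r["tp"], r["fp"])
    rec = recall(r["tp"], r["fn"])
    print(f"{name}: precision={p:.2f} recall={rec:.2f}")
```

In this invented scenario, Tool A "finds more errors" in raw terms, yet four out of five of its warnings are noise. Tool B misses a few more real issues but wastes far less of your time. Which trade-off suits you depends on the goals you laid out earlier.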

Is This Thing On?

Accuracy of the tool tells you how much immediate benefit you can reap and how much you can trust it. But what about the future? Will the tool continue to serve you well?

To answer these questions, you must take into account how faithfully the tool author issues updates and patches. The world of software moves extremely quickly, and new issues crop up constantly. And, beyond that, your team will internalize feedback and stop making yesterday’s mistakes. You won’t need the same feedback from the tool a year from now that you get today, let alone 5 years from now. Sure, some metrics and code properties stand the test of time. But plenty need revisiting.

You’ll want to take into account how actively the tool authors update the tool. Do you find yourself with a choice between a continuously updated tool and an abandoned project? All else being equal, this is no choice at all.

Integration with Your Tooling

So let’s assume you’re choosing between two relatively accurate, actively updated static analyzers. How else might you distinguish?

I would encourage you not to underrate the importance of how they integrate with your existing development tools. Does one of them plug into your IDE? Does your build/CI tooling have a hook for it? Can team members access and run it easily in general? Can they easily read and act on its output?

All of this matters a great deal. When a tool integrates without friction into your existing process, developers will capitalize on it by using it. But if they have to go to great lengths to get it to work, they will ignore it. You’ll have to poke, prod, and nag them to get them to use the tool, making it a relatively poor investment.

Cost

And, speaking of investment, you should definitely weigh the relative costs of the tool. How much bang do you get for your buck?

But you don’t need me to tell you that the cost of tools matters when you’re weighing them against one another. You’re going to do that anyway. But have you factored in the less obvious costs also at play? For instance, let’s say that you pay $200 for a tool because the alternative costs $1,000. That seems like a great decision. But if your expensive software developers have to spend an extra 20 hours each setting up that $200 tool, the cost of that tool suddenly balloons to thousands of dollars.
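To make that arithmetic concrete, here is a back-of-the-envelope sketch. The $200 and $1,000 prices and the 20 extra hours per developer come from the example above. The team size of 5 and the loaded rate of $75 per hour are assumptions of mine, so plug in your own figures.

```python
def total_cost(sticker_price, extra_setup_hours_per_dev, developers, hourly_rate):
    """Sticker price plus the labor cost of the extra setup time."""
    labor = extra_setup_hours_per_dev * developers * hourly_rate
    return sticker_price + labor


DEVELOPERS = 5    # assumed team size
HOURLY_RATE = 75  # assumed loaded cost per developer hour, in dollars

# The cheaper tool needs 20 extra hours of setup per developer.
cheap_tool = total_cost(200, 20, DEVELOPERS, HOURLY_RATE)
# The pricier tool is the baseline, so it carries no extra setup hours.
pricey_tool = total_cost(1000, 0, DEVELOPERS, HOURLY_RATE)

print(f"$200 tool, all in:   ${cheap_tool:,}")   # $7,700
print(f"$1,000 tool, all in: ${pricey_tool:,}")  # $1,000
```

Under these assumptions, the "cheap" tool ends up costing several times what the expensive one does, and that is before counting ongoing maintenance.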

Not all costs come from the sticker price of the tool itself. Make sure you understand what it costs to buy, install, set up, and maintain the tooling, and take the entire picture into account.

Word of Mouth

With all of the other factors considered, I’ll close by suggesting that you heavily consider word of mouth. What tools have the members of your team previously used? And, of those, which do they swear by?

And consider also the company and people that make the tools. Does the community respect them and value their contributions? Does their word carry a lot of weight? Do people trust them to offer good advice and wisdom on the subject matter covered by their tools?

Static code analysis does a wonderful thing by putting some actual data to typically subjective concerns like correctness and code quality. But even such a data-driven pursuit is subject to consumer confidence and trust. When analyzing the analyzers, run them through all of the paces and analysis, but don’t lose sight of their credibility within the community.
