Stories about Software


Why NDepend Uses Google’s Page Rank

Editorial note: I originally wrote this post for the NDepend blog.  You can check out the original here, at their site.  While you’re there, have a look at type rank and all of the other metrics that NDepend will show you about your code.

I remember my early days of blogging as sort of a comedy of errors.  Oh, don’t get me wrong.  I don’t think those early posts were terrible, since I’d always written a lot.  Rather, I knew very little about everything besides the writing.  For example, I initially thought link spammers were just somewhat daft blog commenters.  I stumbled through various mistakes and learned the art of blogging in fits and starts.  This included my discovery of something called page rank.

Page rank had a relatively involved calculation, but that didn’t interest me at the time.  Instead, I found myself dazzled by some gamification.  Page rank checker sites would take your domain and a captcha as input and spit out a score from 0 to 10 as output.  Just like that, they turned my blogging world upside down.  I now had a score to chase and a means of comparing myself against others.  And I vaguely understood that getting more inbound links would increase my page rank score.

Of course, as an introvert, I struggle with outgoing self-promotion.  Cold outreach to people to see if they’d link to me never seriously occurred to me.  Instead, I reasoned that I would play the long game.  Write enough posts, and the shares start to come.  And then when the shares come, so too will the links.  So I watched my page rank inch slowly upward over time.

The Decline of Page Rank

My page rank ticked upward until one day it didn’t anymore.  Turns out, Google slowly killed it over the course of a number of years.  Ten months passed between its penultimate update and its final one.  So there I stood (metaphorically), waiting for a boost to my rank that would never come.

But why did Google kill page rank?  Wouldn’t such an easily digestible construct continue to help people?  Well, sort of.  Unfortunately, it disproportionately helped the wrong sort of people.

The Google founders developed the concept during their time at Stanford.  Conceptually, the page rank algorithm regards a link from site A to site B as a “vote” for site B, by site A.  But not all pages get to “vote” equally.  The higher a page’s rank, the more its vote counts, which creates a conceptual feedback loop.
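To make the “weighted vote” idea concrete, here is a minimal sketch of that feedback loop in Python.  Treat it as an illustration under assumptions, not as what Google or NDepend actually runs: the function name, the damping factor of 0.85, and the fixed iteration count are all choices I made for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Approximate rank for each node in a link graph.

    `links` maps a node to the list of nodes it links to.
    `damping` and `iterations` are assumed values for this sketch.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with a small baseline share.
        new_rank = {node: (1.0 - damping) / count for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node splits its own rank evenly among the nodes it "votes" for,
                # so a vote from a highly ranked node is worth more.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outbound links spreads its rank across all nodes.
                share = damping * rank[node] / count
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A and B both link to C, and C links back to A.
ranks = page_rank({"A": ["C"], "B": ["C"], "C": ["A"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C collects two votes and ends up on top.  A gets only one vote, but it comes from the highly ranked C, so A still lands well above B, which nobody links to.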

On the surface, this sounds great, and, in many ways, it was.  As you can imagine, a site with a ton of inbound links, like a government study or a news outlet, would accumulate a great deal of rank.  Since employees would carefully curate such sites, you could put a lot of stock in a site to which they linked (and search engines did).  So in theory, you have a democratized system in which the sites best regarded by the public had the best rank.

But in this theory, no link spammers existed.  If you wanted good page rank, you could produce high quality, popular content.  Or you could pay some shady outfit to carpet bomb blog comment sections with links to your site.  Because of this fatal flaw, page rank eventually dwindled to obscurity.

A Useful Reappropriation of Page Rank

For clarity, understand that Google (probably) still uses some incarnation of this scheme.  But they no longer update the easily consumed public version of it.  They now use it as only one of many factors in what they display in response to searches.  The heyday of comparing page rank scores for sites has come and gone.  But that doesn’t mean we can’t use it elsewhere, and to great effect.

For instance, consider applying this to codebases.  Instead of a situation where website A links to website B, imagine a situation where type A refers directly to type B.  Now, imagine your codebase as a (hopefully acyclic) directed graph with edges and nodes.  You start to have an interesting vehicle for reasoning about your codebase.

What would a high rank mean in this context?  Well, relatively high rank for a type would mean that other types tended to refer to it at a high rate.  Types with relatively low (or zero) rank would have few (or no) other types referring to them, existing at the edge of your code.  And the types with the highest rank?  These would be types used by other types with high rank.

Code in Terms of Risk

All right, so that’s an interesting exercise.  In performing it, you might give elements of your codebase popularity scores, if you will.  But what substance do you get?

Think in terms of risk.  High rank (popular) types receive a lot of trust from the rest of the codebase, just like a study published on a government or university website.  Other types depend on them, both directly and indirectly.  So when you make changes to them, you incur significant risk.  At best, you might create compiling and building issues.  At worst, you might introduce subtle bugs with broad-reaching impact.  On the flip side, when you change low rank types, you incur relatively little risk.  Nobody depends on them, so if your immediate changes check out, you can have confidence in the code.
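You can get a rough feel for that “directly and indirectly” part with a few lines of Python.  The sketch below walks a dependency graph backward to collect everything that could feel a change to a given type.  The type names and the `transitive_dependents` helper are hypothetical, invented for this example, and a real tool would extract the graph from your assemblies rather than from a hand-written dictionary.

```python
from collections import deque


def transitive_dependents(depends_on, target):
    """Return every type that depends on `target`, directly or indirectly.

    `depends_on` maps a type name to the list of types it refers to.
    """
    # Reverse the edges so we can ask "who refers to this type?"
    used_by = {}
    for source, targets in depends_on.items():
        for referenced in targets:
            used_by.setdefault(referenced, set()).add(source)

    # Breadth-first walk over the reversed edges, starting at the target.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, ()):
            if dependent != target and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical type graph: each type lists the types it refers to.
depends_on = {
    "OrderController": ["OrderService"],
    "OrderService": ["OrderRepository", "Logger"],
    "OrderRepository": ["Logger"],
    "Logger": [],
}

print(sorted(transitive_dependents(depends_on, "Logger")))
print(sorted(transitive_dependents(depends_on, "OrderController")))
```

Here a change to Logger can reach all three other types, while a change to OrderController reaches none of them.  That is the difference in risk between a popular type and one at the edge of the code.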

In a sense, then, the meaning of high rank inverts for codebases as compared to your website.  For the internet, high rank implies high credibility, stability, and popularity.  In your codebase, it implies popularity as well, but popularity means dependency and risk.

Taking Action for Type (and Method) Rank

So does this mean high rank is bad?  Well, no, not necessarily.  It could indicate an architecture that consolidates too much, but you will naturally have more inbound dependencies for some types than for others.  You can’t avoid that.

But high rank does unambiguously mean risk.  So the first thing you need to do is make yourself aware of it.  NDepend will show you your highest ranked types.  Avail yourself of this knowledge and keep an eye on them, particularly if some of them really start to accelerate upward as the project progresses.  (The same logic also applies to ranking dependencies among methods.)

Once you have a sense of these types in your codebase, you can form a plan of attack.  Watch for changes to these types, and when they happen, ensure you have plenty of testing in place to mitigate risk.  If you observe frequent changes to these types, understand that you have a high risk architecture.  Try to work toward a state where you seldom change your highest ranked types.  Conforming to the open/closed principle can help here.
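If you want to turn that plan into a quick check, one option is to cross-reference your highest ranked types with how often they change.  The following sketch assumes you already have two inputs: a rank per type and a count of recent changes per type (from your version control history, say).  The numbers, the thresholds, and the function name are all made up for illustration and do not come from NDepend.

```python
def frequently_changed_high_rank(type_rank, change_counts, top_n=3, min_changes=5):
    """List the highest ranked types that also change often.

    `top_n` and `min_changes` are arbitrary thresholds chosen for this sketch.
    """
    # Take the top_n types by rank, highest first.
    highest_ranked = sorted(type_rank, key=type_rank.get, reverse=True)[:top_n]
    # Keep only those that changed at least min_changes times.
    return [name for name in highest_ranked if change_counts.get(name, 0) >= min_changes]


# Hypothetical inputs: rank scores and number of recent commits touching each type.
type_rank = {
    "Logger": 0.41,
    "OrderRepository": 0.27,
    "OrderService": 0.20,
    "OrderController": 0.12,
}
change_counts = {
    "Logger": 1,
    "OrderRepository": 9,
    "OrderService": 12,
    "OrderController": 20,
}

print(frequently_changed_high_rank(type_rank, change_counts))
```

With this made-up data, OrderController changes the most but carries little rank, so it stays off the list.  OrderRepository and OrderService both rank highly and change often, which makes them the ones to watch and to cover with tests.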

To generalize, make sure you understand the risk and have a mitigation strategy in place.  I stumbled through blogging for a number of years before figuring out the importance of rank and how I could use it.  But I had really low stakes compared to someone responsible for a production codebase.  Make sure you understand how you can use the page rank algorithm to your benefit in your codebase.
