Stories about Software


Does Github Enhance the Need for Code Review?

Editorial Note: I originally wrote this post for the SmartBear blog.  You can check out the original here, at their site.  Take a look around while you’re there and check out their products and other posts.

In 1999, a man named Eric S. Raymond published a book called “The Cathedral and the Bazaar.”  In this book, he introduced a pithy phrase, “given enough eyeballs, all bugs are shallow,” which he named Linus’s Law after Linux creator Linus Torvalds.  Raymond was calling out a dichotomy that existed in the software world of the 1990s, and he was throwing his lot in with the heavy underdog at the time, the bazaar.  That dichotomy still exists today, after a fashion, but Raymond and his bazaar are no longer underdogs.  They are decisive victors, thanks in no small part to a website called Github.  And the only people still duking it out in this battle are those who have yet to look up and realize that it’s over and they have lost.


Cathedrals and Bazaars in the 1990s

Raymond’s cathedral was heavily planned, jealously guarded, proprietary software.  In the 1990s, this was virtually synonymous with Microsoft, but it certainly included large software companies, relational database vendors, shrink-wrap software makers, and just about anyone doing it for profit.  There was a centrally created architecture, and it was executed in top-down fashion by all of the developer cogs in the for-profit machine.  The software would ship maybe every year, and in the run-up to that time, the comparatively few developers with access to the source code would hunt down as many bugs as they could ahead of shipping.  Users would then find the rest, and they’d wait until the next yearly release to see fixes (or, maybe, they’d see a patch after some months).  The name “cathedral” refers to the irreducible nature of a medieval cathedral: everything is intricately crafted in all-or-nothing fashion before the public is admitted.

The bazaar, on the other hand, was open source software, represented largely at the time by Linux and Apache.  The source code for these projects was, obviously, free for all to look at and modify over the nascent internet.  Releases there happened frequently and the work was crowd-sourced as much as possible.  When bugs were found following a release, the users could and did hunt them down, fix them, and push the fix back to the main branch of the source code very quickly.  The cycle time between discovery and correction was much, much shorter.  This model was called the bazaar because of the comparatively bustling, freewheeling nature of the cooperation; it resembled a loud, spontaneously organized marketplace that was surprisingly effective for regulating commerce.

At that time, common knowledge held that the cathedral model was the standard and the bazaar model the somewhat subversive underdog.  Many would have contended that the cathedral model was suitable for those wanting to make money, while the bazaar model was better for those involved in academia.  But those whose business was software and who believed that they depended upon proprietary source code as intellectual property sang a somewhat different tune.

Steve Ballmer, in a 2001 interview, said, “Linux is a cancer that attaches itself in an intellectual property sense to everything it touches.”  Cathedral dwellers like Ballmer seemed to perceive open source software not only as untenable, but as an existential threat to the software industry itself.

Cathedrals and Bazaars in the 2010s

If that seems alien to you, I can certainly understand why.  Take Microsoft itself.  It dipped its toe in the water a bit with moves such as creating an open source web framework, but then it allowed the floodgates to open by open sourcing compilers, frameworks, and .NET itself.  In the span of a decade and a half, Microsoft had done the largest imaginable 180 on the subject.

These days, to some degree or another, every major tech company embraces the bazaar model.  Not all such companies have core, open source products, but the products they do have tend to offer rich platforms for expansion and extension by the development community at large.  Plugin architectures are ubiquitous and publicly available source code nearly so.

As for the cathedrals, they still exist.  But, like their medieval counterparts, they exist principally as historical relics, museum pieces reminiscent of another era.  Juggernaut, non-software companies writing code in a language version 10 years old are the only ones that think their code is worth guarding (or having) these days.  And even these companies tend to hire waves of consultants to help them with “agile transformations,” which are, in part, designed to help them figure out how to slowly, carefully, bring the bazaar inside.

Github As Tide-Turner

Interestingly, a little-known version control system called Git was the tool of choice for Linux, back when bazaars were few in number.  It was, after all, a brainchild of Linus Torvalds and the bazaar.  It was way, way ahead of its time and, in an interesting piece of meta-storytelling, it represented the bazaar of the source control world.  All (note: not actually all, see the comments below) other source control was centrally maintained and had a single, ultimate source of truth.  Git was distributed and its truth was relative and decided in ad-hoc fashion by its participants.

It wasn’t until 2007 that Github was born.  Github, the site, offered free hosting of open source projects.  It was not the first site to do so, with Sourceforge and Codeplex both preceding it and offering the same basic value proposition.  But Github did two interesting things: it outfitted its site with social media functionality and it used Git, a distributed source control system that was itself a bazaar.

The degree to which Github was in the right place at the right time versus the degree to which it revolutionized our concept of software intellectual property is impossible to say.  But there is no debate that Github was at the absolute center of this sea change.  Before Github, open source hosting housed dusty, weird side projects that no one thought could be sold for profit.  Since Github, open source hosting stores your portfolio of work and even your very credibility in the industry.

Code Review, AG (After Github)

Github has created a Cambrian Explosion of code sharing that has worked its way into all corners of the software world, from hobbyists to enterprise architects.  Ten years ago, if you googled in the hopes of finding a helpful code snippet to copy and paste into your project, you might find an instruction manual, a whitepaper, or a post from a respected blogger.  Now, those results would be peppered with hello world code from people who started programming last week.  And that is truly the bazaar realized: a world full of people of all experience levels and backgrounds, collaborating in ad-hoc fashion to solve one another’s problems.

But it has created an entirely new reality.  There is no longer any guarantee that code you’ll find anywhere is fit for purpose.  In a world where coding and contribution of code have been utterly democratized, there can be no reliance on others for quality control or suitability evaluation.  The onus is increasingly on those of us delivering software to make sure that we know what’s coming in and where it’s coming from.  And while intellectual property in software may have completely flipped in the last 20 years, defect prevention has not — code review is still the way to go.  In today’s bazaar world, with Github at its core, code review is more needed than ever.

Avdi Grimm
I want to apologize in advance for being Captain Pedantic: I feel compelled to say that when git arrived on the scene, there were already a number of distributed version control systems in use, including Arch, Darcs, Bazaar, and Mercurial. Some had experienced decent uptake. Git was a relative latecomer, and represented a uniquely Linus-y take on the genre. Ubuntu continued to use Bazaar for many years (maybe they still do? I’ve lost touch) and a number of projects continue to use Mercurial. I wanna say Google still digs Mercurial, but I’d need to check that. Certainly Git has become…
Erik Dietrich
“Captain Pedantic” 🙂 I like that. And, no need to apologize on my account, since a lead-in like that usually means I’m going to learn something interesting. As best I can recall when I wrote this, I was drawing a comparison between git and the scene-dominating tools at the time (CVS, SourceSafe, ClearCase, SVN, etc), so in saying “all other source control was centrally maintained…” I am wrong. (I’ll leave it as-written, though, to preserve the coherence of this exchange, with a note to see the comments). This got me curious, so I just spent a bit of time reading…
Frank Parker

More complete background on the BitKeeper and Linux split: Linux and other open source projects were allowed to use BitKeeper for free (no charge), but when an open source developer reverse-engineered the BitKeeper protocol, the free license was revoked. See http://lwn.net/Articles/130746/

Trivia about BitKeeper in SCM history: https://news.ycombinator.com/item?id=11667494

Erik Dietrich

Interesting stuff — thanks for the links!