DaedTech

Stories about Software

By

A Look at 5 NoSQL Solutions

Editorial Note: I originally wrote this post for the Monitis blog.  You can check out the original here, at their site.  While you’re there, have a look at variety of monitoring solutions they offer to keep your production code running smoothly.

If you’ve never seen it, you should take a look at the Gartner “Hype Cycle.”  Technology changes the world quickly.  And the technology world itself changes even more quickly.  This can lead to breathless buzzword saturation, followed by the so-called “trough of disillusionment” when the tech predictably fails to cure world hunger.

Wired magazine put “Big Data” in the trough of disillusionment in 2013.  So that placed it at the peak of inflated expectations around 2011.  And, as a matter of fact, I remember that time quite well.

Everyone suddenly became interesting in something called “NoSQL.”  What was it?  What problems did it solve?  It didn’t matter — you just needed to start doing it.  And, given the year, you should start doing it in a way that somehow involved a mobile app.

What Is NoSQL?

Neither the big data movement nor NoSQL technologies have gone anywhere.  They’ve woven themselves seamlessly into the fabric of the tech world.  Only the hype has abated.

But that doesn’t mean everyone obtained perfect clarity about what this involved.  When a concept hits the peak of the hype wave, definitions get fuzzy and folks get confused.  If you don’t believe me, try to obtain universal consensus about what “agile” means.  NoSQL suffered the same fate.

You can read about some of those misconceptions here, but I’d like to focus on the one I recall as most prominent from my experience.  Specifically, many people and particularly relational database traditionalists, seemed to believe “NoSQL” was some kind of off-brand comptetitor to traditional databases.  That is, you could go with Oracle, SQL Server, or NoSQL.

I highlight that misconception to underscore the counterpoint.  I’ve seen a few different suggested meanings of the acronym, but I favor “Not Only SQL.”  Simply put, the NoSQL movement represented the idea that there were other options for storing data besides relational databases.

Read More

By

Go On, Live a Little. Denormalize Your Data

I have years of professional experience with and completed several academic courses about relational databases. This is a pretty familiar topic for me, and I’ve worked with a bunch of them: SQL Server, SQLLite, MySQL, PostgreSQL, Oracle, and even MS Access. I’ve always found them, paradoxically, to be both comfortable and awkward. They’re comfortable because of my familiarity with them and awkward because I’ve used them almost exclusively with OO languages and incurred the impedance mismatch.

While RDBMS are in my comfort zone, NoSQL alternatives are comparably alien but also intriguing. Ayende has made some pretty convincing arguments as to why one should think outside the RDBMS, including a .NET Rocks episode in which he talks about constraints, such as expensive disk space, that applied decades ago when RDBMS were designed and are no longer of concern. I’m receptive, my interest is piqued, and I’ve played a bit with MongoDB To understand how these things work. But I think it really clicked for me the other day when I was staring at a schema that had some definite complexity issues. I think I finally groked the use case for document databases.

I was working with a pre-existing schema recently in which (mildly obfuscated) you have a “customer” who can be part of one or more “programs.” The entities “customer” and “program” each have their own properties and, as any diligent normalizer will tell you, that information should not be repeated. So they get their own tables. Since a customer can participate in multiple programs and multiple customers can be in the same program, this is an “M to N” relationship — represented, predictably, by a linking table. What’s more, customers can participate repeatedly in programs so the linking table will sometimes have multiple participation records per customer-program pair, differing by a participation timestamp. I’m sure those of us who have done any significant database development have played out similar scenarios countless times.

As I wrangled Entity Framework to deal with this and other similar relationships that were making it relatively complicated to pull certain information, I started thinking that this seemed harder than it needed to be. There was no use case for handling “what if the program name or another attribute changes in the middle of its offering,” and there really are no good answers to that question. Right now, the implementation is “you can’t do that,” but this is hardly satisfying. To administrative users, it seems arbitrary and like a system shortcoming. But the alternative is unpleasant, too — additional complexity to allow “partial programs” in the schema or the code.

I started thinking about normalization and whether it really mattered here. Allowing duplication is a cardinal RDBMS sin, but it sure would make life easier for a lot of use cases. I mean, imagine a scheme where there was just a table of customers, and participation was recorded as you went, with the program information being duplicated in each participation record. The downside is the information duplication, but the upside is that changing programs midstream is trivial and there is less table overhead and complexity to maintain. No M to N relationships and worrying about whether a program is current or replaced by a new one.

And that’s when it hit me. RDBMS are great for maintaining, well, relational information. For instance, if I work at a company, I have a boss, and my boss has many reports. Modeling the manager-line employee relationship is a great fit for a relational database because it’s about modeling relationships, like “manager to reports.” Tree structures of composition tend to be a good fit as well, such as assemblies like cars and other pieces of machinery. Parts consist of sub-parts and so on. These are relationships in the sense that operations like “delete” and “replace” make sense.

But what about the one-and-done concept of a customer participating in a program one day? That isn’t a relationship any more than I form a relationship with a waiter that I tip and then never see again. That’s something that happened once — not an ongoing arrangement. And that’s where things really snapped into focus for me. RDBMS and the normalized model is great for managing relationships but is not necessarily the be-all-end-all for managing extremely transactional data in the sense of event recording.

And while I’m probably not in a position for this project to quickly introduce document databases and stir up the works too much, I am considering modeling what I might do storage-wise after a document database there in SQL Server. Perhaps it’s time to throw caution to the winds and start allowing some information repetition in there for historical modeling. If the experiment fails, I can always normalize it. But if it succeeds, adding document databases to the mix will be a lot easier since I’ll just be learning the syntax of the API as I go rather than the fundamental underlying concepts and use cases.