Editorial note: I originally wrote this post for the Monitis blog. You can check out the original here, at their site. While you’re there, have a look at the different sorts of production concerns that you can keep an eye on with their offering, some of which I address in this post.
If you have responsibility for software in production, I bet you’d like to know more about it. I don’t mean that you’d like an extra peek into the bowels of the source code or to understand its philosophical place in the universe. Rather, I bet you’d like to know more about how it behaves in the wild.
After all, from this opaque vantage point comes the overwhelming majority of maddening defects. “But it doesn’t do that in our environment,” you cry. “How can we even begin to track down a user report of ‘sometimes that button doesn’t work right’?”
To combat this situation we have, since programmer time immemorial, turned to the log file. In that file, we find answers. Except, we find them the way an archaeologist finds answers about ancient civilizations. We assemble cryptic, incomplete fragments and try to use them to deduce what happened long after the fact. Better than nothing, but not great.
Because of the incompleteness and the lag, we seek other solutions. With the rise in sophistication of tooling and the growth of the DevOps movement, we close the timing gap via monitoring. Rather than waiting for a user to report an error and then asking for a log file, we get out in front of the matter. When something flies off the rails, our monitoring tools quickly alert us, and we begin triage immediately.
Common Monitoring Use Cases
Later in this post, I will get imaginative. In writing this, I intend to expose you to some less common monitoring ideas that you might at least contemplate, if not outright implement. But for now, let’s consider some relatively blue-chip monitoring scenarios. These transcend even the basic nature of the application and apply equally well to web, mobile, or desktop apps.
Monitis offers a huge variety of monitoring services, as the name implies. You can get your bearings about the full offering here. This means that whatever you want to monitor, you can probably find an offering of theirs that covers it, unless you’re really out there. In that case, you might want to supplement their offering with some customized functionality for your own situation.
But let’s say you’ve just signed up for the service and want to test drive it. I can think of nothing simpler than “is this thing on?” Wherever your application runs, you’d love some information about whether it runs when it should. On top of that, you’d probably also like to know whether it dies unexpectedly and ignobly. When your app crashes embarrassingly, you want to know about it.
Once you’ve buttoned up the real basics, you might start to monitor for somewhat more nuanced situations. Does your code gobble up too many hardware resources, causing a poor experience or added expense? Does it interact with services or databases that fail or go offline? In short, does your application wobble into sub-optimal states?
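To make those basics concrete, here is a minimal sketch of what an “is this thing on?” check and a dependency check might look like if you rolled them by hand. It uses only the Python standard library and has nothing to do with how Monitis implements its monitors. The function names, and whatever URL, host, and port you pass in, are hypothetical placeholders for illustration.

```python
import socket
import urllib.error
import urllib.request


def is_app_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the app answers the given URL with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP error statuses,
        # and malformed URLs alike: all count as "not up" here.
        return False


def is_dependency_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: dict) -> list:
    """Run each named check and return the names of the ones that failed."""
    return [name for name, check in checks.items() if not check()]
```

You would call `run_checks` on a schedule with a dictionary mapping a label to a zero-argument callable, for example `{"web": lambda: is_app_up(some_url)}`, and raise an alert whenever the returned list is non-empty. A real monitoring service adds what this sketch lacks: checks from multiple locations, history, and alert routing.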
But what if we look beyond those basics? Let’s explore some things you may never have contemplated monitoring about your software.