A few weeks ago I had the opportunity to meet up, in short order, with VMTurbo, SolarWinds (at TFD10) and Cirba’s CEO Andrew Hillier. In one way or another, they are all working on monitoring tools for public cloud services, in addition to what they usually do (infrastructure monitoring, automation and analytics).
Although cloud monitoring could be considered a natural extension for these kinds of products, I don’t think they’re going about it the right way, especially for services like AWS. Don’t get me wrong: chances are it’s merely a maturity problem, and I’m sure they are responding to their users’ needs first… but they all remind me of the classic story of the hammer and the nail.
The problem
Tools like AWS CloudWatch simply suck. I’m not an expert, but every time I’ve tried using them I’ve run into problems. Is it only me? Have any of you had a similar experience? I don’t know for sure, but if you think about the public cloud in terms of IaaS, I’m sure you’ll agree with me!
And I think the answer is quite simple… Public cloud is not about IaaS: who really cares how a single VM performs? Does it make sense to have all this granularity, and tools to move VMs around and optimize their placement? What’s more, you can’t move VMs around as you usually do with VMware vMotion, because there is no equivalent. In this case, the easiest thing to do is to kill the VM and instantiate a new one (and good luck if your application is not designed to cope with that!).
VMTurbo or Cirba, for example, do a huge amount of work for you and can continuously optimize your virtual infrastructure when needed. It’s not just monitoring; it’s more like a virtual assistant playing Tetris, where your infrastructure is the game field and workloads are the falling pieces. On premises, where you have full control of the infrastructure, this granular activity is possible and very effective, and I can see all the benefits… but when the public cloud is involved and you control nothing except what is inside your VM, everything is much, much harder and blurrier. How can you be sure that the physical server (or the network, or the storage) is OK? How can you be sure that your VMs are running on similar hardware? Or close to each other? These are only a couple of the many questions you should ask yourself before starting to monitor VMs in the cloud.
And this can get even more complicated if the application you want to monitor is built out of VM instances and a DBaaS, for example… or leverages several different cloud services.
Chaos Monkeys
The best way to solve this problem is to “adopt a Chaos Monkey”… but Netflix’s is the only one out there (pardon the pun).
In any case, when it comes to the public cloud, I’m sure the best approach is to implement monitoring directly at the application level. Can you really monitor commodity resources in the traditional way? It’s hard and expensive; it’s probably better to consume them and dispose of them for what they really are and were intended to be: commodities. It’s not easy, especially if you are doing it for only one application, and it becomes as complex as Netflix’s Chaos Monkey if you plan to make it general purpose… tough choices.
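To make the idea a bit more concrete, here is a toy Python simulation of the two ideas above: a Chaos Monkey-style step that kills a random instance, and a check that looks at application capacity rather than at individual VMs. It is only a sketch under loudly stated assumptions: `Instance`, `Fleet`, `chaos_monkey`, `app_level_check` and `reconcile` are hypothetical names, the "fleet" is an in-memory list, and no real AWS or Netflix API is called. Netflix’s actual Chaos Monkey terminates instances in real production groups; this only mimics the pattern.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a cloud VM (no real cloud API involved)."""
    name: str
    healthy: bool = True


@dataclass
class Fleet:
    """Hypothetical in-memory group of interchangeable instances."""
    instances: list = field(default_factory=list)
    counter: int = 0

    def launch(self) -> Instance:
        # Instances are disposable: each new one just gets a fresh name.
        self.counter += 1
        inst = Instance(name=f"web-{self.counter}")
        self.instances.append(inst)
        return inst

    def terminate(self, inst: Instance) -> None:
        self.instances.remove(inst)


def chaos_monkey(fleet: Fleet, rng: random.Random) -> Instance:
    """Terminate one randomly chosen instance, Chaos Monkey style."""
    victim = rng.choice(fleet.instances)
    fleet.terminate(victim)
    return victim


def app_level_check(fleet: Fleet, desired: int) -> float:
    """Application-level view: fraction of desired capacity that is serving."""
    serving = sum(1 for inst in fleet.instances if inst.healthy)
    return serving / desired


def reconcile(fleet: Fleet, desired: int) -> int:
    """Replace, don't repair: drop unhealthy instances, launch fresh ones.

    Returns the number of instances launched.
    """
    fleet.instances = [inst for inst in fleet.instances if inst.healthy]
    launched = 0
    while len(fleet.instances) < desired:
        fleet.launch()
        launched += 1
    return launched


if __name__ == "__main__":
    desired = 4
    fleet = Fleet()
    for _ in range(desired):
        fleet.launch()
    victim = chaos_monkey(fleet, random.Random(42))
    print(f"terminated {victim.name}, capacity {app_level_check(fleet, desired):.2f}")
    reconcile(fleet, desired)
    print(f"after reconcile, capacity {app_level_check(fleet, desired):.2f}")
```

Running it prints a capacity of 0.75 after the random kill and 1.00 after reconciliation. The design choice mirrors the argument: nobody inspects or migrates the killed VM; the only signal that matters is whether the application still has the capacity it needs, and the fix is to launch a replacement.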
Closing the circle
While public cloud monitoring at the IaaS level is very limited and makes sense only for Ops and a limited set of use cases, I think these infrastructure monitoring products should look much more closely at solutions like the Chaos Monkey: if not today, at least as an evolution of what they are already doing.
For the traditional enterprise, it’s hard to develop and maintain this type of tool (even though it is now available as an open source project), but it could be easy to adopt if it were part of a larger framework provided by the same trusted vendor… It could be considered a hybrid solution, capable of covering a wider range of use cases, from infrastructure monitoring to application and cloud monitoring. Something along the lines of the Bi-Modal IT concept, which is so popular now!
If you want to know more about this topic, I’ll be presenting at the next TECHunplugged conference in London on 12/5/16: a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!
Disclaimer: I was invited to TFD10 by Tech Field Day and they paid for travel and accommodation. I have not been compensated for my time and am not obliged to blog. Furthermore, the content is not reviewed, approved or edited by anyone other than the Juku team.