Two weeks ago I attended Storage Field Day 6, and most of the storage companies that participated in the event are working on some sort of analytics tool for their storage arrays. I’m a strong believer in storage analytics and I think we will see a lot happening in the next year or so. In my opinion (and I’m not alone), alongside QoS and smart (automated?) management, analytics will soon become table stakes for all primary storage plays.
Last September, at the Next Gen Storage Summit, I used the slide you can see here on the right. It was part of a much broader talk (here is the link to the slide deck), but I want to start from it to introduce, and then explain, my point of view.
My new array is better than your old one…
…And they are all commodity. Most vendors would disagree with me, but this is a fact. There’s not much differentiation anymore: solutions that are comparable in terms of technology, capacity, features and performance use, more or less, similar approaches to solve the same problems. This is the consequence of maturity… and the storage array is a commodity. RAID is RAID, snapshots are snapshots, thin provisioning is thin provisioning, VAAI is VAAI and so on. If your once-“modern” storage array has a tick in every checkbox and it works… it will be fine (probably 😉 ).
Ok, implementations differ, efficiency can be better or worse, and some products are better than others, but at the end of the day all of them do their dirty job and the difference in terms of TCO is not that big. Success for most startups comes from re-implementing existing concepts, taking advantage of newer technologies, but they haven’t invented anything new. Once again, do they have a better TCO? Yes they do, but it’s not tenfold!
Analytics is a step forward
Some storage vendors began collecting data on their systems many years ago. Data points were stored locally (only a few, in fact) and were used only to give a more readable and consistent view of what was happening in the system. Over time things have changed slightly, and new features have been added (chargeback, basic capacity planning, call-home for proactive support, and so on)… cool and useful, but not a game changer.
Cloud+Big Data, the quantum leap!
Cloud and Big Data make a huge difference: it’s like moving from the capabilities of a spreadsheet to those of a relational database.
I want to take Nimble Storage as an example here (probably the most mature and advanced tool out there). Nimble collects between 10 and 70 million data points per day from every single installed system. Yes, up to 70M a day! And it’s not only about the quantity of data: it’s mostly about the fact that they can simultaneously compare data from thousands of systems and give you a map of where you stand compared to the rest of Nimble’s end users!
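To make the fleet-comparison idea concrete, here is a minimal sketch of how a single array’s metric could be ranked against telemetry from thousands of installed systems. This is purely illustrative: none of the names or numbers come from Nimble’s actual implementation.

```python
# Hypothetical sketch: ranking one array's daily metric against the whole fleet.
# All names and values here are made up for illustration.

def percentile_rank(fleet_values, my_value):
    """Return the percentage of fleet systems reporting a value below my_value."""
    below = sum(1 for v in fleet_values if v < my_value)
    return 100.0 * below / len(fleet_values)

# Daily average read latency (ms) reported by the installed base
# (in reality this would be thousands of systems, not ten).
fleet_latencies = [0.8, 1.2, 0.5, 2.3, 0.9, 1.7, 0.6, 3.1, 1.1, 0.7]

my_latency = 1.5
rank = percentile_rank(fleet_latencies, my_latency)
print(f"Your array is slower than {rank:.0f}% of comparable systems")
```

The interesting part is not the arithmetic, of course, but having the fleet-wide dataset in one place so the comparison is possible at all.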
This opens a world of opportunities for both the end user and the vendor. Nimble, for example, has already worked hard to integrate pro-active support activities, capacity planning, best practices and more to drive up efficiency, anticipate problems and manage the lifecycle of the array at best.
This kind of solution helps Nimble save money (prevention is better than cure) and make more money at the same time (they have enough data to predict future customer needs). What’s more, it makes the customer happy for the same reasons!
There is more, of course. Collected data can help developers and engineers improve the product much faster in all of these respects and find the best solutions for end users before they even ask for them!
The next step
Nimble is very serious about Infosight and they are developing a lot of amazing stuff around it (some of the features were demoed during the session; here are the videos).
They are also climbing up the stack and can now get information from the hypervisor too. This means it’s not only about the storage but also about the workloads generated by the VMs. This can help compare workloads of similar environments and give the user much more information about what is happening in their systems! Just think if Infosight told you: “Hey! Your storage is going nuts (and your users too) not because it’s slow, but because Exchange is probably misconfigured!” or “Hey! Look at this: in 15 days you will have this problem because this DB has strange behavior… please check!”.
Like having a Siri-like assistant, but for storage. It would be cool, wouldn’t it? (If I were an SMB end user, I would probably buy the storage array for this feature alone!)
At the moment we are not there yet, but some cool tools are already in the works.
Looking at the future (from now on, all conjectures)
After getting data from the hypervisor, they could go deeper into the VM… why not start scanning the content of the VMs too?
With both the workload and the content you might have Nimble’s Siri telling you: “Hey! your File Server is full of unused big video files… I suggest you change the storage profile for that VM and save resources” and “can I do it for you?”… I’d love something like that! (and 100% of users would too!)
But you can go even further! For example, think about asking the following: “Ok Nimble, let me know what happens if I apply this configuration change to this VM, and show me what I’ll obtain in terms of latency.” It could be a form of predictive management 🙂
At the same time, by continuously analyzing workloads, available resources and content of the VMs, the system could find patterns and reconfigure itself before the “rush hour” providing some sort of a predictive-QoS feature.
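A toy illustration of what “reconfigure itself before the rush hour” could mean: learn the recurring hourly load pattern from history, then flag the hours that will need extra resources so the system could pre-provision before the peak actually starts. Again, every name and number here is invented for the sketch; this is not how any shipping product works.

```python
from collections import defaultdict

# Toy predictive-QoS sketch (illustrative only): average past IOPS by hour of
# day, then flag the hours whose historical load exceeds a threshold so the
# system could pre-provision cache/bandwidth ahead of the rush.

def hourly_profile(samples):
    """samples: list of (hour_of_day, iops) tuples. Returns mean IOPS per hour."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum, count]
    for hour, iops in samples:
        totals[hour][0] += iops
        totals[hour][1] += 1
    return {h: s / c for h, (s, c) in totals.items()}

def rush_hours(profile, threshold):
    """Hours of the day whose historical average load exceeds the threshold."""
    return sorted(h for h, avg in profile.items() if avg > threshold)

# A few days of (hour, IOPS) telemetry for one array.
history = [(9, 12000), (9, 14000), (10, 15000), (10, 17000),
           (3, 900), (3, 1100), (14, 8000), (14, 7600)]

profile = hourly_profile(history)
print(rush_hours(profile, threshold=10000))  # → [9, 10]
```

A real system would obviously use far more sophisticated models (seasonality, trends, anomaly filtering), but the principle is the same: learn the pattern, act before it repeats.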
Closing the circle
I’m going too far, of course; this is more wishful thinking than something I actually saw… so please take this post with a grain of salt.
On the other hand, I know the potential is there, and if you watch the videos you can easily imagine Nimble’s next steps for its analytics tool (without the voice-driven assistant, maybe 😉 ).
Nimble has a good product (I’m talking about the array now) and they have been successful because of it. But now, if I were a Nimble reseller, the first thing I would show a potential customer would be Infosight, without thinking twice about it!
Storage analytics is going to be massive in the storage industry. Infosight is good (and probably the most advanced at the moment), but all the players are working on something similar.
We already have companies like Data Gravity (great idea, but I’m not totally convinced by the implementation) and others like Quaddra Software (in “semi-stealth” mode), for example. They are all working in the “stored-data analytics” space (I don’t even know if there is an official name for it yet!); the market will soon be crowded with solutions at every level… and I wouldn’t be surprised to find Nimble among them!
In fact, if Infosight continues to develop, nothing will stop Nimble from adding support for third-party storage systems, file analytics maybe, and enlarging the scope of the product… only time will tell.
Disclaimer: I was invited to this event by GestaltIT and they paid for travel and accommodation. I have not been compensated for my time and am not obliged to blog. Furthermore, the content is not reviewed, approved or published by anyone other than the Juku team.
Implementation of features DOES actually matter.
It’s fine to say they are all commodities, and in general I would agree, except that HOW the vendor actually implements a “commodity” feature has real-life operational impact, and DOES matter.
Case in point: snapshots. Yes, everybody does them, but some products still use legacy copy-on-write snapshots (LSI-based arrays, for example); these are cumbersome, slow, and have a huge impact on performance if you have too many hanging around.
Some vendors have very large page sizes for internal block mapping, and this causes the snapshots to consume large amounts of space – Dell EqualLogic is a good example of this.
And Violin (I think) provides these features via an out-of-band FalconStor appliance. That might get it a checkmark on your list, but would you really want to trust your data to this kind of kludge?
So, yes, I agree, but with a caution: not all checkmarks are the same.
Correct, and that’s why I used “(probably ;))” at the end of the sentence.