I did a good job with this title, didn’t I?! But, actually, I’m not going to re-define software-defined. Not again, and not definitively. I’d like to look around and try to make sense of the different interpretations and architecture designs that make this claim. In this article I intentionally left out some solutions because, in my opinion, they are not software-defined; others I may have simply forgotten to mention or am not aware of. So, please leave a comment if you have something to add.
Where did Software-defined come from?
Well, the term was introduced in the networking space first (thanks to OpenFlow), but I’d say that “software-defined compute” (aka virtualization in this case) came much earlier than that.
In fact, if you look at the first definition coined for software-defined networking, it was all about the separation of the control plane from the data plane. In other words, something similar to what happens at the compute layer, where you have a controller (e.g. vCenter) capable of defining and managing all the components and policies that rule the infrastructure, and the hypervisor (e.g. ESXi), which is the mere executor on standard x86 hardware.
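To make the control plane/data plane split more tangible, here is a minimal, purely illustrative Python sketch (all the class and rule names are mine, not from any product): the controller is the only place where decisions are made, while the nodes simply execute whatever is pushed down to them.

```python
# Toy illustration of control/data-plane separation.
# All names here are hypothetical; this is not any vendor's API.

class DataPlaneNode:
    """Mere executor: applies whatever the controller pushes down."""
    def __init__(self, name):
        self.name = name
        self.rules = []

    def apply_rules(self, rules):
        self.rules = list(rules)  # no local decision making

class Controller:
    """Central point where policy is defined and distributed."""
    def __init__(self, nodes):
        self.nodes = nodes

    def set_policy(self, rules):
        for node in self.nodes:   # decisions are made once, centrally
            node.apply_rules(rules)

nodes = [DataPlaneNode(f"hypervisor-{i}") for i in range(3)]
Controller(nodes).set_policy([("vm-traffic", "priority-high")])
```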
But then, as always, things changed. On one side we had Markitectures (just marketing taking advantage of the buzzword of the moment) and on the other, different engineering approaches for the same problem. So everything has become more complicated… including for the network guys.
…And for storage it’s even more complicated. In fact (excuse the oversimplification), while networking is about data transportation and compute is about working on data, storage is usually about data persistence, consistency, reliability and durability… all characteristics that are at odds with the practically stateless nature of other infrastructure components, which makes it harder to separate the control and data layers.
But let’s talk about software-defined storage
Since we don’t have a single definition of software-defined, let me work with examples to describe different architectures.
One of my favorite categories of SDS is the one that most resembles the original definition from networking… and Primary Data is the most interesting example from my POV. I met them a couple of weeks ago at SFD10 and here is a video that also explains their architecture. PD really looks like Nicira (now VMware NSX) and promises similar benefits, but for storage: an out-of-band controller and pNFS-based components, available for different OSes, which manage all the data movements. The controller also includes a smart policy engine which allows you to associate SLAs with single data volumes and helps to automate a vast number of tasks.
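Primary Data hasn’t published a public API that I can quote here, so the snippet below is only my guess at what associating an SLA with a single volume could look like; every field name and value is invented for illustration.

```python
# Hypothetical SLA attached to a single data volume.
# Field names and values are invented for illustration only;
# they do not reflect Primary Data's actual interface.

volume_sla = {
    "volume": "finance-db-01",
    "min_iops": 5000,          # performance floor the engine must honor
    "max_latency_ms": 5,       # latency ceiling
    "copies": 2,               # protection level
    "placement": "auto",       # the controller decides where data lives
}
```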
Unfortunately, despite PD’s impressive demos, the product is still not GA and, at the moment, it could be of interest only to large customers with several different storage systems and the intention to normalize the infrastructure. In any case, this is one of those solutions you should keep an eye on; if nothing else, it’s really interesting per se.
Another software-defined storage category I can think of is the one that includes modern scale-out distributed storage systems. The list of products in this category is very long, but they all share a set of common characteristics:
– Use of commodity x86 HW
– Scale-out, shared-nothing design
– Strong API-based management interface (see the sketch below)
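As an example of that last point, here is a minimal sketch of provisioning a volume through a REST management API. The URL, endpoint and JSON fields are hypothetical (each vendor exposes its own flavor); it only shows the pattern.

```python
# Sketch of driving a scale-out storage cluster through its REST API.
# The URL, endpoint and JSON fields below are hypothetical examples;
# real products (Datera, Hedvig, Cloudian, ...) each have their own.

import requests

API = "https://storage-cluster.example.com/api/v1"

def create_volume(name, size_gb, replicas):
    payload = {"name": name, "size_gb": size_gb, "replica_count": replicas}
    resp = requests.post(f"{API}/volumes", json=payload, timeout=10)
    resp.raise_for_status()  # fail loudly if the cluster refuses
    return resp.json()

if __name__ == "__main__":
    print(create_volume("app-data-01", size_gb=100, replicas=3))
```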
Borrowing again from Storage Field Day 10, Datera, Hedvig and Cloudian are all good examples of this category. Even though their solutions differ in features and scope (ranging from object storage to container, OpenStack and VMware data stores), the basics are very similar. In fact, the separation between control and data planes can be found here as well, even if the two can coexist on the same hardware.
In this category I find solutions like Datera very compelling. These are new, highly specialized solutions for containers and cloud storage, with an impressive policy/provisioning engine. It’s quite visible that data is stored independently from the presentation and management/control layer or, to be clearer, once the data volume is provisioned, the data path is always the result of a function of the cluster layout, while the policy applied to the volume isn’t fixed and pre-determined as in a traditional storage solution. Even fault management is handled differently than in a traditional system, with the volume re-laying itself out to meet its policy goals again.
Sounds complicated? It took me a while to understand. If you want to know more, I strongly suggest watching this demo from SFD10.
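To make the idea of “the data path as a function of the cluster layout” more concrete, here is a toy Python sketch. To be clear, this is not Datera’s actual algorithm; I’m just using deterministic hashing as a stand-in for the concept: placement is recomputed from the volume, the current node list and the policy, instead of being stored as a fixed mapping.

```python
# Toy illustration: replica placement computed as a function of
# (volume, cluster layout, policy) instead of a fixed mapping.
# NOT any vendor's real algorithm; plain hashing used as a stand-in.

import hashlib

def place_replicas(volume_id, nodes, replica_count):
    """Rank nodes deterministically by hash and take the first N."""
    def score(node):
        return hashlib.sha256(f"{volume_id}:{node}".encode()).hexdigest()
    return sorted(nodes, key=score)[:replica_count]

nodes = ["node-a", "node-b", "node-c", "node-d"]
print(place_replicas("vol-42", nodes, replica_count=2))

# If a node fails, recomputing with the new layout "re-lays out" the
# volume so it meets its replica policy again:
print(place_replicas("vol-42", [n for n in nodes if n != "node-a"], 2))
```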
A special mention in this category goes to Coho Data, maybe the only member of a data-path virtualization sub-category, even though other vendors (like Hedvig, with its proxy, for example) are working on alternative approaches to further virtualize data access. Coho has an interesting SDS implementation which leverages SDN at the front-end. Something that may seem complicated, but it brings several practical advantages, making it possible to virtualize the entire data path and transport layer.
Last, but not least, I’d like to include an open source solution: Ceph. It is another great example of SDS and, in fact, very similar to what I’ve already described. It has been maturing very quickly lately, thanks to the efforts (and investments) of Red Hat, and the latest release is quite impressive, as is the near-term roadmap… at the end of the day, it’s no wonder it’s at the top of the charts when it comes to OpenStack and container storage.
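And since Ceph is open source, you can actually touch it: below is a minimal python-rados example that writes and reads an object, assuming a running cluster, the python-rados bindings installed, a readable /etc/ceph/ceph.conf, and enough permissions to create a pool.

```python
# Minimal python-rados example: store and fetch an object in a Ceph pool.
# Assumes a running Ceph cluster, python-rados installed, and enough
# permissions to create a pool.

import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    if not cluster.pool_exists("demo"):
        cluster.create_pool("demo")
    ioctx = cluster.open_ioctx("demo")
    try:
        ioctx.write_full("hello", b"software-defined storage")  # write object
        print(ioctx.read("hello"))                              # read it back
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```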
One thing that I think is important for this category of SD storage is that this model really works only at scale. In smaller configurations, with very few nodes, there is limited freedom for policy management and there are constraints imposed by the cluster layout, making it, in practice, quite impossible to separate the various components. But, again, these kinds of systems were born to manage large numbers of clients (containers or VMs) and huge capacities without the burden of traditional storage management.
Another interesting category is, of course, hyperconverged/VSA-based storage. This can be considered a sort of sub-category of what I described above. It works in a similar way, but it has a specific purpose (serving VMs) and it’s highly integrated with the hypervisor. Examples of this category are everywhere, from Nutanix down to VMware VSAN.
For many end users, this is the quintessence of SDS. In fact, in this case, the end user is usually a general-purpose sysadmin who carves out data volumes (or VMDKs) directly from a pool of shared resources. It is the system that does all the heavy lifting for them, meeting the characteristics defined by the requested data protection or performance policies and SLAs. The primary goal of this type of solution is to make storage transparent by combining efficiency with ease of use.
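As a sketch of what “the system does the heavy lifting” means, here is a toy provisioning function inspired by VSAN-style policies (failures to tolerate, mirrored copies). The names and the simple mirroring math are mine, invented for illustration; this is not the actual SPBM/VSAN API.

```python
# Toy sketch of policy-driven provisioning from a shared pool.
# Inspired by VSAN-style policies, but names and logic are invented;
# this is not the real SPBM/VSAN API.

def provision_vmdk(pool_free_gb, size_gb, failures_to_tolerate=1):
    copies = failures_to_tolerate + 1     # simple mirroring model
    raw_needed = size_gb * copies
    if raw_needed > pool_free_gb:
        raise RuntimeError("not enough capacity in the shared pool")
    return {"size_gb": size_gb, "copies": copies, "raw_used_gb": raw_needed}

# A 100 GB VMDK tolerating one failure consumes 200 GB of raw capacity.
print(provision_vmdk(pool_free_gb=2000, size_gb=100))
```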
Closing the circle
Have I left any categories out? If so, as already mentioned, please leave a comment detailing the category you have in mind, why you think it is SDS, and where you see the separation between control and data planes. If you are just thinking about a piece of software that you can install on a server, that’s not enough.
SDS is becoming the new black and the trend is clearly visible everywhere, with end users of all sizes. Traditional storage sales have been falling quite drastically for a while now, and keeping an eye on how the market is evolving is a must… especially when you want to improve the TB/sysadmin ratio as your infrastructure grows!
At SFD10 we recorded a round table. I don’t know when Stephen is planning to release the video/recording, but on that occasion I wasn’t able to give a clear picture of my thoughts (jetlag and a long, full day of meetings didn’t help; I couldn’t think or talk straight 🙂 ). In any case, I hope I found the right words to express what I have in mind this time. And, BTW, I know that this is not enough to tie up the SDS definition, but this is just a blog post after all… you can’t expect too much.
If you want to know more about this topic, I’ll be presenting at the next TECHunplugged conferences in Amsterdam on 6/Oct/16 and Chicago on 27/Oct/16: a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized influencers with disruptive technology vendors and end users who manage rich technology environments. Join us!
Disclaimer: I was invited to SFD10 by GestaltIT, which paid for travel and accommodation. I have not been compensated for my time and am not obliged to blog. Furthermore, the content is not reviewed, approved or edited by anyone other than the Juku team.