We usually think of primary storage as something that sits close to compute resources (whether they are on-premises or in the cloud), while cloud storage is something that we can access, more or less, from everywhere… but things are becoming fuzzy.
Primary as we know it
What do you really look for in primary storage? Taking availability and resiliency for granted, the first things you look for are high IOPS, low latency and predictability. Am I right?
You want it as close as possible to your compute resources. Even when your data is actually stored in the cloud, on an object store for example, you probably need a huge smart cache in the front-end to get the performance you want (Avere Systems is a good example in this case).
For less latency-sensitive data and workloads, usually kept on secondary storage, you need less predictability and latency is not a huge issue. In some cases cloud storage is OK… Object storage, and all the applications leveraging it, are designed to cope with variable and high latency.
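To make the front-end cache idea a bit more concrete, here is a minimal sketch of a read-through cache with least-recently-used eviction sitting in front of a slow back-end. It is only an illustration: the class name, the `backend_get` callable and the dictionary standing in for an object store are all hypothetical, and it says nothing about how Avere or any other vendor actually implements its cache.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Hypothetical LRU read-through cache in front of a slow back-end."""

    def __init__(self, backend_get, capacity):
        self.backend_get = backend_get  # callable: key -> value (the slow path)
        self.capacity = capacity        # maximum number of cached objects
        self._items = OrderedDict()     # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Fast path: serve locally and mark the entry as recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Slow path: fetch from the back-end, then keep a local copy.
        self.misses += 1
        value = self.backend_get(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return value


# Stand-in for an object store; a real one would be a network call.
fake_object_store = {"vm-image": b"...", "db-log": b"...", "report": b"..."}

cache = ReadThroughCache(fake_object_store.__getitem__, capacity=2)
cache.get("vm-image")  # first read goes to the back-end
cache.get("vm-image")  # second read is served from the cache
print(cache.hits, cache.misses)
```

The point of the sketch is simply that repeated reads never touch the back-end, so the latency the application sees depends mostly on the hit rate of the local cache.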
But the cloud comes into play
The real problem is that infrastructure is no longer in a single location. No matter what cloud service you are using, resources are now spread out between your datacenter(s) and the cloud(s)… your infrastructure is hybrid and problems can arise.
The nice thing is that technology is maturing pretty quickly and you can now move (and convert) VMs and services easily between different private or public infrastructures. Yes, there are still major limits and constraints, but we are getting there. The real problem lies in the fact that you have less control than in the past, especially if you are using public clouds and your services are spread out across different regions/areas. You can mitigate the problem by spending more money (for example by using leased lines?) …but is it a long-term, sustainable solution?
The problem of primary storage in the cloud
Service providers like Amazon do not commit in any substantial way to SLAs. They give you no assurance about the storage performance you can get from one of their VMs/VPSs. In fact, it’s not unusual to spin up two identical VMs and find out that they perform differently. That can be fine for certain types of workloads, much less so for others… especially if you have to integrate modern and legacy applications in the same infrastructure.
Predictability is just the tip of the iceberg! Shared storage is less common than you might think (both Amazon AWS and Microsoft Azure now offer it, but it’s still quite limited), data services are limited too, and so on…
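One rough way to put a number on that unpredictability is to compare latency percentiles from the two VMs rather than averages. The sketch below assumes you have already collected per-I/O latencies in milliseconds with a benchmark tool of your choice; the function names are mine and the sample values are invented purely for illustration, not measurements from any provider.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Median, 99th percentile and their ratio (a crude predictability score)."""
    p50 = percentile(samples_ms, 50)
    p99 = percentile(samples_ms, 99)
    return {"p50": p50, "p99": p99, "tail_ratio": p99 / p50}


# Invented samples for two "identical" VMs (hypothetical data).
vm_a = [1.0, 1.1, 1.0, 1.2, 1.1, 1.0, 1.3, 1.1, 1.0, 1.2]
vm_b = [1.0, 1.1, 4.8, 1.2, 1.1, 9.5, 1.3, 1.1, 1.0, 6.2]

for name, samples in (("vm_a", vm_a), ("vm_b", vm_b)):
    print(name, latency_summary(samples))
```

Two VMs can show a similar median and still have very different tails, and it is the tail ratio that tells you how predictable the storage really is.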
Looking for solution(s)
There are many startups and established vendors trying to match primary storage needs with the unpredictability of the cloud. They all take different approaches: some are still immature, others are very interesting but have limited use cases. In any case, we are only at the beginning and I think there will be a lot of development in this area in the near future.
The solution landscape is already large and I’d like to share a few examples just to give you an idea of what is happening:
• Zadara Storage: This is a startup selling Virtual Private Arrays. The solution is simple and clever. You can configure your own array based on a couple of virtual controllers and a variable number of disks, and you take full control of it. It’s your array in all its aspects, and tenant isolation is guaranteed by design (since your disks and flash are accessed only by you). The solution is available from major service providers (like Amazon) as well as for private cloud deployments. For example, you can manage replication between two different providers or between a provider and your DC.
• ClearSky: they came out of stealth a few months ago. The idea is quite interesting. They offer an appliance that manages part of the cache, while the rest is managed by them in the cloud. To start, the product will be available only in selected metro areas and you will need a specific leased line to use it… but I’m curious to see how it evolves.
• Velostrata: also quite new. Their product allows you to move part of your VMs from your VMware infrastructure to Amazon AWS while maintaining access to local storage. It leverages caching and WAN optimization as well as an in-line conversion of the VMs during the transfer between the two environments. They were at the last #TFDx and they seemed somewhat immature (1.0!), but the idea is brilliant and I think they deserve a glance, at least for the potential…
• NetApp: Yes, ONTAP. ONTAP everywhere! But their idea of having ONTAP in the cloud is not that foolish. The company offers to host its filers in the same facilities where Amazon has its infrastructure… meaning that you can transfer data to a filer that “sits in the cloud” and then access it from your VMs in the cloud. NetApp also offers ONTAP AMIs on Amazon’s marketplace… limited use cases? Yes, but looking at the whole cloud line-up it makes a lot of sense if you are a NetApp customer.
• Avere: they started years ago offering a NAS accelerator but, eventually, they began to leverage object storage instead of NFS in the back-end. Now they can deliver tremendous performance through a scale-out appliance on-site with object storage in the back-end… and their cluster can also be expanded into the cloud (AWS and GCP at the moment). Use cases are mostly in HPC, big data, and media & entertainment… but, again, if you look at the architecture it does make a lot of sense.
• ObjectiveFS: a cool distributed FS for Linux servers that can be deployed across different locations, leveraging S3 as the back-end. It’s not for every type of workload but you could be surprised by its performance, especially for some specific use cases.
• VMware: Last August, at VMworld, they announced the possibility of better integrating vCloud Air with your local infrastructure through a vMotion functionality that should work across private and public infrastructures.
Solutions are plentiful and I know I’ve left out some interesting ones, but I just wanted to give an idea of what is already available to stretch your infrastructure between public and private clouds.
Bottom line
For most IT organizations cloud is a hybrid thing. Most of them don’t have the resources to run a private cloud for everything (and they actually don’t want to!) and sometimes, due to local laws and regulations, you want to keep your data local even if you are a small organization.
Infrastructure is evolving to address new workload/data mobility demands. We are merely at the start and many solutions are still immature… but there is great interest from end users and the number of solutions is quickly growing (and maturing).