If you are interested in these topics, I’ll be presenting at the next TECHunplugged conference in Amsterdam on 6/10/16 and in Chicago on 27/10/16. It is a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!

Before summer vacation I wrote a first article about “Invisible storage”. Now I’d like to follow up on storage invisibility and its benefits by mentioning a couple of solutions that, in my opinion, are moving in that direction.

Invisible, yet very present

I used the word “invisible” for lack of a better term… but it doesn’t mean we no longer need or have a proper storage infrastructure. It is still there, but the goal is to hide its overwhelming complexity while improving overall efficiency and $/GB (What a goal!!!). Not an easy task, but fortunately there are some interesting solutions that are starting to address the problem. Most of them are quite recent, but they start from similar basic concepts: maximizing infrastructure efficiency, making storage easier to consume and hiding or minimizing complexity.

One size doesn’t fit all

Even today, Storage remains the most complex and expensive part of our datacenter infrastructures. TCA and TCO are still high compared to other infrastructure components, and overall complexity is another sore spot.

At the same time, IT organizations, especially smaller ones, would like to have more agile teams with generalists doing multiple jobs instead of specialists, who can easily become a bottleneck. Even more so, with the introduction of continuous delivery practices like DevOps, automation and speed of reaction at the infrastructure layer have become essential for success. Containers and other recent developments at the compute and network layers are clearly going towards that model but, due to its conservative nature, storage is still behind and is having a hard time catching up!

Storage and compute, blended together

I’m not talking about hyperconvergence here, or at least not in the way we usually think about it. Don’t get me wrong, HCI is having a tremendous impact on how storage is managed, but the logical model remains identical to what we have seen with traditional shared storage. There is nothing new under the sun.

I have mentioned this idea many times in the past (here and example). At that time I was excited by Coho Data and their Lambda-like approach. Now it looks like others are following that example and implementing similar functionality in their systems. There is one difference though: most of them are in the secondary/object storage field.

OpenIO is a great example. Its solution blends traditional object storage with what they call Grid For Apps. In practice, also in this case, unused compute resources of the storage cluster are allocated dynamically to run pieces of code in an AWS Lambda-like fashion. (I recently wrote a paper about it, available for free download here)

Another example comes from NooBaa. In practice, this solution, which also aims at reducing costs while improving infrastructure agility through automation, takes advantage of unused storage resources in your infrastructure, as well as in the cloud, to seamlessly provide object storage resources for applications. A very innovative concept. And you know what?… They have plans to add a Lambda-like service integrated into their storage. Trust me, the video recorded at the last TFDx is worth a watch.

Again, even though they never mentioned AWS Lambda, I really like what companies like Cohesity are doing to revisit all the backup and DR processes while re-using ingested data for other tasks. It’s not only about making the backup process more invisible: the repository also becomes a data source for many other applications which need the same data sets to operate (dev, test, quality, analytics and so on). At the end of the day it is like having multiple views of the same data. Cohesity has taken a step further by allowing the end user to run Big Data analytics jobs, and some other fancy stuff, directly in the storage system… And I don’t think it would take long for them to add event-triggered jobs if customers ask for them.

Where is the benefit?

Next-generation, cloud-native developers are favoring object storage over other systems. It is much easier for them to deal with, while the rest of the IT organization has less to tend to when it comes to security and scalability. Therefore, storage has to become smarter and more application-aware by running portions of code directly in the storage itself. As with AWS Lambda, it is possible to enable the developer to offload event-triggered and highly repetitive, throughput-demanding tasks directly to the storage platform in a “server-less” fashion. There are several use cases, including face recognition, video transcoding, text/email object indexing and so on, ranging from simply making better use of resources up to security and auditing needs.
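To make the event-triggered model described above a bit more concrete, here is a minimal sketch in Python of the text-indexing use case. Everything in it is hypothetical: TinyObjectStore, on_put and index_text are names I made up for illustration, and they are not the API of AWS Lambda or of any product mentioned in this article. The store lives in memory and runs its handlers inline, whereas a real platform would presumably schedule them asynchronously on spare cluster resources.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical names throughout: this is not the API of any real product.
Handler = Callable[[str, bytes], None]


class TinyObjectStore:
    """In-memory stand-in for an object store with event-triggered functions."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._on_put: List[Handler] = []

    def on_put(self, handler: Handler) -> Handler:
        """Register a function to run every time an object is stored."""
        self._on_put.append(handler)
        return handler

    def put(self, key: str, data: bytes) -> None:
        """Store the object, then fire the registered handlers."""
        self._objects[key] = data
        # Assumption: handlers run inline here to keep the sketch simple.
        for handler in self._on_put:
            handler(key, data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = TinyObjectStore()
index: Dict[str, Set[str]] = defaultdict(set)


@store.on_put
def index_text(key: str, data: bytes) -> None:
    """Maintain a word -> object keys index as objects arrive."""
    for word in data.decode("utf-8", errors="ignore").lower().split():
        index[word].add(key)


store.put("mail/1.txt", b"Quarterly storage report")
store.put("mail/2.txt", b"Storage budget review")
print(sorted(index["storage"]))  # ['mail/1.txt', 'mail/2.txt']
```

The point of the sketch is that the application only calls put; the indexing happens as a side effect, next to the data, without the developer provisioning a server for it.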

Closing the circle

With these kinds of features implemented at the storage level, development is simplified, performance can be more consistent and part of the processing is done closer to the data without moving it around, asynchronously while it is being stored. As a consequence, scalability of the application is simplified and becomes more aligned to the quantity of data under management and the size of the storage infrastructure.

The list of storage systems that are capable of running code directly and performing basic operations on files and objects is growing. This is not a feature for everyone (yet), but it is an interesting trend to watch… isn’t it?

[Disclaimer: some of the companies mentioned in this article are my clients]