Over the last few months I have had several interesting briefings with storage vendors. Now I need to stop, try to connect the dots, and think about what could come next.
It’s incredible to see how rapidly the storage landscape is evolving and becoming much smarter than in the past. This will change the way we store, use and manage data and, of course, the design of future infrastructures.
I recently put a few ideas together in a couple of posts (here and here), and now I'd like to develop them even further.
Storage isn’t storage any longer!
Most of us still think about storage as a box full of disks (or Flash). That’s a misconception.
Yes, there are still plenty of solutions based on dumb boxes full of disks, but that is not the point. If you are dealing with complex environments and you want best-of-breed solutions to achieve maximum efficiency, performance and scalability, then you probably need to look at something else…
… And, from my point of view, there are two trends that have to be considered:
1) Distributed-system design
Many of the most recent storage architectures are designed as distributed systems that act as storage systems. This is a paradigm shift and, in fact, modern scale-out storage as well as hyper-converged infrastructures are based on this concept. In some cases, like Object Storage platforms, distributed DBs like Cassandra are at the core of the product (if you need examples, just take a look at this video about Cloudian recorded at #SFD7 or at the Hedvig website). This is the best way to avoid any kind of bottleneck and to scale linearly in both performance and capacity simply by adding nodes while, at the same time, ensuring the best availability and resiliency.
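To make the "just add nodes" idea more concrete, here is a minimal sketch of consistent hashing with virtual nodes, the general technique behind Cassandra's token ring. It is a simplified illustration, not the actual implementation used by Cassandra, Cloudian or Hedvig; the node names, the MD5-based hash and the `vnodes=100` default are assumptions chosen for readability.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a position on a large integer ring.
    # MD5 is used only for an even spread here, not for security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = []     # node owning the position at the same index
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node claims several points on the ring to smooth the load.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            idx = bisect.bisect(self._positions, pos)
            self._positions.insert(idx, pos)
            self._owners.insert(idx, node)

    def owner(self, key):
        # The first ring position clockwise from the key's hash owns the key.
        if not self._positions:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._positions)
        return self._owners[idx]


if __name__ == "__main__":
    keys = [f"object-{i}" for i in range(10_000)]
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("node-d")
    moved = sum(1 for k in keys if ring.owner(k) != before[k])
    # Only a fraction of the keys (roughly a quarter) changes owner, not all of them.
    print(f"{moved / len(keys):.1%} of keys moved to the new node")
```

The design point is that adding a fourth node only relocates the keys that now fall just before its ring positions, and every relocated key lands on the new node. Nothing else is reshuffled, which is what lets capacity and performance grow node by node.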
This kind of design isn't a must, of course. Not all organizations need massively scalable systems but, especially for secondary storage, the Petabyte is no longer a chimera, even for mid-sized organizations.
At this point, storing data as-is is just the first step and you can leverage cluster resources to do much more.
2) Awareness
Traditional storage systems have no idea of what is happening around or inside them. They just serve files and IOPS while protecting data against hardware failures.
On the contrary, next-generation storage systems are smarter: they are aware of their own behavior, of the workloads they are serving or of the data they are storing. This is a leap forward and can radically change the role of storage in the infrastructure.
These kinds of systems can actively contribute to lowering infrastructure TCO and, sometimes, they can become active components of the infrastructure and application stack.
Examples? Nimble Storage InfoSight (which I wrote a paper about a few days ago ;)) is a tool capable of analyzing the behavior of your whole stack, starting from the storage and going up to networking, hypervisor and VMs. By leveraging cloud-based analytics, it helps you quickly understand whether something is (or will be) going wrong.
Others, like Data Gravity, work on the stored data itself: they can analyze its content and exploit all of its value.
These two characteristics don't need to coexist, nor are they necessarily complementary, but they enable the development of different features that can be deployed locally or from the cloud.
Except for a few notable cases, we are still at the first-generation stage of these systems, but the potential is massive and I'd like to give you some idea of what will probably happen in the near future.
Smarter storage is just around the corner
One interesting thing I saw last week came from Coho Data (a scale-out storage solution). Coho is developing the ability to run containerized code on their servers, triggered by events.
For example, if a new movie file lands in a specific portion of the storage, it could be re-encoded in different formats thanks to Coho APIs and a few lines of code. And this is just one example: with a similar feature you can build applications that run inside the storage, analyzing and managing data very intelligently and according to your exact needs.
The possible applications are endless:
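The re-encoding scenario above can be sketched as a small event dispatcher: handlers register for a kind of event on a path pattern, and the storage invokes them when a matching event occurs. This is a hypothetical illustration and not Coho's API, which I haven't seen documented; `StorageEvent`, `EventDispatcher`, the `/media/...` paths and the encoding profile names are all made up for the example, and the handler only computes output names instead of actually launching a container.

```python
import fnmatch
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StorageEvent:
    # Hypothetical event shape: what happened and to which object path.
    kind: str  # e.g. "created", "deleted"
    path: str


@dataclass
class Trigger:
    kind: str
    pattern: str  # shell-style glob matched against the object path
    handler: Callable[[StorageEvent], List[str]]


@dataclass
class EventDispatcher:
    triggers: List[Trigger] = field(default_factory=list)

    def on(self, kind, pattern):
        # Decorator that registers a handler for events matching kind + pattern.
        def register(fn):
            self.triggers.append(Trigger(kind, pattern, fn))
            return fn
        return register

    def dispatch(self, event):
        # Run every matching handler and collect what each one produced.
        results = []
        for t in self.triggers:
            if t.kind == event.kind and fnmatch.fnmatchcase(event.path, t.pattern):
                results.extend(t.handler(event))
        return results


dispatcher = EventDispatcher()

# Hypothetical encoding profiles.
TARGET_FORMATS = ["h264-720p", "h264-1080p", "webm-720p"]


@dispatcher.on("created", "/media/incoming/*.mov")
def reencode(event):
    # A real system would start a transcoding job in a container here;
    # this sketch only returns the names of the outputs it would create.
    stem = event.path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return [f"/media/encoded/{stem}-{fmt}" for fmt in TARGET_FORMATS]


if __name__ == "__main__":
    print(dispatcher.dispatch(StorageEvent("created", "/media/incoming/trailer.mov")))
    print(dispatcher.dispatch(StorageEvent("created", "/docs/report.pdf")))  # [] (no trigger matches)
```

The appeal of this model is that the storage only has to emit events and run small handlers close to the data; the logic itself stays in a few lines of user code.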
– A specific backup procedure that sends your snapshots to the cloud?
– Adding new interfaces and protocols to your storage?
– Implementing in-place data analytics? Data streaming analytics?
– Constantly looking at access patterns to find potential data breaches/leaks?
– The only limit is your imagination.
Yes, there are physical limits imposed by CPU usage, but I'm sure that, as happens with standard hyper-converged infrastructures, vendors will be able to find the right balance or propose specialized nodes.
In fact, from a certain point of view this is very similar to a hyper-converged infrastructure, but with a major difference: no hypervisor and no VMs. It's a lightweight, dockerized approach that can be very powerful while consuming fewer resources! It's like building your own highly specialized appliance out of commodity hardware and software.
There are different types of smarter storage
Sometimes you don't have enough CPU power to do this but, in that case, the cloud can help! If you look at what is happening with cloud-based storage analytics, you can easily guess the next step. Let's take Nimble as an example again: they are already harvesting millions of data points per array per day. With the latest implementation of their cloud-based analytics product, they can now monitor the whole stack. How long will it take before they start digging into VMs and looking at what they are really doing in terms of data and applications? Security concerns aside (I'm talking about metadata here), the potential is huge and it is not limited by your internal infrastructure! It could help you better understand your infrastructure and compare it with others all around the world. You would be able to find weak spots and gather information about any aspect of your data and infrastructure. For example, it would be possible to analyze the behavior of your application in a specific context and compare it with what is usually measured in similar environments. Here, too, there are many possible applications.
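The "compare it with similar environments" idea boils down to ranking one of your measurements against the same metric collected from peer installations. The sketch below is a generic percentile-rank comparison, not Nimble's actual analytics, whose internals aren't public; the metric name, the 90th-percentile threshold and the assumption that higher values are worse (as with latency) are all illustrative choices.

```python
from bisect import bisect_left, bisect_right
from statistics import median


def percentile_rank(value, peers):
    """Percentage of peer values below `value`, counting ties as half (0..100)."""
    if not peers:
        raise ValueError("need at least one peer measurement")
    ordered = sorted(peers)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def compare_with_peers(metric, my_value, peer_values, threshold=90.0):
    # Assumes "higher is worse" (e.g. latency): flag values above most peers.
    rank = percentile_rank(my_value, peer_values)
    return {
        "metric": metric,
        "value": my_value,
        "peer_median": median(peer_values),
        "percentile": rank,
        "flag": rank >= threshold,
    }


if __name__ == "__main__":
    # Hypothetical read-latency samples (ms) from similar environments.
    peers = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    print(compare_with_peers("read_latency_ms", 12.0, peers))
```

A real service would also have to decide what "similar environment" means (same workload type, hardware generation, hypervisor, and so on) before comparing, which is exactly where the large, cross-customer data set collected in the cloud makes the difference.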
Closing the circle
I love thinking about the potential of these solutions, and the fact they are not very far from where we are today.
Most of the time, application-aware storage is another term for optimization and integration with a specific application. On the other hand, data-aware storage is something totally new and it could have many different connotations. It will be interesting to watch what happens in the near future.
In fact, many storage vendors are working on new data-aware solutions which will have the potential to change the role of data storage in your infrastructure. Some of these solutions are already available in their first commercial implementation but the vast majority of them are still in the early stages of their development cycle… Interesting times ahead!