Looks like object storage is still in the spotlight. Yesterday, Chris Mellor published this article, and a few days ago I had a very interesting chat with Paul Carpentier (Caringo's CTO) about his company, object storage in general, and where it could evolve in the future.

Around the corner

Chris mentioned some rumors about potential cloud service offerings from EVault and others (I've been hearing similar rumors for quite some time, actually). To be competitive with Amazon Glacier, these offerings would be based on disks instead of tapes, giving end users the ability to retrieve data faster.
The greater power consumption of disks (tapes still beat spinning disks in this respect) might be partially addressed by a next generation of slower, larger hard drives.
If this is the case, we will have a new generation of cloud storage services comparable with Glacier on price and with S3 on performance… or, perhaps, something in between.

The disk will become the object storage!

But there's more than that. During our chat, Paul Carpentier went a step further!
He believes that sooner or later we could have huge hard disks (say, 10TB) equipped with a small low-power CPU (ARM?), RAM and networking capabilities.
That would open the door to a massive object storage cluster with very fine granularity, one that can change its behavior according to what you need: throughput, archiving and so on.
The idea is brilliant and here’s why:

  1. Object storage is 100% software; most of the products out there are software installed on standard x86 boxes filled with near-line SAS disks.
  2. Linux already runs on ARM, fully functioning computers are available at a very low price, and you can build a cluster with them (here’s an example).
  3. Companies like HP already have the right backend infrastructure for this kind of small server (take a look at Moonshot servers).
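To make the per-disk node idea a bit more concrete, here is a minimal, purely hypothetical sketch (not any vendor's actual API) of what the software on such a drive could look like: a tiny HTTP object service that owns a single local disk, with placement, replication/erasure coding and the S3-style namespace handled by a thin layer above the nodes.

```python
# Hypothetical sketch: each disk becomes a tiny, self-contained object node.
# Endpoints and paths are illustrative assumptions, not an existing product's API.
import http.server
import os

DATA_DIR = "/mnt/disk0/objects"   # the single local drive this node owns


class ObjectNode(http.server.BaseHTTPRequestHandler):
    def _path(self):
        # map /bucket/key to a flat file on the local disk
        return os.path.join(DATA_DIR, self.path.lstrip("/").replace("/", "_"))

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        os.makedirs(DATA_DIR, exist_ok=True)
        with open(self._path(), "wb") as f:
            f.write(self.rfile.read(length))
        self.send_response(201)
        self.end_headers()

    def do_GET(self):
        try:
            with open(self._path(), "rb") as f:
                data = f.read()
        except FileNotFoundError:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    # one such process per drive; a cluster is just many of these on the network
    http.server.HTTPServer(("0.0.0.0", 8080), ObjectNode).serve_forever()
```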


But why?

A cluster like this would have many advantages compared to a traditional solution:

  • lower overall power consumption: you can keep the cluster very responsive by holding metadata, indexes and cache in RAM or on a small flash device!
  • plenty of CPU for erasure coding, encryption, tiering management, compression and other data efficiency techniques while the disk is spun down!
  • resiliency and availability: when you lose a server you only lose a disk!
  • granularity: each operation can be carried out on a very small server and/or chunk of data (think about restoring the right number of copies after the failure of a server holding 45+ 10TB disks; see the back-of-envelope sketch after this list)!
  • responsiveness: the same cluster can flexibly handle both traffic peaks and cold data (very suitable for a single S3+Glacier-style service)!
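As a rough illustration of the granularity point, here is a back-of-envelope sketch. All the numbers (drive count per chassis, rebuild bandwidth) are assumptions for illustration only; the point is simply that losing a dense 45-drive server puts far more data at risk, and keeps the cluster rebuilding far longer, than losing a single one-disk node.

```python
# Back-of-envelope sketch (illustrative numbers, not measurements): how much
# data must be re-protected after a failure, dense server vs. one-disk node.

DISK_TB = 10            # capacity per drive
DISKS_PER_SERVER = 45   # assumed drive count for a dense object-storage chassis
REBUILD_GBPS = 10       # assumed usable network bandwidth for the rebuild


def rebuild_hours(lost_tb, gbps=REBUILD_GBPS):
    """Time to re-create lost_tb terabytes of data at gbps gigabits per second."""
    lost_bits = lost_tb * 1e12 * 8
    return lost_bits / (gbps * 1e9) / 3600


dense_server_tb = DISK_TB * DISKS_PER_SERVER   # 450 TB gone at once
one_disk_node_tb = DISK_TB                     # 10 TB gone at once

print(f"dense server failure : {dense_server_tb} TB lost, "
      f"~{rebuild_hours(dense_server_tb):.0f} h to re-protect at {REBUILD_GBPS} Gbit/s")
print(f"one-disk node failure: {one_disk_node_tb} TB lost, "
      f"~{rebuild_hours(one_disk_node_tb):.1f} h to re-protect at {REBUILD_GBPS} Gbit/s")
```

With these assumed numbers the dense server failure takes roughly 100 hours to re-protect, against a couple of hours for the single-disk node; the smaller the failure domain, the shorter and less disruptive each rebuild becomes.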

There are also disadvantages, of course, such as the number of Ethernet ports required and the number of servers to manage, but since these nodes are all identical, those problems could be solved fairly easily!

Bottom line

Well, this post is more wishful thinking than a viable scenario at the moment, but if the evolution of object storage goes this way, there will be plenty of possible applications and it will also become cheaper and cheaper.

Even more (though now I'm entering sci-fi territory), with this kind of architecture it's easy to imagine a similar evolution in the SSD space. Vendors like HDS, for example, already have 1.6TB (and soon 3.2TB) flash modules for their arrays. These modules are equipped with an ARM CPU to optimize performance and manage the endurance of the MLC NAND… think about such a module with a standard Ethernet port and the right software; isn't it intriguing? 😉

PS: I'm very glad to have met Paul Carpentier; he has really changed my mind about Caringo, its product and the technology they are developing. If you are evaluating an enterprise object storage platform, you should add it to your list of products to look at in depth.