A couple of weeks ago I published an article about high performance object storage. Reactions were quite diverse: some think that object stores can only be huge and slow, while others think quite the opposite. In fact, they can also be fast and small.
In the last year I’ve had a lot of interesting conversations with end users and vendors on this topic. Having just covered the “fast object stores” part, I’d like to point out again that by fast I mean faster, and with better latency, than traditional object stores, though not as fast as block storage. This time round I’d like to talk about smaller object stores.
Some of my colleagues (both analysts and bloggers) say that object storage makes no sense under one petabyte or so… but in my view they are dead wrong! It all depends on the applications and on the strategy your organization is adopting. Let me work through some examples here.
It depends on the application
HDS was one of the first in the market to think about object storage as an enabler for cloud and data-driven applications, and not just as a more affordable form of storage for cold data. They invested in building an ecosystem which is now very robust and seems quite successful with their customers.
Two pieces of this ecosystem are the remote NAS gateway and Sync & Share (HDI and HCP Anywhere in HDS nomenclature). HDS claims that more than 1,500 customers are running HCP now and, IIRC, 400+ PB of on-premises storage under management. Doing the simple math (400,000TB/1,500), this works out to roughly 267TB per customer on average… without considering that some of these customers are really huge and use HCP for the traditional archiving/content management use cases…
I wonder how big HDS customers would be on average if I removed the ten largest installations by capacity from the equation… and also how many of those ten customers are actually using HCP for enterprise applications like Sync & Share. I would bet that those ten are more in the xSP field, video content distribution, archiving, big data and so on. But this is merely speculation on my part… and I invite HDS to leave a comment if they want to add more.
Other vendors, like Cloudian for example, have licenses that start as low as 10TB! And I have personally met some of their (happy) customers in the 100 to 300TB range. These end users have embraced object storage for NAS gateways, file distribution and, lately, backup. For each new application they add more capacity and more cluster nodes.
Caringo is another good example. They’ve always worked with ISVs and many of their customers are quite small. And now, thanks to FileFly, they have a compelling solution for consolidating remote file servers. This kind of solution works for small and large customers alike, and they are doing rather well with it. When I spoke with them a few months ago, they were thinking about bundling the whole solution (Swarm + FileFly) in a package for smaller customers (starting at around 40 to 50TB), because they’ve seen a lot of interest in that capacity range.
And I’m not saying that these vendors can’t scale or that they don’t have large installations. Large installations are the case studies you find on their websites, the kind of installation that is much easier to publicize because it demonstrates the potential of your product. Need another example? Small and specialized vendors like Object Matrix have many customers that start under 200TB… but on their homepage you’ll find one of the biggest!
Nick Pearce, one of the founders of Object Matrix, told me that most of his customers start very small (in the past the average deal was 60TB; lately, because of larger disks I suppose, they start at 300TB), and they grow from there… his explanation is simple: less risk, while still taking advantage of a scale-out architecture.
A customer of mine started working with Ceph a while ago and is now implementing it in production on a three-node cluster… 100TB usable. I’ve spoken to others in the last six to nine months who are doing the same with clusters in the order of 100 to 500TB, built out of decommissioned servers. Many of them use it just as third-tier storage for log archiving, secondary backup and so on. But it’s ridiculously cheap and reliable for them…
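To give an idea of how simple that third-tier use case can be, here is a minimal sketch of a log-archiving script talking to a Ceph cluster through its S3-compatible RADOS Gateway. The endpoint, credentials, bucket and paths below are hypothetical placeholders, not taken from any specific installation:

import glob
import os

import boto3  # standard AWS SDK for Python; works with any S3-compatible endpoint

# Archive rotated logs to a Ceph cluster through its S3-compatible
# RADOS Gateway. Endpoint, credentials, bucket and paths are hypothetical.
s3 = boto3.client(
    "s3",
    endpoint_url="http://ceph-rgw.internal:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

for path in glob.glob("/var/log/archive/*.log.gz"):
    key = "logs/" + os.path.basename(path)
    s3.upload_file(path, "log-archive", key)  # Filename, Bucket, Key
    os.remove(path)  # free local space once the object is safely stored

A cron job like this is essentially all it takes to turn a cheap cluster into a third tier for cold data.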
But you don’t have to take my word for it… I asked for a comment from someone who works with all the object storage (and NAS) vendors: Jeff Denworth, SVP Marketing at CTERA. CTERA, as you probably already know, is a cool vendor that provides some really interesting solutions such as Sync & Share and NAS gateways, among other fancy cloud-based backup solutions. They have hundreds of customers (some of them are ISPs with several thousand end users each, but they are doing pretty well in the enterprise market too). When I asked Jeff for his opinion, he told me: “I would say, of our customer pool, the large majority of them have under 200TB. But we’re also not the only use case they consider object for… so we become the first use case (gateways, sync and share, etc.) and then the customer immediately starts thinking about new use cases (backup, then DevOps, are most commonly the next to consider).” And even though CTERA offers global deduplication and compression, its customers are still in the range I’m talking about.
And again, I asked the same question to SwiftStack last week, and they told me that the first installation for the majority of their customers is now in the order of 300TB. That capacity grows quickly over time but, still, they usually start small…
Have I mentioned the majority of object storage vendors here? Well, if not, it’s because this article can’t be as long as a novel… but I think I’ve given you enough to think about when it comes to small object stores, haven’t I?
But there is more
Some startups are intentionally working on smaller object storage systems. They want to build small object storage systems by design (or, better still, small-footprint object storage systems)!
Minio is working hard on an S3-compatible object store that can run in a single virtual machine or a container. The product is open source and is designed for developers; I think of it as the MySQL of object stores. And they are not alone: Open.IO has a similar approach to building an object storage system that can serve single applications. The right back end for developers in the cloud era.
The idea behind this kind of object storage system is that developers are asking for S3-compatible storage to build their applications against. The small footprint is necessary to embed it within a container and distribute the application in the easiest possible way. But this also means that the S3 engine is very small and fast (yes, again, fast!), security is simplified, and multitenancy is no longer a problem, since you have an S3 repository dedicated to your application. For better or worse, the developer takes control of the overall “micro-infrastructure”.
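In practice, the application code looks exactly the same whether it talks to a public cloud bucket or to a tiny S3-compatible store running in a container next to the app. A minimal sketch, where the endpoint URL, credentials and bucket name are illustrative placeholders:

import boto3

# Point a standard S3 client at a small, locally hosted S3-compatible
# endpoint (for example, one running in a container alongside the app).
# The endpoint, credentials and bucket name are illustrative placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="app-data")

# Store and retrieve an object exactly as the application would do
# against a public cloud; only the endpoint URL changes.
s3.put_object(Bucket="app-data", Key="report.json", Body=b'{"status": "ok"}')
obj = s3.get_object(Bucket="app-data", Key="report.json")
print(obj["Body"].read())

This portability is precisely why the S3 API, rather than any particular backend, has become the real interface developers build against.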
You might think I’m out of my mind here… but in a few weeks’ time we’ll also see Scality, an object storage vendor usually mentioned in connection with very large scale installations, announce an interesting component that can also fit this use case. I can’t say more at the moment; I believe this information is still under some sort of embargo.
Once again, we are talking about object storage systems which are intended for small data sets and single applications, with the ability to grow if needed.
Closing the circle
Thinking about object storage only for huge multi-petabyte installations is passé. Examples supporting this are everywhere, and most enterprises are choosing object storage not for its durability or scalability characteristics but because they want to implement cloud storage systems with applications that take advantage of protocols like S3.
Even though I agree that public cloud is a good option for smaller end users, many of them also have good reasons to adopt an on-premises solution.
Storage is no longer about saving data safely and efficiently, which is now taken for granted, but it’s all about distributing and sharing it quickly and securely. This is a major issue if the organization is widely distributed and is leveraging mobile devices for its business activity. I realise I’m repeating myself here but, from this point of view, object storage can be considered a NAS 2.0.
And last but not least, with more and more developers adopting S3 and Swift protocols, we’ll be seeing a great deal more small (and embedded?) object stores around…
[Disclaimer: Yes, I did some work for many of the vendors mentioned in this post…]
If you want to know more about this topic, I’ll be presenting at the next TECHunplugged conference in London on 12/5/16: a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!
Comments

Well, if object-based storage software vendors want to “live long and prosper” they will need many more customers in the tens-or-hundreds-of-terabytes category than in the multi-petabyte category in order to succeed in the market. Everyone got hung up on how big you could scale out, without realizing that selling on the basis of extreme scale limits your potential customer base. Mr. Jerome Lecat of Scality has guesstimated that the worldwide market of multi-petabyte customers is around 20K, which is a relatively small number in the scheme of things. The overall market of customers adopting object-based storage on a smaller scale over time is probably in the tens of millions.

The trick is that customers with smaller storage requirements need object-based storage software that is easy to install and consume. This means no lengthy professional services engagements to set it up and get it running, no manual tweaking or “tuning” of the storage, no complicated network configurations, and no CLI interfaces for customers to manage their object storage clusters. Vendors like Cloudian are getting much better at this, as you will see with the release of Cloudian 6.0 in early May.

I think Mr. Signoretti is correct to look at how object-based storage can be deployed in a broad range of use cases that don’t consume PBs of storage. That said, object-based storage must scale down as well as up in order to be more widely adopted. Vendors who treat PB+ scale as the starting point will miss a lot of the market that could use object-based storage, but on a much smaller scale.
It used to be that the dominant use case for file storage was user shares. This left a big semantic gap between the dominant client (humans) and the RESTful interface of object storage. But this has changed with M2M/IoT, big data, web content apps, and the shift from SMB shares to Dropbox-style sync and share; the petabytes are now dominated by applications, not humans. However, there was a high adoption cost (including vendor lock-in) to proprietary APIs. Although CDMI was a step forward as a formal standard, its complexity and breadth meant slow uptake, and opened the door for a de facto standard in S3.
For better or worse, Amazon’s dominance in public cloud storage drove ISVs to the S3 API. All the enterprise ever cared about was application compatibility. Now that S3 support is commonplace in BURA apps, content apps, and big data frameworks like Hadoop and Splunk, adoption costs have dropped, enabling the smaller-scale market for object storage.
A lot of VC money was washed away in object storage investments over the last decade, but hopefully we’ve turned a corner. I agree with Tom, though: there’s lots of room for improvement in agility, elasticity, TCO, and ease of use.
Well, I’ve been following a number of object-based storage (OBS) vendors for a few years and I don’t recall any VC money getting “washed away” so far. Several OBS vendors have been acquired in the past two years, including Inktank (Ceph) by Red Hat, Amplidata by HGST/Western Digital, and Cleversafe by IBM. The OBS startups that remain independent seem to be adequately financed, including Caringo, Cloudian, Scality and SwiftStack. Anyway, there is a lot more VC funding sloshing around the flash array and hyper-converged storage startups than the OBS startups, which makes it more likely that some of that funding will get washed away.