A few weeks ago I wrote a blog post about object storage and NAS. As a follow-up, I'd like to continue my rant and explain why I think object storage is a good option as a NAS back end, and why it could be a big leap forward in how you exploit the real value of your data.
File limits
One of the most important limits of an ordinary file server is the filesystem itself. Many filesystems out there limit the number of files a single directory can hold, as well as the overall number of files that can be stored in the whole filesystem tree. More recent filesystems are smarter and can handle huge numbers of files, but then other problems arise.
Even in the most common cases, when you are nowhere near the limits of the filesystem, real consolidation of many file servers (and NASes) is hard to achieve; most of the time it's preferable to use distributed filesystems or clustering mechanisms that present a consolidated view. At the same time, consolidating dispersed file servers (e.g., in remote offices) is questionable because of the limits of the protocols involved: file-sharing protocols such as NFS and SMB/CIFS were designed for the LAN and perform poorly over high-latency WAN links.
The “no limits” storage
Object storage (Amazon S3 is the most vivid example) has virtually no limit, or a very high one, on the number of stored objects, and offers several mechanisms to give you a single namespace, while multi-tenancy, security, resiliency, data protection and so on are built in and taken for granted.
True file consolidation on object storage has big pros and few drawbacks, if any:
- Efficiency: global deduplication, single-instance storage and other data-footprint reduction techniques deliver their greatest benefit on large repositories, because the more data you keep in one place, the more duplicates there are to eliminate.
- Security: it's much easier to apply security policies and control access on a single large repository than across many remote file servers and NASes.
- Manageability: it's easier to manage one big multi-tenant system than many remotely distributed boxes.
- Resiliency and availability: with a correctly configured, redundant object storage system you can potentially avoid making backup copies and many similar data-protection chores. Policies applied to the objects define the number of copies (local and remote), versioning, retention and so on.
And I could go on with other advantages…
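To make the efficiency and resiliency points a bit more concrete, here is a minimal, in-memory sketch of how single-instance storage and versioning can work together in an object store. It assumes whole-object content addressing with SHA-256 (real products may deduplicate at block or chunk level, and their internals differ), and the `SingleInstanceStore` class and its methods are hypothetical, not any vendor's API. Copy-count and retention policies are not modeled. Every `put` adds a new version to a key, but identical content is kept only once, however many keys or versions reference it.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical in-memory object store: versioned keys, deduplicated content."""

    def __init__(self):
        self._blobs = {}     # content hash -> bytes, each distinct content stored once
        self._versions = {}  # object key -> list of content hashes, oldest first

    def put(self, key, data):
        """Store a new version of `key` and return its SHA-256 content hash."""
        digest = hashlib.sha256(data).hexdigest()
        # Single instance: write the bytes only if this content has never been seen.
        if digest not in self._blobs:
            self._blobs[digest] = data
        self._versions.setdefault(key, []).append(digest)
        return digest

    def get(self, key, version=-1):
        """Return one version of `key` (default: the latest)."""
        return self._blobs[self._versions[key][version]]

    def version_count(self, key):
        return len(self._versions.get(key, []))

    def logical_bytes(self):
        """Bytes users think they stored: every version of every key."""
        return sum(len(self._blobs[d]) for hashes in self._versions.values() for d in hashes)

    def physical_bytes(self):
        """Bytes actually kept after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
report = b"Q3 sales report" * 1000                # 15,000 bytes
store.put("alice/report.docx", report)
store.put("bob/inbox/report.docx", report)        # the same file, emailed around
store.put("alice/report.docx", report + b"!")     # a new, edited version
print(store.logical_bytes(), store.physical_bytes())  # 45001 30001
```

Note that the older version of Alice's file stays retrievable with `store.get("alice/report.docx", 0)`, and the copy in Bob's mailbox costs nothing extra: the bigger and more shared the repository, the wider the gap between logical and physical bytes.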
A new way to access data (at the user level)
You know, various kinds of gateways can act as legacy file servers, but does that still make sense?
In some cases, a large-scale deployment of a Dropbox-like (sync & share) service could be a much better idea!
Take a look at how people (standard office users) access and use file servers today:
- users see only a very small fraction of all the files available in the company;
- users access both personal and public files: personal files are read and written, while public files are mostly only read;
- the most recently created files are the hottest, and they cool down very quickly;
- most users would like a personal, transparent backup of their files, or access to older versions of them;
- most users, even in big enterprises, still share files via email because “it’s easier”;
- most users email files to themselves (often at personal home accounts) because they want a copy on their mobile devices… and then they store those files in a personal Dropbox-like account outside enterprise control: even in the best case, the enterprise loses control of its data and users can't be sure they have the latest version of the file;
- users need to access the same files from various PCs and devices.
I could go on and on, but should I?
A smart sync & share client could solve all these issues with minimal effort, while access control to shares and files can still be managed at the Active Directory level.
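Here is a rough sketch of what keeping access control at the Active Directory level can mean for a sync & share service: each share grants permissions to AD users or groups, and a user's effective permission is the highest one granted to any of their principals. It assumes the user's group memberships have already been resolved from AD (no directory query is shown), and the share names, group names, `Perm` levels and "write implies read" rule are hypothetical. Real AD/NTFS ACLs are richer (deny entries, inheritance), so treat this as an illustration only. It also mirrors the personal (read/write) versus public (mostly read-only) split described above.

```python
from enum import IntEnum


class Perm(IntEnum):
    """Hypothetical permission levels; a higher level implies the lower ones."""
    NONE = 0
    READ = 1
    WRITE = 2


# Hypothetical share definitions: AD principal (user or group) -> permission granted.
SHARES = {
    "home/alice": {"CORP\\alice": Perm.WRITE},
    "public/marketing": {"CORP\\Domain Users": Perm.READ, "CORP\\Marketing": Perm.WRITE},
}


def effective_permission(share, principals):
    """Return the highest permission granted to any of the user's principals."""
    acl = SHARES.get(share, {})
    return max((acl.get(p, Perm.NONE) for p in principals), default=Perm.NONE)


# Principals as they might be resolved from Active Directory (assumed, not queried here).
alice = {"CORP\\alice", "CORP\\Domain Users", "CORP\\Marketing"}
bob = {"CORP\\bob", "CORP\\Domain Users"}

print(effective_permission("public/marketing", alice).name)  # WRITE
print(effective_permission("public/marketing", bob).name)    # READ
print(effective_permission("home/alice", bob).name)          # NONE
```

Because permissions are attached to groups rather than to individual devices, adding a user to an AD group is enough for every sync client they use to pick up the new rights.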
A new way to access data (at the business level)
A consolidated file repository, even if end users only see a small part of it, opens up a world of new opportunities.
Think about all of your enterprise's unstructured data in a single big repository! How many applications could be built on that data?
An example? Some object storage platforms can generate metadata from the stored data itself; once that metadata exists, it is easy to index and search huge amounts of data to extract valuable information (or small subsets of data for further processing). That's smarter than a file server, yet lighter (and cheaper) than a database or a sophisticated search engine.
And this is only the first step toward what could become part of a larger big data strategy.
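As a rough illustration of the "generate metadata, then index and search" idea, the sketch below derives a few tags from each object at ingest time and builds a small inverted index over them, so queries hit the index instead of scanning the data. Everything here is an assumption for illustration: the `extract_metadata` function (file extension, a coarse size class, and the words in text content) and the `MetadataIndex` class are hypothetical and do not reflect any particular platform's metadata features.

```python
import re
from collections import defaultdict


def extract_metadata(key, data):
    """Hypothetical extractor: derive searchable tags from an object's key and content."""
    tags = set()
    # File type from the extension of the last path segment, e.g. "ext:txt".
    name = key.rsplit("/", 1)[-1]
    if "." in name:
        tags.add("ext:" + name.rsplit(".", 1)[-1].lower())
    # Coarse size class.
    tags.add("size:large" if len(data) > 1_000_000 else "size:small")
    # Lower-cased words from text content; undecodable bytes are ignored.
    text = data.decode("utf-8", errors="ignore").lower()
    tags.update("word:" + w for w in re.findall(r"[a-z0-9]+", text))
    return tags


class MetadataIndex:
    """Inverted index: tag -> set of object keys carrying that tag."""

    def __init__(self):
        self._postings = defaultdict(set)

    def ingest(self, key, data):
        """Extract metadata once, at write time, and record it in the index."""
        for tag in extract_metadata(key, data):
            self._postings[tag].add(key)

    def search(self, *tags):
        """Return the keys that carry every requested tag (AND query)."""
        if not tags:
            return set()
        return set.intersection(*(self._postings.get(t, set()) for t in tags))


index = MetadataIndex()
index.ingest("sales/2013/q3-report.txt", b"Q3 revenue grew in the EMEA region")
index.ingest("sales/2013/q4-report.txt", b"Q4 revenue was flat")
index.ingest("hr/policies/travel.txt", b"Travel policy for EMEA staff")
print(sorted(index.search("word:revenue", "word:emea")))  # ['sales/2013/q3-report.txt']
```

The design choice worth noting is that a query only intersects the posting sets for the requested tags, so its cost does not depend on re-reading the stored objects; that is what makes this kind of search lighter than crawling a file server.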
Bottom line
Do your files still have value for your enterprise after their first period of intense activity?
Do you think the unstructured data you are storing today will need to be accessed from various devices and through various protocols, even though you don't yet know how?
Do they contain valuable information when aggregated with tons of other similar files? And do they have value when combined with other kinds of information?
Can they offer a way to improve your business or create an advantage of some kind?
Are you collecting and keeping data from various sources, if only for regulatory compliance, without having figured out yet whether it could become a useful resource someday?
If so, you should take a look at object storage as a viable option to store your unstructured data.