Logs are all but invisible
When you start collecting logs in your datacenter, you quickly realise that there are a lot of them, and storing them for a long time can be very expensive. The consolidation process can be complex, as can any normalization attempt you make on raw logs. In fact, this is one of those fields where some Big Data tools (Splunk, for example) excel, thanks to their ability to analyze varied types of semi-structured data concurrently. But this implies that you will not only store logs but query and analyze them too.
Tapes are too slow for both retrieval and analytics, yet you need the cheapest possible storage system without sacrificing data durability, availability and resiliency.
Datacenter logs? Only the beginning
Many organizations are beginning to collect logs from remote offices and from all kinds of devices under their control, for a variety of reasons such as auditing, security, monitoring, billing and so on. This trend will soon be amplified by an even greater flow of logs and sensor data, due to the massive number of machines and devices connected to the internet that continuously send information home.
In this scenario, object storage (historically considered suitable only for cheap, durable and cold data archives) is poised to become the perfect foundation for storing logs for both analytics and long-term retention. Throughput and scalability should no longer be an issue: S3/Swift APIs are supported by an increasing number of software vendors, and NFS/SMB gateways make it very easy to ingest data while remaining compatible with legacy environments.
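To make the ingestion side concrete, here is a minimal sketch of shipping a rotated log file to an S3-compatible object store with boto3. The endpoint URL, bucket name and site prefix are hypothetical placeholders, not values from this article; any backend exposing the standard S3 API (a Ceph RADOS Gateway, for instance) should accept the same calls.

```python
# Sketch: pushing a rotated log file into an S3-compatible object store.
# All names (endpoint, bucket, site prefix) are illustrative assumptions.
from datetime import datetime, timezone


def log_object_key(site: str, filename: str, when: datetime) -> str:
    """Build a date-partitioned object key so later analytics jobs
    can list and fetch logs cheaply by prefix (site/year/month/day)."""
    return f"{site}/{when:%Y/%m/%d}/{filename}"


def ship_log(path: str, bucket: str, key: str) -> None:
    """Upload one log file; credentials come from the usual AWS config."""
    import boto3  # third-party dependency, installed separately

    s3 = boto3.client(
        "s3",
        endpoint_url="https://objectstore.example.com",  # hypothetical endpoint
    )
    s3.upload_file(path, bucket, key)


# Example key for a log rotated on 6 Oct 2016:
print(log_object_key("datacenter-1", "app.log.1.gz",
                     datetime(2016, 10, 6, tzinfo=timezone.utc)))
# → datacenter-1/2016/10/06/app.log.1.gz
```

Partitioning keys by date is what lets an analytics job later read only the slice of logs it needs instead of scanning the whole archive.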
Logs and object storage
For object storage systems, streaming data to and from a compute cluster is seamless, while storing original data sets or results is less costly and more reliable than on any other kind of storage, even more so now with the flexibility provided by the cloud tiering mechanisms implemented by most modern platforms. Object storage is no longer relegated to the second or third tier of the storage infrastructure: some of its characteristics make it perfectly suitable as the backend of modern IT infrastructures, which need the scalability, performance and availability demanded by developers, end users and cloud applications.
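The tiering mentioned above is typically expressed as a bucket lifecycle policy. The fragment below is a hedged sketch in the S3 lifecycle configuration format: the rule ID, prefix, storage class name and retention periods are all illustrative assumptions, and the exact set of storage classes available depends on the platform.

```json
{
  "Rules": [
    {
      "ID": "tier-old-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "datacenter-1/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```

With a rule like this, logs stay on the fast tier while they are still being queried, move automatically to a colder (cheaper) tier after 90 days, and are expired once the retention period (here, roughly seven years) has elapsed.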
Software-defined solutions like Red Hat Ceph Storage are clearly moving in this direction:
• More performance: by eliminating intermediate layers like local file systems and including optimizations for All-Flash configurations;
• More security: thanks to improved encryption options;
• More interfaces: thanks to the maturity of scale-out FS and S3 gateway.
Its open source and software-defined nature is another key to its success. In fact, it’s no coincidence that Ceph is already prevalent in OpenStack clusters and growing in popularity in a large number of object (and block) use cases.
A free download of the rest of this paper is available here. Enjoy the read!
If you are interested in these topics, I'll be presenting at the next TECHunplugged conferences in Amsterdam on 6/10/16 and in Chicago on 27/10/16: a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!
[Disclaimer: Red Hat is a client of Juku consulting]