Next year, seventh-generation LTO tapes will hit the market. We are talking about 15TB cartridges.
It sounds like an interesting medium, and it actually is if you need to store huge amounts of cold data.
Cost/GB is still the best on the market for cold data, but I think there are other problems to be considered.
Tape Throughput…
15TB (compressed) is a lot of data, even by today’s standards. It means a cartridge stores 2.4 times the data of an LTO-6 (6.25TB) in a similar space. I’m not well informed on the physical characteristics of the magnetic tape, but even if it’s thinner and longer than its predecessor, it won’t be much different… the new tapes will mostly just be denser in terms of MB/mm².
Tape drives will work the same way they always have (they must maintain a sustained, constant speed to write data), which means you have to feed them data faster. In fact, LTO-7 tape drive throughput is 750MB/sec (compressed).
If you don’t sustain that throughput, the drive won’t have enough data to write and will have to stop, refill its buffer, and reposition itself before restarting (the so-called shoe-shining effect). Each of these stops takes several seconds and heavily impacts real throughput.
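Just to put numbers on it, here is a tiny back-of-the-envelope model. Only the 750MB/sec drive speed comes from the LTO-7 spec; the host speed, buffer size and reposition penalty are assumptions picked purely for illustration:

```python
# Back-of-the-envelope model of tape "shoe-shining". Only the drive speed
# is LTO-7's rated figure; the host speed, buffer size and reposition
# penalty are assumptions chosen purely for illustration.

DRIVE_MBS = 750      # LTO-7 rated throughput (compressed), MB/s
HOST_MBS = 400       # assumed: what the backup host actually sustains
BUFFER_MB = 1024     # assumed: drive buffer capacity
REPOSITION_S = 3.0   # assumed: time lost per stop/reposition cycle

if HOST_MBS >= DRIVE_MBS:
    effective = DRIVE_MBS  # host keeps up: the drive streams continuously
else:
    # One cycle: the drive drains a full buffer while the host keeps
    # feeding it, then stops and repositions while the host refills the
    # buffer (with these numbers the refill fits within the reposition time).
    drain_s = BUFFER_MB / (DRIVE_MBS - HOST_MBS)  # buffer empties at the net rate
    written_mb = DRIVE_MBS * drain_s              # data written while streaming
    effective = written_mb / (drain_s + REPOSITION_S)

print(f"effective throughput: {effective:.0f} MB/s (vs {DRIVE_MBS} MB/s streaming)")
```

With these assumed numbers the drive averages roughly 370MB/sec, ending up below even the host’s own speed, which is exactly why the mitigations below matter.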
This is a well-known problem, seen and addressed many times in the past: disk staging areas, multiplexing and other techniques have always mitigated it.
…it’s not about backup
The real problem is that we don’t use these tapes only for backup. Some of them are supposed to be used for archiving too, and in that case traditional optimizations don’t work, especially when data is written and accessed in a very random pattern.
The Spectra Logic BlackPearl appliance, a gateway that sits in front of a tape library and exposes its data through an S3-like object storage API, is an interesting solution to this problem… but it’s also the only one I’ve heard of. And, in any case, I wonder whether this approach will still hold up once LTO capacity grows to 60 or 120TB in the next few years.
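To make the idea concrete, here is a minimal sketch of what reading and writing through an S3-compatible gateway looks like from the application side. This is generic S3 client code (boto3), not the BlackPearl DS3 API itself, and the endpoint, bucket and credentials are hypothetical placeholders:

```python
import boto3

# Generic S3 client pointed at an on-premises gateway sitting in front
# of a tape library. Endpoint, bucket and credentials are hypothetical.
s3 = boto3.client(
    "s3",
    endpoint_url="https://tape-gateway.example.local",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# From the application's point of view this is an ordinary object PUT;
# the gateway decides when and how the object actually lands on tape.
with open("photo_0001.jpg", "rb") as f:
    s3.put_object(Bucket="photo-archive", Key="2015/photo_0001.jpg", Body=f)

# Reads look the same, but if the object lives only on tape the gateway
# may need minutes to mount the cartridge and seek, not milliseconds.
obj = s3.get_object(Bucket="photo-archive", Key="2015/photo_0001.jpg")
data = obj["Body"].read()
```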
I’m just thinking about the (big) quantity of objects that can be stored in those cartridges (100+TB could mean many millions of objects)… and even if each one were accessed only a few times over a long period, the number of accesses to a single tape could be very high!
Think about photos: an LTO-7 can store 15,000,000MB, and a single photo can be 3 to 5MB. That means between 3 and 5 million objects, on one tape! Now, even if the data is rarely accessed, the risk is that a single tape will be hit often, with consequences for contention, media longevity, and so on.
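The arithmetic is easy to check, and adding even a tiny assumed access rate shows how quickly a single cartridge becomes a hotspot (the one-read-per-object-per-year rate below is purely an assumption for illustration):

```python
# Objects per cartridge, using the figures from the text above.
capacity_mb = 15_000_000           # LTO-7 compressed capacity, MB
for photo_mb in (3, 5):            # the 3-5MB photo sizes from the example
    print(f"{photo_mb}MB photos -> {capacity_mb // photo_mb:,} objects per tape")

# Even a tiny access rate adds up. Assume, purely for illustration,
# that each object is read on average once a year.
objects = capacity_mb // 5         # the conservative end: 3 million objects
accesses_per_day = objects / 365
print(f"~{accesses_per_day:,.0f} tape accesses per day at one read per object per year")
```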
Closing the circle
Tapes have the best $/GB, but they are becoming more and more complex to access. Long-retention backups and cold archives are the primary use cases, but the huge capacity could become a problem because of the high concentration of data in a single cartridge.
You know, I’m an object storage fan, and maybe that is the best way to manage huge amounts of data if you need to access it more than once in its lifetime. I must admit though that I need to dig deeper into this topic before commenting further, and I hope to have some more answers after the Spectra Logic summit next month in Denver…
Well, Spectra Logic builds large tape libraries, which can be used in conjunction with their BlackPearl DS3 technology and LTFS “formatted” tapes. BlackPearl DS3 basically extends the S3 API to include a couple of commands related to tape drive operations. If Spectra Logic deploys LTO-7 drives and tapes in their libraries, they can store considerably more objects on tape than they do now. Tape sometimes seems like the “odd man out” compared to the kind of capacity you can get with object-based storage on disk. That said, tape is Google’s last line of defense when it comes to restoring data.