Tape Optimization
Tape has been with IT professionals for a very long time, but over the past five years the calls for its demise have accelerated. So is tape really dead? The answer today is not really. Clearly the technology to augment tape exists: disk-to-disk backup, disk archiving and electronic vaulting of data all raise questions about tape's value. The problem is that few environments today, especially larger Data Centers, have all three of these solutions in place, and you would need all three to have a chance of eliminating tape altogether.
Disk answers many of the challenges created by tape, but only recently have disk solutions begun addressing and improving on what tape does well: store data quickly, store data cheaply, scale impressively by simply inserting more blank media, and offer portability for shipping data off site. Until the last few years, tape was the primary target for backup; now disk is slowly eating away at that role.
Data deduplication has become a mainstream capability and is now a must-have offering. Because of data deduplication, tape's role has shifted to long-term backup storage or archiving and portability for DR. Disk archive technology, like that offered by Permabit and Copan, has come to market. The emerging Cloud Storage solutions, like Nirvanix and Parascale, are starting to have an impact. Portability of data has to a large extent been solved by data deduplication devices that replicate only the changed blocks within the backup. Even with technologies like data deduplication, storing backup sets on disk for extended periods of time (more than 1 year) is unusual, and there is almost always a desire to move that data to tape. The challenge has been optimizing that data movement.
Tape Lives!
What is tape's role? It essentially fills the gap between these solutions. Most Data Centers cannot afford to implement deduplicated backup to disk, disk archiving and electronic vaulting all at once. It's not just the hard cost of the disk hardware either. There is a fair amount of process change when implementing any of these disk-based strategies. There are also power consumption concerns. While many disk strategies will apply some fancy math to justify their power needs, at the end of the day there is nothing more power efficient than a powered-off hard disk or tape media sitting on a shelf.
Tape manufacturers are not going down without a fight, and in fact last quarter tape drive shipments were actually significantly higher. The capacity of each tape continues to expand, so tape continues to have a cost advantage. With capacities now reaching 1 TB per tape, the cost per GB of tape is very impressive. Indexing and positioning on modern drives are improving substantially, so searches are getting faster. The capabilities of modern tape need to be exploited. Tape certainly needs to be virtualized, and the current Virtual Tape Library (VTL) solutions do not do an adequate job of that. VTL is an over-applied term: VTL solutions are not virtual, and most have little or nothing to do with tape or tape libraries. They are essentially disk storage arrays that have been made to act like tape libraries. They are the tape library's evil stepmother: they create a tape library that is sealed shut, and the only way to get data out of it is to electronically move that data to tape. This often involves going back through the backup server and then down to tape. The backup server is not optimized for this type of rapid inbound and outbound data movement.
A New Path
A new strategy is required that will exploit the speed and depth of modern tape drives while at the same time integrating tape with disk. The term for this new breed of solution is Backup Virtualization. It delivers specific integration of disk and tape as well as exploitation of the new capabilities of modern tape. Backup Virtualization approaches the backup challenge from a holistic view. Tape and disk are combined as one entity, with the Backup Virtualization solution managing data movement between the two. The disk can be of any type; even data deduplication appliances can be consolidated as one unit. Multiple disk strategies can also be leveraged: high-speed spinning disk for the initial transfer in of data, deduplicated disk for medium-term data storage, and then tape as the final resting point. All, again, are managed by the Backup Virtualization Appliance.
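The tiering described above can be sketched as a simple age-based policy. This is only an illustration, not how any particular Backup Virtualization product works: the tier names, the function names and the 7-day threshold are assumptions made for the example, and the 365-day threshold merely echoes the earlier remark that keeping backup sets on disk for more than a year is unusual.

```python
# Hypothetical age thresholds in days; the article gives no specific policy values.
FAST_DISK_MAX_AGE_DAYS = 7      # assumed landing-zone retention
DEDUP_DISK_MAX_AGE_DAYS = 365   # assumed medium-term retention


def target_tier(age_days):
    """Return the tier a backup set of the given age should live on."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= FAST_DISK_MAX_AGE_DAYS:
        return "fast_disk"
    if age_days <= DEDUP_DISK_MAX_AGE_DAYS:
        return "dedup_disk"
    return "tape"


def plan_moves(backup_sets):
    """Given {name: (current_tier, age_days)}, list the (name, from, to) moves needed."""
    moves = []
    for name, (current_tier, age_days) in sorted(backup_sets.items()):
        wanted = target_tier(age_days)
        if wanted != current_tier:
            moves.append((name, current_tier, wanted))
    return moves


if __name__ == "__main__":
    inventory = {
        "payroll-full": ("fast_disk", 30),
        "mail-full": ("dedup_disk", 400),
        "web-incr": ("fast_disk", 2),
    }
    for move in plan_moves(inventory):
        print(move)
```

The point of the sketch is that the backup administrator states the policy once and the appliance works out the data movement, rather than separate jobs being written for each media type.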
Backup Virtualization Appliances can then focus on the tape technology. Tape drives today transfer data at 120 MB/s and that speed is going to double again this year. Supporting speeds like this is a very different world from the 60 MB/s average transfer speed that tape could offer just a few years ago. Backup Servers and Backup Applications have not been re-architected to keep up with these changes. The result is diminishing returns with each successive leap in drive technology, because the Backup Applications and the hardware they run on have not kept pace with the tape hardware. The common reaction in most Data Centers is to just throw disk at the problem. In many cases, adding disk actually makes the backup process even more complex than it already was, because you have introduced another media platform that has to be managed. Jobs have to be created for it, and over time those jobs have to be moved to tape and eventually purged from disk.
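The diminishing returns are easy to see with some rough arithmetic. The sketch below assumes that the sustained rate is simply the slower of the backup server's feed rate and the drive's rated speed, and it uses decimal units (1 GB = 1000 MB). The drive speeds come from the figures above; the 80 MB/s server feed rate is a hypothetical number chosen for illustration.

```python
def effective_rate_mb_s(server_feed_mb_s, drive_rate_mb_s):
    """Sustained rate is limited by the slower of the two ends (simplifying assumption)."""
    return min(server_feed_mb_s, drive_rate_mb_s)


def hours_to_write(data_gb, server_feed_mb_s, drive_rate_mb_s):
    """Hours needed to write data_gb to one drive, using 1 GB = 1000 MB."""
    rate = effective_rate_mb_s(server_feed_mb_s, drive_rate_mb_s)
    return data_gb * 1000 / rate / 3600


if __name__ == "__main__":
    server_feed = 80  # MB/s, hypothetical backup server output
    for drive_rate in (60, 120, 240):  # older, current, and doubled drive speeds
        rate = effective_rate_mb_s(server_feed, drive_rate)
        hours = hours_to_write(1000, server_feed, drive_rate)
        print(f"drive {drive_rate} MB/s -> effective {rate} MB/s, {hours:.2f} h per TB")
```

Under these assumptions, moving from a 60 MB/s drive to a 120 MB/s drive helps only up to the server's 80 MB/s, and doubling the drive speed again changes nothing at all.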
Instead of just throwing disk at the problem, Backup Virtualization includes an Appliance designed specifically for the task. It allows the inbound I/O card to talk directly to an outbound I/O card, so transfers from the Backup Application to disk and from disk to tape are very fast with very low latency. A Backup Virtualization Appliance does not suffer the disk-to-tape transfer issues that a standard backup server typically will. As a result the bottleneck is no longer in the tape system; it is now in the backup server itself. This allows the Backup Application companies to focus on user needs like specific and improved application and OS support while leaving I/O optimization to the Backup Virtualization providers. From a backup application server perspective there are fewer management issues and higher performance, because the hard processing is handed off to the Backup Virtualization Appliance. Finally, fewer physical tape drives are needed because these tape drives can receive burst data at their full rated speed and tape drive connectivity needs are handled by the virtualization layer.
Another benefit of the Backup Virtualization Appliance is the dynamic allocation and sharing of physical tape drives. Because each backup server is assigned dedicated virtual libraries and drives to meet its needs, the physical drives are not committed to any specific backup server. Instead they are used by the Backup Virtualization Appliance itself. That means that when data needs to be transferred from disk to tape, an available physical tape drive is allocated for that task regardless of where the data originated. That effectively allows physical tape drives to be shared among multiple backup servers regardless of which backup applications they are running. This takes drive sharing to the next level. Up to now you could share drives among multiple backup servers running the same application, but now multiple backup servers running different applications can share the same physical tape drives based on dynamic need rather than fixed allocation.
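A minimal sketch of this kind of dynamic drive sharing follows. It is an illustration under stated assumptions, not a vendor implementation: the class name, method names and first-come-first-served queueing are all hypothetical, and the server names are placeholders.

```python
from collections import deque


class PhysicalDrivePool:
    """Hypothetical pool of physical tape drives owned by the appliance, not by any backup server."""

    def __init__(self, drive_ids):
        self._free = deque(drive_ids)
        self._in_use = {}          # drive id -> backup server whose data is being written
        self._waiting = deque()    # backup servers whose disk-to-tape jobs are queued

    def request(self, backup_server):
        """Give the job any free drive, or queue it. Returns the drive id or None."""
        if self._free:
            drive = self._free.popleft()
            self._in_use[drive] = backup_server
            return drive
        self._waiting.append(backup_server)
        return None

    def release(self, drive):
        """Free a drive. If a job is waiting, it gets the drive; returns that server or None."""
        del self._in_use[drive]    # raises KeyError if the drive was not in use
        if self._waiting:
            next_server = self._waiting.popleft()
            self._in_use[drive] = next_server
            return next_server
        self._free.append(drive)
        return None


if __name__ == "__main__":
    pool = PhysicalDrivePool(["drive0", "drive1"])
    print(pool.request("server_a"))   # server_a might run one backup application
    print(pool.request("server_b"))   # server_b might run a different one
    print(pool.request("server_c"))   # no drive free, so the job is queued
    print(pool.release("drive0"))     # the queued job inherits the freed drive
```

Nothing in the pool records which backup application a server runs, which is the design point: the drive is allocated on need at the moment of the disk-to-tape transfer, not reserved per server or per application.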
Disk has always had an advantage with random access of single files, but now tape is closing that gap too. The challenge is that most software applications are woefully behind in supporting these capabilities and continue to use the older method of scanning down an entire tape to find a data location. This is often called the "read command" and while it does have some fast-forward capability, there is a faster alternative.
Backup Virtualization solutions, like those from Gresham Storage, go a step further and improve tape's seek performance by supporting the ability to execute a "locate command". Doing this requires that they create a marker list that stores the location mark of every file on each tape. By referencing this marker list they can go directly to a tape position. This can significantly reduce tape positioning time, the time wasted while waiting for the tape to find the data required for a restore. This, coupled with techniques like bypassing the disk cache during a restore, can greatly improve restore performance when the data is no longer resident on disk.
The larger an Enterprise is, the less likely it is that tape is going anywhere; it simply has too much data to eliminate tape. In the medium-size Enterprise it may be more desirable to eliminate tape, but doing so is difficult without committing completely to disk-to-disk backup with deduplication as a cost-effective backup target (which must then be replicated for disaster recovery) and to a disk archive for long-term retention of data. So even in the medium-size Data Center, tape lives on and has a valuable role as a cost-effective storage platform.
There is a perception today that you have to be either all tape or all disk for your backup strategy, yet both disk and tape have significant benefits and attributes that can complement each other. Why not take advantage of both?
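The marker list idea can be modeled in a few lines. The sketch below is a toy model, not real tape drive or SCSI code: a tape is represented as an ordered list of (file name, length in blocks), and the function names, tape label and file names are hypothetical.

```python
def build_marker_list(tape_id, files):
    """Record the starting block of every file as it is written to tape.

    files is an ordered list of (name, length_in_blocks).
    """
    markers = {}
    position = 0
    for name, length in files:
        markers[(tape_id, name)] = position
        position += length
    return markers


def blocks_scanned_sequentially(files, wanted):
    """Blocks a scan from the start of the tape must pass over before it finds the file."""
    passed = 0
    for name, length in files:
        if name == wanted:
            return passed
        passed += length
    raise KeyError(wanted)


def locate(markers, tape_id, wanted):
    """Look up the target block without touching the tape at all."""
    return markers[(tape_id, wanted)]


if __name__ == "__main__":
    files = [("a.dat", 100), ("b.dat", 250), ("c.dat", 50)]
    markers = build_marker_list("T0001", files)
    print("scan passes over", blocks_scanned_sequentially(files, "c.dat"), "blocks")
    print("locate target block", locate(markers, "T0001", "c.dat"))
```

Both approaches arrive at the same block, but the scan has to read its way down the tape to discover where the file is, while the marker list already knows the address, so the drive can be told to position there directly.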
Thursday, August 14, 2008