A Backup API

 

The technology that backups are stored on has changed dramatically over the past four years. Frustration with tape has led to the implementation of disk-based backups. Disk backup started as little more than buying a cheap RAID array and connecting it to a backup server. Its first evolutionary step came when Virtual Tape Library (VTL) solutions reached the market, but VTLs proved to be complex and expensive. MAID systems then began to appear that could pack storage densely and power down inactive drives. At the same time, data deduplication technology entered the market with IP-based systems that simplified the addition of disk to the backup process while also eliminating redundant data. For the first time, disk moved out of the “cache” category and became a viable long-term storage mechanism. Today, 20X storage efficiencies on these deduplication systems are not uncommon. Once IT staffs began to understand what data deduplication was, adoption began.


Data deduplication then addressed a frequently requested customer capability: the ability to replicate, or electronically vault, data to a remote site. Prior to data deduplication this was nearly impossible, especially over a distance. By leveraging the data deduplication engine itself, the replicated data set was reduced to only the blocks of data that had changed or were new since the prior night’s backup. The result was the ability to have an electronically archived copy of the backup within a few hours of the local backup completing. Adoption grew rapidly.
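The idea of replicating only new or changed blocks can be illustrated with a minimal sketch. It assumes fixed-size blocks identified by SHA-256 fingerprints and an in-memory dictionary standing in for the remote site. The names `split_blocks`, `fingerprint`, and `replicate` are hypothetical and do not describe any vendor's actual engine, which may chunk and index data quite differently.

```python
import hashlib

BLOCK_SIZE = 4  # tiny fixed-size blocks, for illustration only (assumption)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a backup stream into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Identify a block by its SHA-256 digest."""
    return hashlib.sha256(block).hexdigest()


def replicate(backup: bytes, remote_store: dict[str, bytes]) -> int:
    """Send only the blocks the remote site lacks; return the bytes sent."""
    sent = 0
    for block in split_blocks(backup):
        key = fingerprint(block)
        if key not in remote_store:  # block is new to the remote site
            remote_store[key] = block
            sent += len(block)
    return sent


remote: dict[str, bytes] = {}  # hypothetical stand-in for the remote site
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBCCCCEEEE"  # only the last block changed overnight

sent_monday = replicate(monday, remote)
sent_tuesday = replicate(tuesday, remote)
print(sent_monday, sent_tuesday)  # prints "16 4"
```

The first night sends the full 16 bytes, while the second night sends only the single 4-byte block that changed, which is why the WAN traffic stays small.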


These backup targets are a dramatic departure from the tried and true tape drives and tape libraries that dominated the environment. What has remained virtually unchanged is the backup application’s understanding of disk-based technologies: how best to interact with them and how to take advantage of their capabilities. Backup applications have essentially treated disk as tape. Disk is not tape. Disk can be accessed randomly rather than sequentially, and it is not limited to a small, finite number of drives to handle inbound data. The lack of specific support for disk as disk has resulted in sometimes-difficult integration with the backup application, and it almost always means running two essentially separate processes to take advantage of the technology investment and completely protect the environment.


For example, as stated earlier, one of data deduplication’s key capabilities is WAN-optimized data replication. This allows a customer to automatically create an electronically vaulted disaster recovery copy of data while using only very modest WAN bandwidth. This is a significant improvement over tape-based strategies or replication with a standard VTL, and with proper planning many customers are executing the strategy flawlessly. To take the solution further, it needs to be integrated so that the backup application understands that a remote data set exists and can take advantage of it beyond its use as a DR copy.


Without support in the backup application, this would require the suppliers of these sophisticated backup targets either to hack their way into the backup application or to write scripts to perform some of the tasks. The best solution would be for the backup application suppliers to build an interface or API set into their applications that allows manufacturers of intelligent disk solutions to deliver these capabilities. The first backup supplier to deliver this functionality is Symantec, through its NetBackup product. In November 2006 it announced the OpenStorage API (OST).


The NetBackup OpenStorage API is designed specifically to allow intelligent disk storage devices to integrate natively into NetBackup. With OpenStorage, NetBackup treats disk as disk. By providing high-speed access to IP or Fibre Channel disk solutions, the OST API allows for improved sharing of disk resources between heterogeneous NetBackup Media Servers. As a result, NetBackup can make better use of disk-based backup storage devices, whether they are deduplication devices or MAID solutions. A disk-based backup resource that supports OST would, for example, allow NetBackup to control replication between multiple data centers and would simplify operation by unifying the management of the process. NetBackup refers to this process as ‘Optimized Duplication’.
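To make the plug-in idea concrete, here is a minimal sketch of how a backup application might delegate duplication to an intelligent disk device through a vendor-supplied plug-in. This is not the OpenStorage API: the `DiskPlugin`, `InMemoryDevice`, and `BackupApp` classes and all of their methods are hypothetical names invented for illustration, and the real interface may look entirely different.

```python
from abc import ABC, abstractmethod


class DiskPlugin(ABC):
    """Hypothetical contract a storage vendor implements (not the real OST API)."""

    @abstractmethod
    def write_image(self, image_id: str, data: bytes) -> None: ...

    @abstractmethod
    def read_image(self, image_id: str) -> bytes: ...

    @abstractmethod
    def duplicate_image(self, image_id: str, target: "DiskPlugin") -> None: ...


class InMemoryDevice(DiskPlugin):
    """Toy device; a real one would use its own replication engine."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.images: dict[str, bytes] = {}

    def write_image(self, image_id: str, data: bytes) -> None:
        self.images[image_id] = data

    def read_image(self, image_id: str) -> bytes:
        return self.images[image_id]

    def duplicate_image(self, image_id: str, target: DiskPlugin) -> None:
        # The device moves the data; the backup application only asks for it.
        target.write_image(image_id, self.images[image_id])


class BackupApp:
    """Toy backup application that records every copy in its catalog."""

    def __init__(self) -> None:
        self.catalog: dict[str, list[str]] = {}  # image id -> device names

    def backup(self, image_id: str, data: bytes, device: InMemoryDevice) -> None:
        device.write_image(image_id, data)
        self.catalog.setdefault(image_id, []).append(device.name)

    def duplicate(self, image_id: str, source: InMemoryDevice,
                  target: InMemoryDevice) -> None:
        source.duplicate_image(image_id, target)
        self.catalog[image_id].append(target.name)


primary = InMemoryDevice("primary")
dr_site = InMemoryDevice("dr")
app = BackupApp()
app.backup("img-1", b"payroll", primary)
app.duplicate("img-1", primary, dr_site)
print(app.catalog["img-1"])  # prints "['primary', 'dr']"
```

The design point is the division of labor: the application decides when to duplicate and records the result, while the device decides how the bytes actually travel.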


The first hardware manufacturer to support the OpenStorage API is Data Domain. Existing NetBackup customers purchase the Symantec OpenStorage Disk Option license and the Data Domain OpenStorage software option, which consists of an OpenStorage plug-in and an OpenStorage server. The NetBackup OpenStorage Disk Option is enabled on the NetBackup server, and the Data Domain OpenStorage plug-in is installed on the NetBackup media server platforms.


Once in place, the solution allows NetBackup administrators to create an OpenStorage disk storage unit that contains the deduplication system, presented to NetBackup as a disk pool. They can back up, restore, and duplicate backup images written to the OpenStorage disk storage units directly from the NetBackup control console, in an automated fashion, using storage lifecycle policies or Vault option jobs. For the first time, this is done in a catalog-aware fashion, so NetBackup knows about all the data available to it for recovery. This gives the backup administrator a single view of all backup images in the environment, at both the primary site and the disaster recovery site. The same would be true in a distributed environment, where NetBackup would be aware of the data at the remote or branch office sites as well as at the central data center being used as a hub for tape consolidation.
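What "catalog-aware" buys at recovery time can be sketched in a few lines. The `Catalog` and `Copy` classes below are hypothetical and greatly simplified; they only illustrate the idea that a catalog which knows about every copy can point a restore at the disaster recovery site when the primary copy is unavailable. They do not represent NetBackup's actual catalog.

```python
from dataclasses import dataclass, field


@dataclass
class Copy:
    """One stored copy of a backup image (hypothetical structure)."""
    site: str
    available: bool = True


@dataclass
class Catalog:
    """Toy catalog mapping each backup image to every known copy."""
    copies: dict[str, list[Copy]] = field(default_factory=dict)

    def record(self, image_id: str, site: str) -> None:
        self.copies.setdefault(image_id, []).append(Copy(site))

    def mark_site_down(self, site: str) -> None:
        for image_copies in self.copies.values():
            for copy in image_copies:
                if copy.site == site:
                    copy.available = False

    def locate(self, image_id: str) -> str:
        """Return the first site, in recorded order, with an available copy."""
        for copy in self.copies.get(image_id, []):
            if copy.available:
                return copy.site
        raise LookupError(f"no available copy of {image_id}")


catalog = Catalog()
catalog.record("img-1", "primary")
catalog.record("img-1", "dr")

before = catalog.locate("img-1")
catalog.mark_site_down("primary")
after = catalog.locate("img-1")
print(before, after)  # prints "primary dr"
```

Without the second catalog entry, the copy at the remote site would exist but the restore would have no way to find it.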


The deduplication storage systems can then be leveraged to store data efficiently and to perform WAN-optimized replication, again in a fully catalog-aware manner. All the functionality of the deduplication system remains, but it is simplified through integration with the API. The API also ensures compatibility with future releases of NetBackup, something that is a concern when using scripts or internally developed applications as workarounds.


The unified solution delivers key capabilities not previously available in a backup process-aware manner. Users are once again being asked to do more with less. The more that separate processes can be combined into a single process, the simpler the management becomes. By participating in the OpenStorage initiative, Data Domain has reduced the number of management panes that must be looked through in order to manage the data protection process. When OpenStorage is used with an intelligent disk system, the storage can be shared more easily across NetBackup Media Servers. The replication process is now managed and watched over by NetBackup, but it is optimized using the replication technology of the deduplication vendor (if available). NetBackup is now aware of the backup data at the remote location and can leverage it in the event of a recovery. Unification leads to simplification and decreases the likelihood of error.


When vendors choose to create open APIs that allow third parties to integrate and enhance the combined solution, everyone wins.

 

Tuesday, August 5, 2008

 
 