Leveraging Deduplication for Disaster Recovery

 

Disk-to-disk backup with data de-duplication provides the broad data coverage that today’s disaster recovery plans need to succeed. We will examine the traditional methods and explain how data de-duplication backup solutions, like those from Data Domain and others, fill the void those methods leave.


Depending on the region of the country your data center is in, you tend to worry about a particular disaster. On the Gulf Coast it’s hurricanes, on the West Coast it’s earthquakes, and in major cities possibly a terrorist strike. Yet there is one disaster that hits us all: blackouts.


As the recent Hurricane Ike showed when it hit the Gulf Coast and left thousands of businesses without power for weeks, weather can cause the most severe disasters, and weather events are far more frequent than the headline-grabbing major events. This is compounded by the fact that data is growing at an average of about 60% per year. The requirement to retain that data for longer periods continues to be an issue for many IT staffs, and the outright cost to power and cool that storage is becoming a burden on IT budgets.
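To put the 60% figure in perspective, the compound growth can be worked out directly. The short Python sketch below assumes a hypothetical data center holding 10 TB today and a steady 60% annual growth rate; `projected_capacity_tb` is an illustrative helper, not part of any product.

```python
def projected_capacity_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth (annual_growth=0.60 means 60% per year)."""
    return start_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical starting point: 10 TB under management today.
    for year in range(6):
        print(f"year {year}: {projected_capacity_tb(10, 0.60, year):.1f} TB")
```

At a constant 60% per year, capacity grows by a factor of 1.6^5, roughly 10.5, in five years, so 10 TB today becomes about 105 TB.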


The traditional approach has been either to replicate your primary data store to the DR site or to back everything up to tape, put it on a truck and send it to a vault. Replication of primary data provides rapid recovery but is not cost effective, nor is it storage or power efficient. Because of these cost concerns it is not deployed broadly across the servers in the data center; only the "chosen few" are protected. Tape-based backup does provide the needed broad coverage, and it is cost efficient, but recovery from tape is slow and unreliable. A DR solution that you cannot count on is not a DR solution at all.


Disk-to-disk backup was supposed to fill this gap, but without some form of capacity optimization it is not cost effective to store this data on disk long term, nor is it practical to replicate this data to another location. With the advent of capacity optimization techniques like data de-duplication, disk-to-disk backup provides very acceptable disaster recovery speed for 90% of the servers in an environment while remaining simple, power efficient and capacity efficient.


Implementing Broad Based Disaster Recovery


The primary method to recover data in the event of a disaster, as recently as three years ago, was the classic tape-and-truck method. You send your backups to tape, or maybe to a disk cache first and then to tape. Those tapes are either copied or pulled, put on a truck and sent to a vault facility. Tape is cost effective, relatively power friendly (tapes on shelves don’t require power) and easy to transport off-site. As a result, the tape-based DR solution, while not reliable, was deployed broadly throughout the enterprise, and virtually all the servers could be recovered given enough time and patience. The challenges with tape are well documented, and the number one reason tape is being augmented or replaced by disk comes down to the speed and reliability of the recovery effort. These factors are deal breakers when it comes to data recovery in a disaster scenario. Another challenge with tapes is that, like socks, they are easily lost and may fall into the wrong hands. There seems to be an almost daily press release about hundreds of thousands of customers being exposed because of tape loss.


Replication


To the rescue have come hardware-based replication solutions from primary storage providers and software solutions from replication software and backup software companies. These solutions attempt to provide continuous, near-instant data availability at the remote site, but at a premium price and with maximum complexity. The primary challenge with replication of primary storage, whether software based or hardware based, is that it is expensive, inefficient from both a capacity and a power perspective, and complex to implement and maintain. Both approaches require a 1:1 investment in disk at the remote site, and while that is appropriate for critical servers, it does not make sense for the data center at large. Storage controller based replication uses the capabilities of the SAN RAID array to replicate data from one storage controller to another storage controller at the remote site. It has the limitation that all the storage must come from the same supplier and all the data must be on the same type of storage network (iSCSI, NAS or Fibre Channel). The storage controllers at Site A and Site B “talk” to each other and so must understand the same “language”. As a result, most storage based replication strategies are expensive because, from a storage perspective, you are buying two of everything. In addition, storage controller based replication can only replicate data that resides on storage attached to that controller.


Most customers still do not boot from a SAN, so the data on servers’ locally attached disks is not protected at the DR site. SAN storage costs more than local storage, and it typically is not cost effective to put all your data on the SAN. Even if your storage vendor gave you the remote storage for free, it would still be expensive: all that storage needs to be powered and cooled. Because of these cost and complexity issues, only a few servers are protected in this manner, and the need for broad-based disaster recovery remains.


Software-based replication uses an agent on one or more host servers at the primary site to replicate to one or more servers at the DR site. While it allows you to use any storage at the remote site, it still has issues. Its primary challenge is that you must install an agent on every server to be protected. If there are many such servers, this is not only an installation challenge but also a day-to-day management challenge: there are so many replication jobs running that making sure each one is working correctly can be very time consuming. Another issue is that software-based replication is very dependent on the OS and OS version. In a mixed OS environment, one vendor’s replication product typically does not cover every OS. The larger issue with software-based replication products is that they exact a performance toll on both the sending and receiving hosts.


Lastly, whether you choose software or hardware based replication, you are creating a completely new data protection infrastructure outside of the data protection path you already have: your backup infrastructure. This includes not only the physical investment in networking and switches, but also the mental investment of IT staff to check all these new processes, make sure they work and keep them working. If you could replicate, or electronically vault, your backup tapes as the backups were happening, you could leverage your existing backup infrastructure and processes to create a very simple and cost-effective solution. Almost all backup applications can create a tape clone, but a WAN is too slow to feed tape drives at the DR site, and there is simply too much data to send across a WAN every night.


Disk to Disk Backup


As a result, most customers are looking to disk-to-disk backup to solve this problem, and replication is one of the most requested capabilities when talking to customers about potential disk-to-disk backup solutions. Instead of loading tapes on a truck, users want their backup data automatically vaulted electronically to a remote site. While disk can clearly handle the slow delivery speeds of a WAN, the problem is the sheer size of these backups. Each backup to disk essentially creates a net new file representing all the files that have changed. For example, it is not uncommon for a customer to have a 500GB or greater nightly backup. When you back up that data to disk you are creating a series of files (backup images) that total 500GB, and you are doing so very rapidly, which makes it challenging if not impossible to replicate this information across even a relatively fast WAN segment.
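A rough back-of-the-envelope calculation shows why. The Python sketch below estimates how long a nightly backup takes to cross a WAN link; the link speeds (T1, DS3, 100 Mbit/s) and the `efficiency` factor are assumptions chosen for illustration, and real throughput will be lower once protocol overhead and latency are accounted for.

```python
def transfer_hours(backup_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to move `backup_gb` (decimal GB) over a `link_mbps` Mbit/s link.

    `efficiency` is the assumed fraction of raw bandwidth actually usable;
    1.0 is a best case that real WAN links will not reach.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = backup_gb * 1e9 * 8                      # decimal gigabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative link speeds (assumptions, not figures from the article).
    links = {"T1 (1.544 Mbit/s)": 1.544, "DS3 (44.736 Mbit/s)": 44.736, "100 Mbit/s": 100.0}
    for name, mbps in links.items():
        print(f"{name}: {transfer_hours(500, mbps):.1f} hours for a 500GB backup")
```

Even over a perfectly efficient 100 Mbit/s link, a 500GB backup needs about 11 hours; over a DS3 it takes nearly 25 hours, longer than the gap between nightly backups.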


Disk to Disk Backup with Data Deduplication


To address this shortcoming, a class of disk-to-disk backup systems provides a capacity optimization known as data de-duplication. These solutions perform very granular redundancy checking, looking for common byte patterns in backup data. As a result, similar files or backups (this week’s full compared to last week’s) only need to have their differences stored, while their similarities are stored only once, greatly reducing the total disk storage required to hold multiple backups. New incoming backup jobs are compared to the data already on disk; duplicate patterns are not stored again, and pointers to the existing data are recorded instead. Since most of the data in a full backup is identical to the previous full backup, actual data growth on disk is very small.
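The mechanism described above can be sketched as a toy content-addressed store. This is a minimal illustration under stated assumptions, not how Data Domain or any other vendor implements de-duplication: it splits each backup image into fixed 4 KiB chunks (commercial systems generally use variable-length segments so that inserted bytes do not shift every later chunk), fingerprints each chunk with SHA-256, stores each unique chunk once, and records the backup as a list of pointers (digests). The class and method names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed 4 KiB chunks; real products typically use variable-length segments


class DedupStore:
    """Toy de-duplicating store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}       # digest -> chunk bytes, stored once
        self.backups: dict[str, list[str]] = {}  # backup name -> ordered digests ("pointers")

    def write_backup(self, name: str, data: bytes) -> int:
        """Store a backup image and return how many bytes of new chunk data were written."""
        recipe = []
        new_bytes = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # unseen byte pattern: store it
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)          # duplicate or not, record a pointer
        self.backups[name] = recipe
        return new_bytes

    def read_backup(self, name: str) -> bytes:
        """Reassemble a backup image by following its pointers."""
        return b"".join(self.chunks[d] for d in self.backups[name])

    def stored_bytes(self) -> int:
        """Total physical bytes held after de-duplication."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    # Hypothetical 400 KiB "full backup" made of 100 distinct chunks.
    week1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(100))
    # Next week's full: identical except for the first 10 bytes.
    week2 = b"changed!!!" + week1[10:]

    print(store.write_backup("week1", week1))  # 409600
    print(store.write_backup("week2", week2))  # 4096
    print(store.stored_bytes())                # 413696
    assert store.read_backup("week2") == week2
```

Because the second full differs from the first in only one chunk, it consumes 4 KiB of new disk instead of another 400 KiB, while both backups remain fully restorable.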


This changes how the disk in disk-to-disk backup is used and its role in disaster recovery. Before data de-duplication, the backup disk was essentially a cache, holding a backup job for a few days or perhaps a week before the data had to be moved to tape to free up space for incoming backup jobs. There was no viable way to move that data off-site, so tapes still had to be created and put on a truck. Besides failing to deliver disaster recovery via an electronic vault, this approach also fails one of the other main reasons customers buy disk-to-disk backup solutions: doing restores from disk. In that scenario, a restore comes from disk only if the requested data was backed up in the last few days. Data growth on capacity optimized solutions is much more gradual, so keeping backups on disk for months or longer is realistic, making it possible for most restores to come from disk.
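The difference in retention can be estimated with simple arithmetic. The sketch below compares the disk needed to keep a number of weekly full backups with and without de-duplication. It assumes, purely for illustration, a 500GB full, 12 weeks of retention and a 5% weekly change rate, and it ignores nightly incrementals and compression; the function names are hypothetical.

```python
def raw_capacity_gb(full_gb: float, weeks: int) -> float:
    """Disk needed to keep `weeks` weekly fulls without capacity optimization."""
    return full_gb * weeks


def deduped_capacity_gb(full_gb: float, weeks: int, weekly_change: float) -> float:
    """Disk needed when unchanged data is stored only once.

    Simplifying assumption: each weekly full differs from the previous one by
    `weekly_change` (0.05 = 5%) of its size, and only that new data is stored.
    """
    if weeks <= 0:
        return 0.0
    return full_gb + full_gb * weekly_change * (weeks - 1)


if __name__ == "__main__":
    full, weeks, change = 500, 12, 0.05  # hypothetical values for illustration
    raw = raw_capacity_gb(full, weeks)
    dedup = deduped_capacity_gb(full, weeks, change)
    print(f"raw: {raw:.0f} GB, de-duplicated: {dedup:.0f} GB, ratio {raw / dedup:.1f}:1")
```

Under these assumptions, 12 weekly fulls need 6,000GB of conventional disk but only about 775GB when unchanged data is stored once, roughly a 7.7:1 reduction, which is what makes months of on-disk retention plausible.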


Where capacity optimized solutions really shine is electronic vaulting, or DR. Since the only data physically being stored is the actual changed bytes, only those changed bytes need to be replicated to the DR site, which makes replicating that data across modest WAN connections very viable. At the remote location, the data is stored on spinning, active storage that is also capacity optimized. The cost to power and cool capacity optimized disk, per usable terabyte, is significantly less than for traditional disk. In the event of a disaster, that data can be quickly recovered to the DR environment and you can be back in production.
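Replication can reuse the same idea: the DR site already holds most chunks, so only chunks it has never seen need to cross the WAN. The Python sketch below simulates this with a dictionary standing in for the remote chunk store; in a real system the source would first ask the remote site which fingerprints it already has. The fixed 4 KiB chunk size and the function names `replicate` and `restore_at_dr` are assumptions for illustration, not any vendor's API.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks, for illustration only


def replicate(data: bytes, remote_chunks: dict[str, bytes]) -> tuple[list[str], int]:
    """Simulate replicating one backup image to the DR site.

    `remote_chunks` stands in for the DR site's chunk store (digest -> bytes).
    Only chunks whose digest the remote does not already hold are "sent".
    Returns (recipe, bytes_sent); the recipe is the ordered digest list the
    DR site needs to rebuild the image.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in remote_chunks:
            remote_chunks[digest] = chunk  # the only payload that crosses the WAN
            bytes_sent += len(chunk)
        recipe.append(digest)              # small metadata, sent for every chunk
    return recipe, bytes_sent


def restore_at_dr(recipe: list[str], remote_chunks: dict[str, bytes]) -> bytes:
    """Rebuild a backup image at the DR site from locally held chunks."""
    return b"".join(remote_chunks[d] for d in recipe)


if __name__ == "__main__":
    dr_site: dict[str, bytes] = {}
    # Hypothetical 200 KiB full backup made of 50 distinct chunks.
    week1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(50))
    # The next full changes only the last chunk.
    week2 = week1[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE

    _, sent = replicate(week1, dr_site)
    print(f"first full: sent {sent} bytes")   # 204800
    recipe, sent = replicate(week2, dr_site)
    print(f"second full: sent {sent} bytes")  # 4096
    assert restore_at_dr(recipe, dr_site) == week2
```

For the second backup, only the digest list and one new chunk travel over the WAN, yet the DR site can rebuild the complete image from its local chunks.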


With this solution, you are also leveraging your existing backup infrastructure and the processes you have developed to make sure everything has worked according to plan. Its simplicity comes from building on a process you have used for years: rather than making you learn something new, it makes what you have better.


With disk-to-disk backup solutions that provide data de-duplication, you can implement a broad-based disaster recovery solution that makes sure ALL the data in your enterprise is protected, replicated to the DR site and, most importantly, available for immediate recovery.

 

Thursday, November 6, 2008
