Deduplication Means Affordable DR

 

Making Disaster Recovery affordable with Deduplication


IT managers today have two conflicting goals: drive down costs, yet increase the organization’s ability to recover from a disaster. Both goals can be met with a single technology, deduplication, which reduces not only the cost of primary storage but also the cost of establishing and maintaining a disaster recovery site.


Implementing a data center recovery strategy has become significantly easier over the past few years. Technologies like SAN replication and host replication are far superior to shipping backup tapes to a DR site and then going through the time-consuming process of recovering from those tapes in the event of a disaster.


Replication of online or primary data via SAN storage or via host-based agents is expensive. SAN replication requires that a nearly identical storage platform be purchased for the DR site and that the replication software be purchased from the storage manufacturer, often as an expensive add-on. Both replication strategies require expensive WAN bandwidth and consume the processing resources of either the storage system or the host during the replication period. Because of this expense, replication of online data is typically restricted to only the most mission-critical servers.


Tape, on the other hand, is cost effective as media and relatively simple to move off-site, but the logistics can be expensive, requiring contracts with off-site storage facilities or a facility and inventory system of your own. Tape is also broad in its coverage; all servers usually have their backups written to tape, so in theory all of their data is on tape and stored off-site. Tape’s Achilles’ heel, however, is its reliability and speed of recovery. The concern over whether the tapes will actually work, and over how long loading and recovery will take in the event of a disaster, is a legitimate one.


Copying backup data across a typical WAN segment is not practical either. Many users assume that because SAN or host-based replication systems can do this, a WAN backup strategy can as well. Primary storage replication can use a WAN segment because it only replicates blocks of data as they change, and it typically runs all day long, so it has a 24-hour window to complete the replication job. Backup jobs, on the other hand, create complete copies of any data that has changed since the last backup, and in the case of a full backup they copy all the data, regardless of whether it has changed. They create these copies as quickly as possible, in some cases producing hundreds of GBs if not TBs of backup images in an hour. A typical WAN segment would never be able to keep pace.
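
The arithmetic behind that claim can be sketched in a few lines of Python. The link speed, the 80% usable-bandwidth figure and the data sizes below are hypothetical values chosen for illustration, not measurements from any particular site.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_gb (decimal gigabytes) over a link_mbps WAN link.

    efficiency is an assumed fraction of the link that is actually usable
    after protocol overhead and competing traffic.
    """
    megabits = data_gb * 8 * 1000          # gigabytes -> megabits
    usable_mbps = link_mbps * efficiency   # effective throughput
    return megabits / usable_mbps / 3600   # seconds -> hours


# Hypothetical example: a 45 Mbps WAN segment.
full_backup_hours = transfer_hours(500, 45)    # a 500 GB backup image
changed_blocks_hours = transfer_hours(10, 45)  # 10 GB of changed blocks

print(f"500 GB backup image: {full_backup_hours:.1f} hours")
print(f"10 GB of changed blocks: {changed_blocks_hours:.1f} hours")
```

Under these assumptions the backup image needs more than a day on the wire, while the changed blocks fit comfortably inside a nightly window.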


The expense of replicating online storage on one side, and the reliability and performance concerns of tape on the other, have left a gap for which disk-based backup systems with deduplication are an ideal solution. Systems like those from Data Domain can reduce the cost of establishing a DR site and be as broad in their coverage as tape, because they integrate into the existing backup strategy. The only change is that backups are routed to the data deduplication system. In many cases that system also handles the replication of the backup data to another deduplication system at the remote site, removing one more thing that the system administrator has to worry about.


These solutions enhance the backup process by providing a disk-based backup target that can optimize the data stored on it by a factor of 20X. Inline deduplication systems operate by comparing blocks of data about to be stored with data already stored on disk. If a block is already stored, a pointer is made to the original and the redundant block is not stored again. Because much of the data in successive backups is redundant, most of it never needs to be stored. Since only changed or new blocks get stored, only those blocks are replicated, which makes replication between deduplication systems very bandwidth friendly.
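
The block comparison described above can be illustrated with a minimal Python sketch. It is not how any particular appliance is implemented: the fixed 4 KB block size, the use of SHA-256 fingerprints and the DedupStore class itself are assumptions made for the example, and commercial systems may use variable-length segments and different fingerprinting.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Hypothetical in-memory deduplicating backup target."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes, each stored once
        self.backups = {}  # backup name -> ordered fingerprints (the "pointers")

    def write(self, name: str, data: bytes) -> int:
        """Store a backup image and return how many new blocks were written."""
        new_blocks = 0
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:   # unseen block: store it
                self.blocks[fp] = block
                new_blocks += 1
            pointers.append(fp)         # seen or not, record a pointer
        self.backups[name] = pointers
        return new_blocks

    def read(self, name: str) -> bytes:
        """Rebuild a backup image by following its pointers."""
        return b"".join(self.blocks[fp] for fp in self.backups[name])


# Two nightly "full" backups of 100 blocks that differ in a single block.
monday = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(100))
tuesday = monday[:7 * BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + monday[8 * BLOCK_SIZE:]

store = DedupStore()
new_monday = store.write("monday", monday)     # every block is new
new_tuesday = store.write("tuesday", tuesday)  # only the changed block is new
print(new_monday, new_tuesday)
```

The second full backup adds a single block to the store, yet both backups can still be read back in full.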


Cost Effective DR


Data deduplication systems drive down DR costs in a number of ways when compared to traditional replication solutions and, when compared to tape, improve the ability to recover. Both make them a more cost-effective solution.


Replication of primary storage to a DR site can be expensive. The system at the DR site typically must be from the same manufacturer, be very similar to the primary system and may cost about the same as the primary system. Most importantly, there typically is no optimization of this data, so if there is 5TB of primary storage to replicate, 5TB plus room for growth must be available at the DR site. Even though this disk typically serves only to receive replication jobs, it still requires full power and cooling at all times.


Also, SAN or host-based replication of primary storage does little to add to the reliability of the backup process. It is a function that happens outside of the backup process, and the backup software typically has no understanding of the remote replicated storage. The replicated storage is almost always a near real-time image of the primary storage, so a deletion or corruption at the primary site is replicated to the secondary site within seconds. As a result, local backups must still be done, and either a disk solution is purchased or backups continue as they have traditionally been done, straight to tape.


In contrast, a deduplication system will first optimize and improve the local backup by providing a local disk target for the backups. This improves the speed and reliability of the local backup, and with deduplication’s ability to optimize data, weeks if not months of data can be stored very efficiently on a minimal amount of disk. Then, because deduplication only stores unique data blocks, just those new blocks need to be replicated to another deduplication system at the remote site, often over existing network bandwidth resources.
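
The replication step can be sketched the same way. In this simplified Python model each site is a dictionary of fingerprint to block, and replicate is a hypothetical helper, not a vendor API. It assumes the two systems can compare fingerprints cheaply and then ship only the blocks the remote site lacks.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Content fingerprint used to identify a block (SHA-256 assumed here)."""
    return hashlib.sha256(block).hexdigest()


def replicate(local_blocks: dict, remote_blocks: dict) -> int:
    """Copy blocks missing at the remote site and return the bytes sent."""
    missing = local_blocks.keys() - remote_blocks.keys()
    bytes_sent = 0
    for fp in missing:
        remote_blocks[fp] = local_blocks[fp]
        bytes_sent += len(local_blocks[fp])
    return bytes_sent


# Primary site holds three unique blocks; the DR site starts empty.
local = {fingerprint(b): b for b in (b"a" * 4096, b"b" * 4096, b"c" * 4096)}
remote = {}

first_pass = replicate(local, remote)   # initial seeding sends everything

# The next backup adds one new unique block at the primary site.
new_block = b"d" * 4096
local[fingerprint(new_block)] = new_block
second_pass = replicate(local, remote)  # only the new block crosses the WAN

print(first_pass, second_pass)
```

After the initial seeding, the WAN only has to carry the blocks that are new since the previous pass, which is why an existing link is often enough.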


The storage at the remote site is equally optimized and can store weeks if not months’ worth of backups as well. Additionally, since these are backup data sets, the data is not real time and multiple points in time can be stored. For example, if a database corruption has occurred, you can browse the prior backup sets to find a valid copy.


The replication process of the deduplication appliances can protect the backup data of all backed-up servers, providing broad coverage of the environment as opposed to the limited protection afforded by SAN or host-based replication. The potential negative for a DR strategy built on deduplication systems is that, in the event of an actual disaster, data must be recovered from the deduplication system’s storage to primary storage. While this may take time, it is certainly faster and typically more reliable than a tape-based solution. All data centers have servers of varying degrees of importance when it comes to recovery times. Depending on the recovery objectives, the few hours that it may take to recover the critical servers may be more than adequate, or some of the recovery may be pre-staged so that only the most recently changed data needs to be restored.


For those servers that need quick recovery, a blended model of data deduplication and primary storage replication is ideal. This blended solution limits the size of the DR site’s storage by replicating only those servers that must be online within a few hours or less of a disaster and leveraging a deduplication system to protect the rest of the environment.


SAN or host-based replication strategies also add complexity to the environment. Unlike backup, this is a new and separate process that has to be established, implemented and managed. In addition, there is the complexity of managing the connectivity to the remote facility, which, especially in SAN-based systems, is not straightforward. Data deduplication systems, in contrast, insert themselves right into the existing and likely well-established backup process, making implementation and management less taxing on the IT staff.


When compared to tape, long thought to be the least expensive method of getting storage to a DR site, data deduplication systems may still have a price advantage; they certainly have a reliability and recoverability advantage. Tape, however, has to be moved off-site and then retrieved again when a local deletion or corruption requires a restore. This often involves a pricey contract with a tape storage service that charges for pickup, storage and retrieval.


In contrast, data deduplication systems can replicate their data to another site at little to no additional cost. Typically the WAN link is already well established and likely underutilized in the late evening or early morning, when the replication job would occur. The movement of that data to the second site, in the case of a Data Domain appliance, begins the moment the system at the main site begins to receive data, meaning the data is secured off-site significantly sooner than if it had to wait for the next day’s tape pickup.


Tape’s key weakness is its reliability for recovery. When disaster strikes, the last concern should be whether the tape media is reliable and whether the recovery will actually work. With deduplication systems, data is written to RAID 6 based disk systems that also perform validity checks to confirm disk integrity over time. In addition, because of the speed of disk-based solutions, many customers can start using the ‘verify after write’ feature that most backup software provides but that they had turned off because it was too time-consuming in tape-based environments.


Bottom Line


Deduplication systems are an ideal way either to establish an affordable foundation for a DR strategy or to broaden the scope of a current one. Utilizing a deduplication system as an integral part of a DR strategy will reduce the cost of the primary storage purchased at the remote site, reduce power and cooling costs at the DR site by optimizing the storage there, and eliminate the tape transport costs common with tape-based DR strategies.


Deduplication systems can dramatically reduce the cost of DR when compared to primary storage replication solutions and dramatically increase the chances of 100% recoverability when compared to tape-based strategies.

Wednesday, March 18, 2009

 
 