Data Domain, one of the earliest suppliers of deduplication and one of the first to leverage deduplication to replicate backup data, has now enhanced its deduplication capabilities by increasing the speed at which the DR site is brought up to date and increasing the number of sites that can fan in to a DR site.


Sometimes lost in the deduplication debate are the simple facts. Data Domain has 8,000+ units in the field, of which 65% are using its replication option. Part of the reason for the product's success is its simplicity. The replication feature is relatively easy to implement compared to other deduplication products. This is because those products don't really have their own replication capability; instead, they leverage the replication capabilities of the array. This means dealing with the complexities of the SAN and the array software.


The other big advantage Data Domain has is time to DR, meaning how quickly your disaster recovery site becomes fully viable with the latest copy of data. Data Domain uses inline deduplication, and probably nowhere is it more valuable than when trying to quickly update a DR site. As we discuss in our Deduplication Buyers Beware article, with inline deduplication, blocks of data can be sent to the DR site as soon as they are written to the local site. In contrast, post-process systems must wait for the backup jobs, and in some cases the whole process, to finish before replication to the DR site can begin.
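To make the timing difference concrete, here is a minimal sketch of the two approaches. It is an idealized model, not a description of any vendor's scheduler: it assumes inline replication overlaps the backup completely, that a post-process system runs backup, deduplication, and replication strictly one after another (the worst case described above), and the durations are hypothetical.

```python
def time_to_dr_inline(backup_hours, replication_hours):
    # Replication overlaps the backup, so the DR copy is current
    # once the slower of the two activities finishes (idealized).
    return max(backup_hours, replication_hours)


def time_to_dr_post_process(backup_hours, dedup_hours, replication_hours):
    # Worst case: each stage waits for the previous one to finish.
    return backup_hours + dedup_hours + replication_hours


# Hypothetical durations in hours, chosen only for illustration.
backup, dedup, replicate = 6.0, 3.0, 4.0

inline_hours = time_to_dr_inline(backup, replicate)
post_hours = time_to_dr_post_process(backup, dedup, replicate)

print(f"inline: {inline_hours:.1f} h, post-process: {post_hours:.1f} h")
```

With these assumed figures the inline site is current when the backup window closes, while the post-process site lags by the full deduplication and replication time.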


The whole purpose of replication of backup jobs is to electronically bring the DR site up to date as quickly as possible. If it takes longer to update the DR site than it would take to have a truck deliver a box of tapes, what is the motivation for change?


The first part of Data Domain’s update is a new feature called Collection Replication. With Collection Replication, Data Domain further shrinks the time to DR bar. It is a tuning option that you set within the software, and it enables even large deployments with hundreds of millions of files to replicate to their DR site at high speed, taking full advantage of available WAN bandwidth. Many replication systems replicate this data at the individual file or block level, creating a very chatty replication session that does not make effective use of the available bandwidth.


Thanks to Collection Replication, a pair of Data Domain systems across a WAN can now be full system mirrors at high speed, replicating all system-wide changes across multiple input protocols just after inline deduplication storage, so files can be restored from the replica as soon as possible after being backed up.  This enables wire-speed transfer of deduplicated and compressed data on a 1 Gb Ethernet link, for up to 21 TB/hour of backup image transfer throughput.
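One way to read the throughput figure is as logical backup data per hour, carried over a link whose raw capacity is far lower. The short calculation below works out the data reduction that reading would imply. It assumes decimal units (1 TB = 1000 GB) and ignores protocol overhead; the variable names are ours, not Data Domain's.

```python
LINK_GBIT_PER_S = 1.0        # link speed quoted in the article
LOGICAL_TB_PER_HOUR = 21.0   # backup image throughput quoted in the article

# Raw capacity of the link, assuming decimal units and no protocol overhead.
wire_gb_per_hour = LINK_GBIT_PER_S / 8 * 3600   # gigabytes per hour
wire_tb_per_hour = wire_gb_per_hour / 1000      # terabytes per hour

# Reduction factor needed for the logical figure to fit on the wire.
implied_reduction = LOGICAL_TB_PER_HOUR / wire_tb_per_hour

print(f"wire capacity: {wire_tb_per_hour:.2f} TB/hour")
print(f"implied data reduction: about {implied_reduction:.0f}:1")
```

Under these assumptions the link moves roughly 0.45 TB of physical data per hour, so the quoted figure corresponds to a combined deduplication and compression ratio in the neighborhood of 47:1.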


With this capability, backup data is sent to the deduplication system, and before the data is written to the system it is deduplicated and compressed. The data is then stored in 4.5 MB containers. Replicating these containers, as opposed to individual files or blocks, results in a less chatty replication session, making better use of WAN bandwidth.
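A rough sense of why container-level replication is less chatty comes from counting how many units must cross the WAN. The sketch below compares 4.5 MB containers with a block-level replicator; the 8 KB block size and the 100 GB payload are hypothetical values picked for illustration, and decimal megabytes are assumed for the container size.

```python
CONTAINER_BYTES = 4_500_000   # 4.5 MB container from the article (decimal MB assumed)
BLOCK_BYTES = 8 * 1024        # hypothetical block size for a block-level replicator


def transfer_units(payload_bytes, unit_bytes):
    # Ceiling division: a partially filled last unit still has to be sent.
    return -(-payload_bytes // unit_bytes)


# Hypothetical amount of deduplicated, compressed data to replicate.
payload = 100 * 1000**3  # 100 GB

containers = transfer_units(payload, CONTAINER_BYTES)
blocks = transfer_units(payload, BLOCK_BYTES)

print(f"containers to send: {containers:,}")
print(f"blocks to send:     {blocks:,}")
```

If each unit costs a request and an acknowledgement, the block-level session exchanges hundreds of times more messages for the same payload, which is where the wasted WAN bandwidth comes from.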


The second update in the replication software is an increase in the fan-in ratio. Data Domain can now support a 90:1 fan-in ratio. That means 90 remote offices can leverage the same Data Domain target in the data center or DR site. This brings greater cost amortization and simplification by further minimizing the number of systems that need to be managed.


Electronic updating of their DR data set is one of the top reasons customers cite for implementing a disk-to-disk backup strategy. Deduplication enables the technology; the ability to finish DR site updates more quickly and to support a greater fan-in of remote offices makes it all the more attractive.

Briefing Report