Archive or Backup


Traditional archive systems are best suited for unchanging data sets, typically reference data and formerly active data that has ‘aged out’. Backup is best for dynamic data, or ‘working sets’ of databases or files. In general, backup is the more complicated process, and it inevitably handles and stores certain data objects multiple times, whereas archive typically stores an object once. In theory, it’s best to archive data first and then run the backup: removing reference or inactive data from the working data set and putting it into the archive reduces the amount of data that hits the backup system.



Problem


But most organizations can’t easily separate active data from inactive data. Backup data sets include files, databases, applications and metadata: essentially everything needed to restore a server or an application. Migrating them to an archive system would require picking out the files that warrant being saved and physically moving them, which involves a difficult set of decisions and a time-consuming process.


The result is that companies are left with an inefficient combination of archive and backup systems, or they use their existing backup infrastructure for long-term retention and pay too much. Unfortunately, this often means simply keeping tapes longer and hoping data can still be recovered from them after years of storage. Realistically, though, a traditional archive system can’t replace the backup system, since it doesn’t interact with applications and servers to capture current data or provide efficient restores beyond a few discrete files. What’s needed is a cost-effective, efficient method for long-term retention of these inactive backup data sets. The ideal solution would leverage the existing backup system, since it already touches the active data and the search capabilities of current backup applications are good enough to retrieve data from long-term retention as needed.



Solution


Data Domain Archiver is a modular system that combines current backup with long-term retention, consisting of an active tier and an archive tier of storage. This lets it store inactive data for the long term transparently while staying within the existing backup infrastructure. The active tier is essentially a Data Domain controller and disk shelves, with the same OS, management and single file system namespace that Data Domain systems have established as the standard of disk-based backup. The active tier acts like a traditional Data Domain system and provides in-line, deduplicated capacity for current backups, which are typically retained for up to 90 days. The archive tier consists of independent, logical storage units, referred to as archive units, which connect to the active tier and store data for the long term. Like the active tier, the archive tier leverages the Data Domain Data Invulnerability Architecture to ensure the integrity of backups.

Eric Slack, Senior Analyst

Briefing Report

EMC is a client of Storage Switzerland

As data ages out past the retention window established by policy for the active tier, the system moves these static data sets to the archive tier, where they continue to be accessible to the backup application. As each archive unit fills up, subsequent data is automatically sent to the next archive unit, and the full unit is sealed for fault isolation but remains online for file retrieval.
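The tiering behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not Data Domain's implementation or API: the class names (`BackupFile`, `ArchiveUnit`, `TieredStore`), the `last_backup` field used to measure age, and the capacity figures are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class BackupFile:
    name: str
    size_tb: float
    last_backup: date  # hypothetical: the date the retention policy is measured from


@dataclass
class ArchiveUnit:
    capacity_tb: float
    files: List[BackupFile] = field(default_factory=list)
    sealed: bool = False

    @property
    def used_tb(self) -> float:
        return sum(f.size_tb for f in self.files)


class TieredStore:
    """Toy model of an active tier plus a chain of archive units."""

    def __init__(self, retention_days: int, unit_capacity_tb: float) -> None:
        self.retention = timedelta(days=retention_days)
        self.unit_capacity_tb = unit_capacity_tb
        self.active: List[BackupFile] = []
        self.archive_units: List[ArchiveUnit] = [ArchiveUnit(unit_capacity_tb)]

    def ingest(self, f: BackupFile) -> None:
        # Simplification: a single file must fit inside one archive unit.
        if f.size_tb > self.unit_capacity_tb:
            raise ValueError(f"{f.name} is larger than one archive unit")
        self.active.append(f)

    def migrate(self, today: date) -> None:
        """Move files older than the retention window to the archive tier."""
        cutoff = today - self.retention
        aged = [f for f in self.active if f.last_backup < cutoff]
        self.active = [f for f in self.active if f.last_backup >= cutoff]
        for f in aged:
            unit = self.archive_units[-1]
            if unit.used_tb + f.size_tb > unit.capacity_tb:
                # The current unit is full: seal it and continue in a new one.
                unit.sealed = True
                unit = ArchiveUnit(self.unit_capacity_tb)
                self.archive_units.append(unit)
            unit.files.append(f)

    def locate(self, name: str) -> Optional[Tuple[str, Optional[int]]]:
        """Sealed units stay online, so lookups search every tier."""
        if any(f.name == name for f in self.active):
            return ("active", None)
        for i, unit in enumerate(self.archive_units):
            if any(f.name == name for f in unit.files):
                return ("archive", i)
        return None


# Example: a 90-day active window and deliberately small 10 TB archive units.
today = date(2024, 6, 1)
store = TieredStore(retention_days=90, unit_capacity_tb=10.0)
store.ingest(BackupFile("q3-ledger", 6.0, today - timedelta(days=200)))
store.ingest(BackupFile("q4-ledger", 6.0, today - timedelta(days=150)))
store.ingest(BackupFile("current-db", 1.0, today - timedelta(days=10)))
store.migrate(today)
```

In this run the two older files age out of the active tier. The second does not fit alongside the first, so the first archive unit is sealed and a new one is opened, while `locate` can still find files in either unit.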



Fault Isolation


The sealing process ensures that each archive unit is self-contained, including all of its snapshots, deduplication and file system metadata. This logical isolation means that each archive unit is protected from potential data loss or corruption that may affect another unit, a feature that’s essential for an archive. In a data loss scenario, each archive unit is also available for restore, independently of the rest of the system.



Cost Optimized Long Term Retention


The Data Domain Archiver can scale to a total of 24 disk shelves with 768 TB of raw capacity, or 28.5 PB of logical capacity. Because it uses a single controller, the cost per GB decreases as capacity is added. Deduplication in the archive tier is optimized for long-term retention and differs from that used by the active tier, and the system provides up to 9.8 TB/hr of throughput to meet the performance expected of Data Domain systems. The DD Archiver also supports the Data Domain Replicator and Data Domain Retention Lock software options.
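The arithmetic behind these figures is easy to check. The short Python sketch below derives the average raw capacity per shelf and the data reduction ratio implied by the quoted raw and logical capacities, assuming decimal units (1 PB = 1,000 TB). It also illustrates why a single controller lowers cost per GB as shelves are added. The controller and shelf prices are invented placeholders for illustration, not EMC pricing.

```python
# Figures quoted in the text above.
RAW_TB = 768
SHELVES = 24
LOGICAL_PB = 28.5

# Average raw capacity per shelf.
raw_tb_per_shelf = RAW_TB / SHELVES  # 32.0

# Ratio of logical to raw capacity implied by the quoted figures
# (decimal units assumed; this is not a measured deduplication ratio).
implied_ratio = LOGICAL_PB * 1000 / RAW_TB  # about 37.1


def cost_per_raw_gb(shelves: int,
                    controller_cost: float = 100_000.0,  # hypothetical price
                    shelf_cost: float = 20_000.0,        # hypothetical price
                    shelf_raw_tb: float = 32.0) -> float:
    """Cost per raw GB when one controller is shared by all shelves."""
    total_cost = controller_cost + shelves * shelf_cost
    raw_gb = shelves * shelf_raw_tb * 1000
    return total_cost / raw_gb


# The fixed controller cost is spread over more capacity as shelves are added.
for n in (1, 6, 12, 24):
    print(f"{n:>2} shelves: {cost_per_raw_gb(n):.2f} per raw GB")
```

Whatever the real prices are, the shape of the curve is the same: the per-shelf cost sets a floor, and the controller's share of each GB shrinks as the system grows.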



Storage Swiss Take


Increasing regulatory and compliance requirements are forcing companies to keep inactive data for longer periods of time. In an ideal world, they would have data protection infrastructures with fully integrated backup and archive capabilities to do this. In the real world, most IT organizations use their existing backup systems, but traditional backup is inefficient for long-term retention. The Data Domain Archiver addresses this reality and gives companies a way to ‘get from here to there’, from current backup for active data to cost-effective archive for long-term retention. In theory, it may be better to archive before backup. In reality, archiving after backup is fine, with the right infrastructure.