Can’t Deduplicate Admin Workload
Friday, August 28, 2009
Deduplication can improve storage density for a backup system, since it puts more backups into a given amount of physical storage. It simply gives you a way to handle more data – to put ‘10 lbs of backups into a 5 lb bag’. It can provide backup space for more protected primary data and may allow you to postpone some backup storage purchases. But the amount of data being backed up remains unchanged, as does the amount of data residing on primary storage and the amount of data traversing the networks.
Backup deduplication can reduce some of the hardware costs associated with the typical growth in backup data. But it does nothing for the other costs associated with the same larger data set. As data continues to expand, you’re faced with CapEx costs related to this growth:
• Primary storage hardware costs – more disk capacity to support more data volume
• Network hardware costs – to support bandwidth and nodes associated with more data volume
• Backup hardware and software costs – more licenses and more server CPU, RAM and disk space to support more data volume
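The storage-density point can be made concrete with a little arithmetic. The sketch below is illustrative only: the 10 TB primary data set, the four retained full backups and the 2:1 deduplication ratio are hypothetical figures, and it assumes each retained backup is a full copy of the primary data. It shows that deduplication shrinks only the disk consumed on the backup target, while the primary data and the data read and sent across the network stay the same.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    primary_tb: float          # data resident on primary storage
    backup_logical_tb: float   # data read and sent over the network for retained backups
    backup_physical_tb: float  # disk actually consumed on the backup target


def with_backup_dedup(primary_tb: float, retained_fulls: int, dedup_ratio: float) -> Footprint:
    """Model retained full backups stored at a given deduplication ratio.

    Simplifying assumption: every retained backup is a full copy of primary data.
    """
    if dedup_ratio < 1.0:
        raise ValueError("dedup_ratio must be at least 1.0")
    logical = primary_tb * retained_fulls
    return Footprint(primary_tb, logical, logical / dedup_ratio)


# Hypothetical figures, chosen only to mirror the '10 lbs into a 5 lb bag' image.
before = with_backup_dedup(10.0, retained_fulls=4, dedup_ratio=1.0)
after = with_backup_dedup(10.0, retained_fulls=4, dedup_ratio=2.0)

print(f"primary:         {before.primary_tb} TB -> {after.primary_tb} TB")
print(f"backup logical:  {before.backup_logical_tb} TB -> {after.backup_logical_tb} TB")
print(f"backup physical: {before.backup_physical_tb} TB -> {after.backup_physical_tb} TB")
```

Only the last line changes between the two cases, which is why the primary storage, network and licensing costs listed above are untouched.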
Can’t Deduplicate Admin Workload
In addition to hardware and software costs, increasing data volume brings increases in OpEx costs – or the administrative time to design, implement and support this infrastructure. Deduplication of backup data doesn’t reduce the tasks associated with the ‘care and feeding’ of a given primary storage data set. These include:
• Primary storage admin costs to support more hardware and software
• Backup admin costs to support more data and larger file systems with millions of files
• Data retention and retrieval – long-term protection and retrieval of data, usually via the backup application
• Situational or Contingency admin costs – data management costs of system failure, compliance and recovery
There’s no getting around the fact that more data means more work for the people who keep the storage, the applications and the networks running. And using backup deduplication does nothing to reduce the overall volume of data residing on primary storage. Even though this data is not being touched by its owners, the IT staff is still interacting with it. Deduplication in the backup system addresses a symptom, but not the problem. The problem is too much data.
Enterprise Archive Reduces Data – and Admin Workload
However, using an enterprise archive system like that offered by Permabit can reduce the data residing on the primary storage system. This in turn reduces the amount of data that’s stored in backups (as deduplication does), and can postpone backup storage purchases. In fact, enterprise archive systems include technologies of their own to optimize storage efficiency, like deduplication and compression, which further increase this data reduction and the CapEx savings.
But reducing the amount of data residing on primary storage also reduces the administrative (OpEx) costs associated with all that data. Once archived, the data is recorded once (or more times if desired for disaster recovery) and kept in perpetuity, ready for retrieval when needed – but still protected like primary storage. Looking at the OpEx costs listed above, we can see how an enterprise archive system, and the overall reduction in primary data stored, can reduce admin workload.
Primary Storage
The care and feeding of primary storage arrays in the modern IT environment involves a number of tasks, depending upon the size and sophistication of the particular environment. Physical disk capacity must be provisioned to the storage array and allocated to the applications that need it, when they need it. This job may be simplified with more sophisticated storage virtualization and/or thin provisioning software, but still involves IT admin cycles. CPU resources serving this storage must be load balanced to optimize performance and support dynamic application needs for capacity. Business details like chargebacks and documentation of regulatory and process compliance must be managed and reported.
When the physical capacity is no longer adequate, more storage must be added. These tasks include the procurement process (including cost justification), installation and integration, if new arrays are added to the environment. Also, data sets must be arranged appropriately on new storage to optimize performance and utilize new resources.
Data that’s archived is removed from the primary storage infrastructure, reducing the volume of data that’s driving admin functions like provisioning, load balancing, reporting, compliance, etc.
Also, data reduction creates headroom for data growth in other applications without adding more storage, eliminating another entire segment of admin work associated with storage procurement and implementation.
Backup Administration
All primary data is touched by the backup system. Even if a deduplication appliance is used to physically store more backups, the volume of data associated with each client remains the same. This means the licenses for each client and the licenses required to support those clients (media servers, storage capacity licenses) also remain the same. As data grows, this backup software infrastructure will also grow, adding to the admin workload to install, configure, operate, patch and troubleshoot a larger system.
When data is archived, it’s removed from the ‘backup stream’, or the data that’s subject to regular backup activity. Unlike deduplication for backup, this results in an across-the-board reduction in backup hardware and software administration, including operation, configuration, patching, expansion, troubleshooting, etc. It also reduces network administration as it reduces backup traffic on the networks.
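The contrast with backup deduplication can be sketched with the same kind of arithmetic. This is a minimal illustration, not a sizing method: the function name, the 10 TB data set and the 50% archived share are all hypothetical. The point is that archiving shrinks the backup stream itself, so everything sized against that stream (client data volume, network traffic, capacity licenses) shrinks with it.

```python
def backup_stream_tb(primary_tb: float, archived_fraction: float) -> float:
    """Data still subject to regular backup after a share of primary data is archived."""
    if not 0.0 <= archived_fraction <= 1.0:
        raise ValueError("archived_fraction must be between 0 and 1")
    return primary_tb * (1.0 - archived_fraction)


# Hypothetical figures: 10 TB of primary data, half of it inactive and archived.
primary_tb = 10.0
remaining_tb = backup_stream_tb(primary_tb, archived_fraction=0.5)

print(f"backup stream before archiving: {primary_tb} TB")
print(f"backup stream after archiving:  {remaining_tb} TB")
```

With a deduplication appliance alone, the equivalent of `archived_fraction` is zero: the full primary data set is still read, sent and licensed on every backup.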
Data Retention and Retrieval
Usually, data that is backed up to a deduplication appliance is left there for long-term storage, while the original copies are eventually discarded. This may reduce the admin effort needed to manage the primary storage and the backup processes for this data, but it adds another task. This data must now be retrieved using the backup system. Once it’s no longer available as a primary storage file share, retrieval becomes much more complex. Storing data long term with a deduplicated backup appliance forces the admin staff to restore entire backup data sets in order to access the specific files needed.
An enterprise archive stores data like a file system: accessible at the file level and ready for instant retrieval. This significantly reduces the admin time spent searching for and restoring specific files for regulatory or compliance reasons, improving your ‘litigation readiness’.
Situational and Contingency Administration
IT people also spend time on events that are unplanned, like emergencies and special requests for data (regulations, research projects, litigation, etc.). More primary storage capacity means more opportunity for system failures and more work for system administrators. More data kept long term on backup appliances means more requests to retrieve that data (using the backup system) to support business and compliance needs. Reducing primary storage with an enterprise archive can allow you to retire older storage assets and reduce the workload associated with keeping those older systems up and running.
Archived data doesn’t consume primary disk space or network bandwidth and can reduce the admin time required for ‘care and feeding’ of that storage capacity. Being a repository for less-frequently accessed files, archives don’t have to support the applications that drive the dynamic nature of primary storage and its high operational expense.
Archived data doesn’t need to be backed up, so it reduces the backup infrastructure and the admin time required to run it. Built on advanced redundancy and parity architectures, enterprise archive systems can survive multiple, simultaneous component failures and uncorrectable disk errors without data loss. They also rebuild data sets many times faster than standard RAID technology. In addition, enterprise archives use WORM technology to prevent file changes (data ‘lock down’) and AES encryption to further safeguard this long-term data for compliance and security reasons.
Summary
Backup deduplication is a powerful technology, one which delivers significant benefits in the right environment – data backup. It can enable you to provide backup space for an increasing amount of primary data, as it reduces the amount of that data actually recorded during the backup. However, since it does not reduce the primary data set, backup deduplication does nothing to reduce the other storage costs, like administrative costs, associated with growing primary data.
Enterprise archive systems can reduce the volume of data resident on primary storage by providing a storage-optimized, secure, searchable file system for files that are no longer dynamic but must still be accessible. These systems can also eliminate the need for backup of these files and employ WORM and encryption technologies to further protect and secure them for long-term storage.
Through the simple reduction of primary storage, an enterprise archive system can enable you to provide backups for a growing primary data infrastructure and postpone storage purchases for backup. It also reduces all the other costs associated with that primary storage, most notably the cost of administration.
Eric Slack, Senior Analyst