Data Deduplication and Green IT
The Green IT Challenge
The challenge with Green IT is calculating cost savings. For the most part, data centers lack the tools to accurately measure how much wattage they use now, let alone how much they might save after completing a green project. In fact, when surveyed at a recent VMworld session, less than 2% of the respondents actually knew what their data center's portion of the electrical bill was. Yet when that same group was asked whether server virtualization had increased their power and cooling efficiency, over 96% said yes.
This reinforces the point that, for today, most green projects deemed successful are judged so more on obvious conclusions than on actual measurements of watts saved. In the virtualization example, significantly reducing the amount of server hardware in use may be enough.
Server virtualization has benefited organizations by delivering a power-efficient solution so obvious that it does not have to be measured; it is seen.
The next step on many IT priority lists is to improve the backup process with the use of disk to disk backup. The challenge is that enhancing the protection of the server virtualization infrastructure and the rest of the environment by implementing disk to disk backup may require so much power that much of the server virtualization cost savings are lost to the enhanced data protection solution.
Making Disk to Disk Backup Green(er)
Disk to disk backup can only be so green. Drives have to spin, shelves have to be powered and controllers and network connections have to serve data. The key is to set reasonable expectations and try to make the disk to disk backup solution as green as possible.
The Green Disk To Disk Backup Challenge
The challenge is to attempt the seemingly impossible: improve the overall backup process by achieving three goals:
Improve backup and recovery speed and reliability. Too many backup and recovery jobs fail. Disk is the solution for improving the odds.
Expand the number of servers covered by the DR process. Typically, only mission-critical servers have well-designed disaster recovery. By leveraging disk backup replication you can extend the number of servers that can be recovered at the DR site.
Avoid significantly increasing the use of power and cooling; if possible, actually decrease it. Because these savings are hard to measure directly, they must be logically concluded from a significant reduction in the capacity required compared with competing options.
Traditional Solutions Waste (Power)
As referenced in our article on Disk to Disk Backup Basics, many organizations are looking to disk to disk backup as a way to improve the data protection process as well as to electronically move data off-site to a remote location. The problem is that disk to disk backup seems to be in direct conflict with being more power efficient; it actually adds power consumption to a process that was power efficient (tape-based backup).
Despite its multitude of weaknesses, tape has one advantage over disk solutions: at first blush, it is very power efficient. A cardboard box full of tapes does not use much power. Tape, however, is not without its challenges; the tape media itself, storage consumables, the fuel consumed transporting tapes for DR (truck, plane or both) and the need to store these tapes in a temperature-controlled facility all have an environmental impact.
When considering the purchase of a disk to disk backup solution, power efficiency is going to be a concern for most organizations. Traditional disk to disk solutions are anything but green. They require at least as much storage as the primary storage they are protecting, and in most cases more, so that they can house a couple of full backups and a week's worth of incremental backups. For example, if you have a 10TB full backup with modest incremental backups of 1TB each (a 10% change rate), this could amount to a 25TB disk investment (2 fulls, 5 incrementals). All this disk capacity would have to be purchased, powered and cooled…hardly green.
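The sizing arithmetic above can be sketched as a small calculation. This is a minimal illustration rather than a sizing tool; the function and parameter names are hypothetical, and it assumes every incremental is the same size (the full backup size multiplied by the change rate).

```python
def traditional_backup_capacity_tb(full_tb, change_rate, fulls_kept, incrementals_kept):
    """Disk capacity (TB) needed to retain backups with no data reduction.

    Assumes each incremental equals full_tb * change_rate (a simplification).
    """
    incremental_tb = full_tb * change_rate
    return fulls_kept * full_tb + incrementals_kept * incremental_tb


# The example from the text: 10TB full, 10% change, 2 fulls, 5 incrementals.
capacity_tb = traditional_backup_capacity_tb(10, 0.10, 2, 5)
print(capacity_tb)  # 25.0
```

Raising fulls_kept or incrementals_kept to model a longer retention policy shows how quickly the required capacity grows.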
The longer the retention policy for these disk backup sets the more disk storage is required and the less power efficient the backup process becomes. Considering that some organizations have a goal of keeping backups on disks for months or even years, the cost to power and cool this storage, even if it were free to acquire, would be staggering.
DeDuplication: Green Disk to Disk Backup
There is a middle ground between disk and tape… a way to add or expand disk to disk backup without significantly increasing power consumption in the data center. A data deduplication system from companies like Data Domain can bring the advantages of disk to disk backup to the data protection process without significantly adding storage power consumption.
Data deduplication is a data reduction technique that compares segments of data being written to disk storage with data segments that were previously stored. If duplicate data is found, an additional pointer is established to the original data as opposed to actually storing the duplicate segments, thus removing or "deduplicating" the redundant segments from the storage system. As the unique segments are stored they are also compressed with a standard LZ compression that captures an additional 50% savings.
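The technique described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular product works: it uses fixed-size segments, SHA-256 fingerprints, an in-memory dictionary as the segment store, and zlib (whose DEFLATE format is LZ77-based) as a stand-in for "standard LZ compression". The class name, segment size and sample data are all hypothetical.

```python
import hashlib
import zlib

SEGMENT_SIZE = 4096  # hypothetical fixed segment size, chosen for illustration


class DedupStore:
    """Toy segment store: keeps one compressed copy of each unique segment."""

    def __init__(self):
        self.segments = {}  # fingerprint -> compressed segment bytes

    def write(self, data):
        """Store data; return the list of fingerprints (pointers) describing it."""
        recipe = []
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.segments:
                # New segment: compress and store it once.
                self.segments[fingerprint] = zlib.compress(segment)
            # Duplicate or not, the backup only records a pointer.
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe):
        """Rebuild the original data by following the pointers."""
        return b"".join(zlib.decompress(self.segments[fp]) for fp in recipe)

    def stored_bytes(self):
        return sum(len(blob) for blob in self.segments.values())


store = DedupStore()
monday = b"A" * 8192 + b"B" * 4096   # three segments, two of them identical
tuesday = b"A" * 8192 + b"C" * 4096  # only the last segment is new
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)
print(len(store.segments))  # 3 unique segments stored for 6 written
```

Both backups remain fully recoverable through their pointer lists, while only the unique segments consume disk.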
Reducing Watts per TB with Deduplication
The opportunity is to leverage deduplication’s capabilities to address power, cooling and space consumption challenges by optimizing disk capacity and storing more data in less space. Reducing the capacity of the disk to disk backup system lowers the overall drive count which lowers the number of systems that need to be written to, all of which adds up to increased power efficiency and decreased cooling costs.
It is not unusual for data deduplication devices to achieve a data reduction efficiency of 10X to 30X. For example, a 20TB backup stored on traditional disk will typically need enough space for two full backups and at least a week's worth of incremental backups, roughly 60TB (two 20TB fulls and ten 2TB incrementals, assuming a 10% change rate). Although your actual storage ratios may vary, a deduplication system would typically store this data set in about 15TB.
Assuming a data deduplication efficiency of 10X to 30X, the power draw of a data deduplication device is typically in the range of 1.3 to 2.8 watts per usable TB.
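The example figures can be checked with a short calculation. This is only a worked-arithmetic sketch with hypothetical names. Note that this particular short-retention example works out to a 4X reduction; the ratio depends on the retention period and change rate, so the numbers should be treated as illustrative.

```python
def logical_backup_tb(full_tb, fulls_kept, incremental_tb, incrementals_kept):
    """Total backup data (TB) written before any deduplication."""
    return full_tb * fulls_kept + incremental_tb * incrementals_kept


# Figures from the text: two 20TB fulls and ten 2TB incrementals.
logical_tb = logical_backup_tb(20, 2, 2, 10)

# The text's estimate of what a deduplication system would actually store.
stored_tb = 15

reduction_ratio = logical_tb / stored_tb
print(logical_tb, reduction_ratio)  # 60 4.0
```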
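One way to reason about a watts-per-TB figure is to divide a system's power draw by the amount of backup data it can effectively hold. The sketch below assumes that "usable TB" means physical capacity multiplied by the reduction ratio; that interpretation, the function name, and the 500 watt and 10TB inputs are all assumptions made for illustration, not measurements from any product.

```python
def watts_per_effective_tb(system_watts, physical_tb, reduction_ratio):
    """Power draw per TB of backup data held, after data reduction.

    Assumes effective capacity = physical capacity * reduction ratio.
    """
    effective_tb = physical_tb * reduction_ratio
    return system_watts / effective_tb


# Hypothetical system: 500 watts, 10TB of physical disk.
for ratio in (1, 10, 30):
    print(f"{ratio}X: {watts_per_effective_tb(500, 10, ratio):.2f} watts per TB")
```

The same hardware looks far more efficient per TB protected as the reduction ratio rises, which is the logical conclusion the article relies on in place of direct measurement.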
Deduplication: Extends Power Efficiency to the Remote Site
Both tape and traditional disk have issues when extending power efficiency to the remote site. Tape, as mentioned earlier, still has to be transported to the remote site via truck or plane. It also has to be stored and managed in a way that allows it to be used for recovery. Then there are the reliability and speed issues that plague tape and make it undesirable as a first-line recovery technology, especially in a disaster, where recovery speed and time are critical.
Disk does not have the recovery and reliability issues of tape; however, it does have a data transport issue. Without deduplication, replicating backup data from a disk backup system to the remote site’s disk backup system is nearly impossible (the backup data set is too large each night), and even leveraging the replication of primary storage requires the same storage capacity at the DR site, doubling your power costs.
Deduplication systems replicate only the uniquely identified segments and then store only those segments at the remote location. Storage capacity at the remote location is reduced in the same manner that it is at the primary location. Additionally if multiple sites are replicating into a single DR site then data is deduplicated globally, further reducing capacity requirements and therefore further reducing power utilization at the DR location.
Lastly the same deduplication systems can cross replicate, still leveraging global data deduplication, allowing data at the DR site to be replicated back to the primary. This is ideal for organizations that have two or more data centers. The data centers can replicate to each other, only having to store unique data once.
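The replication behavior described in the last two paragraphs can be sketched as follows. This is a simplified model under stated assumptions: sites are in-memory objects, segments are identified by SHA-256 fingerprints, and "replication" is a function call rather than a network protocol. The Site class, the replicate function and the sample data are hypothetical.

```python
import hashlib


class Site:
    """Toy deduplicated backup target: one copy of each unique segment."""

    def __init__(self, name):
        self.name = name
        self.segments = {}  # fingerprint -> segment bytes

    def store(self, segment):
        fingerprint = hashlib.sha256(segment).hexdigest()
        self.segments.setdefault(fingerprint, segment)


def replicate(source, target):
    """Copy only segments the target lacks; return the bytes transferred."""
    sent = 0
    for fingerprint, segment in source.segments.items():
        if fingerprint not in target.segments:
            target.segments[fingerprint] = segment
            sent += len(segment)
    return sent


site_a, site_b, dr = Site("A"), Site("B"), Site("DR")
common = b"os-image" * 512          # 4096 bytes present at both sites
site_a.store(common)
site_a.store(b"a-data" * 100)       # 600 bytes unique to site A
site_b.store(common)
site_b.store(b"b-data" * 100)       # 600 bytes unique to site B

sent_a = replicate(site_a, dr)      # sends the common segment and A's data
sent_b = replicate(site_b, dr)      # common segment is already at DR
sent_back = replicate(dr, site_a)   # cross replication: only B's data moves
print(sent_a, sent_b, sent_back)    # 4696 600 600
```

The DR site ends up holding three segments for the four that were written across both sites, and each later transfer carries only data the receiver has never seen.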
Green Now
Most green projects are large undertakings that require the involvement of all levels of IT and take months, if not years, to fully implement. A deduplication system can be implemented in days, at an affordable price point and with good power efficiency, while at the same time delivering the tangible benefits of an improved backup process and better disaster preparedness.
Complete ROI
The pressure to drive out costs has never been greater. For an IT project to be undertaken, it needs to deliver an ROI on multiple levels. Adding deduplication to the backup process is an example of a project that can do just that. Without changing backup solutions, it can reduce the cost of storing backup data, reduce the frequency of moving data to tape (driving down media consumption costs) and greatly reduce the cost of establishing a DR site.
Thursday, February 26, 2009