The Role of Compression vs Deduplication in Primary Storage
Achieving greater storage efficiency is a top priority for most IT professionals in 2009. Deduplication has garnered much of the attention, and its importance in the data center will certainly continue, but for many file-based primary storage environments compression, not deduplication, may deliver greater value and is an alternative that should be explored.
To understand when to use compression, deduplication, or both, it is important to understand the characteristics of primary storage and the inherent limitations of deduplication in reducing primary data sets.
Limitations of Deduplication for Primary Storage
Let’s start by looking at the characteristics of primary storage. Typically, primary storage should not have a lot of duplicate data on it. In fact, if there is a sizable amount of duplicate data, a case could be made that there is a storage management need, not a storage efficiency need. Primary storage contains four types of data:
• First, and usually a priority, is structured or semi-structured data like that found in databases and messaging systems.
• Second, there is file data, often found in home shares and created by office productivity applications.
• Third, there is typically some kind of image data that is particularly important to the enterprise. These are document scans or industry-specific data; for oil and gas companies, for example, it might be SEG-Y data (the preferred format for seismic information).
• Finally, there is OS data that is installed in support of the operating systems used by servers.
Thursday, May 21, 2009
From a data reduction perspective, databases, general file data, and image data are inherently unique and/or randomly accessed, and therefore see limited storage efficiency gains from deduplication. The exception is OS images that are created in virtual server environments. These files are essentially boot images, are accessed infrequently, and are highly redundant. This is why most of the focus of primary storage deduplication vendors to date has been on OS files in virtual server environments.
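The contrast between redundant OS images and unique primary data can be illustrated with a minimal sketch of fixed-size block deduplication. The 4 KB block size, the SHA-256 fingerprint, and the synthetic data sets are assumptions chosen for illustration; they do not describe any particular vendor's implementation.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Return logical blocks divided by unique blocks (1.0 means no savings)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    # Identical blocks share a fingerprint, so the set keeps one entry per unique block.
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


# Hypothetical stand-ins: one 1 MiB "golden" OS image cloned for ten virtual
# servers, versus 10 MiB of random bytes standing in for unique primary data.
os_image = os.urandom(1024 * 1024)
vm_images = os_image * 10
unique_data = os.urandom(10 * 1024 * 1024)

print(f"cloned OS images: {dedup_ratio(vm_images):.1f}:1")
print(f"unique data:      {dedup_ratio(unique_data):.1f}:1")
```

In this toy case the cloned images reduce ten to one, while the unique data set sees no reduction at all.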
From a storage performance perspective, primary data is updated frequently and requires fast access, which means administrators have to be extremely cautious to avoid the performance impact of deduplication by ensuring that it is applied only to the appropriate data. Few deduplication solutions automate the process of data selection, so in most cases it is manual, costly, and prone to error. Without automation, the cost of identifying this data typically outweighs much of the benefit that organizations would gain from deduplication.
The ROI of deduplication assumes that the investment in processing power and administrative overhead is outweighed by the reduced cost of storing highly redundant data. In backup disk storage, where each weekly full is nearly identical to the last, the payoff from deduplication has proven significant and worth the investment. In archive disk storage the payoff may be less dramatic but may be justifiable, especially considering that there is typically less concern about performance impact. On primary storage, the performance and access characteristics and the lack of sufficient duplicate data typically negate the value of deduplication.
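A back-of-the-envelope calculation shows why retained weekly fulls pay off so well. This sketch assumes the first full is stored whole and that each later full adds only its changed fraction; the sizes and change rate are hypothetical and not taken from any measurement.

```python
def backup_dedup_ratio(full_size_gb: float, num_fulls: int, change_rate: float) -> float:
    """Logical capacity divided by stored capacity for a series of full backups.

    Assumes the first full is stored whole and each later full contributes
    only the changed fraction (change_rate) as new data.
    """
    logical = full_size_gb * num_fulls
    stored = full_size_gb + (num_fulls - 1) * full_size_gb * change_rate
    return logical / stored


# Hypothetical numbers: 1,000 GB fulls, 12 weekly fulls retained, 2% change per week.
print(round(backup_dedup_ratio(1000, 12, 0.02), 1))

# A single copy, as on primary storage, has nothing to deduplicate against.
print(backup_dedup_ratio(1000, 1, 0.02))
```

Under these assumptions twelve retained fulls reduce by close to ten to one, while a single copy stays at one to one, which is the gap the ROI argument rests on.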
The Benefits of Real-Time Compression for Primary Storage
Alternatively, for primary storage, real-time compression from companies like Storwize may deliver a greater return on investment. Real-time compression utilizes an appliance that sits logically in front of NAS devices and compresses and decompresses everything going in and out of the system in real time. Everything means everything: databases, messaging systems, images, user data, and OS images.
From a data reduction perspective, this is very compelling because in almost every test, across all data types, including data that is already compressed such as images and office files, administrators see measurable data reduction from real-time compression. So, unlike deduplication, real-time compression is ideally suited for primary storage because it achieves data reduction across every file created, not just those files with duplicate data.
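The difference can be sketched with data that contains no duplicate blocks but still has internal structure. This example uses zlib from the Python standard library purely as a stand-in compressor; it is not the algorithm used by Storwize or any other product, and the record layout is hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size for the deduplication comparison


def dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Logical blocks divided by unique blocks (1.0 means no savings)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


# Hypothetical database-like export: every row is unique, so no two blocks
# match, but the rows share field names and formatting.
rows = [
    f"{i:08d},customer-{i * 7919 % 100003},status=active,region=eu-west\n"
    for i in range(200_000)
]
data = "".join(rows).encode()

print(f"deduplication: {dedup_ratio(data):.2f}:1")
print(f"compression:   {compression_ratio(data):.2f}:1")
```

Deduplication finds nothing to remove here because no block repeats, while compression still shrinks the data by exploiting the repetition inside each row.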
From a storage performance perspective, real-time compression delivers unique value. As shown in numerous tests, real-time compression, unlike deduplication, typically enhances read and write performance of your file-based storage. For more detail on this topic check out our recent article "Data Reducing Oracle".
Because real-time compression does not degrade performance, it also simplifies implementation and improves ongoing administrator productivity. Administrators don’t need to waste time and money determining what data can and can’t be optimized, which is a tangible concern with deduplication. This is a critical but often overlooked side effect of attempting to deduplicate primary storage, as administrators need to constantly analyze what data should or should not be deduplicated.
Combining Real-Time Compression and Deduplication
As described above, real-time compression and deduplication can both play a major role in a data reduction strategy. Most importantly, the two technologies are complementary and should be deployed together.
Real-time compression reduces the data payload of every file throughout its lifecycle. In addition, depending upon the vendor solution, compressed data can subsequently be deduplicated. This has significant cost and performance benefits because deduplication can be achieved in less time and with less capacity, further enhancing backup efficiencies. To learn more about the complementary benefits of real-time compression and deduplication, check out our article "Storage Optimization Deduplication vs. Real Time Compression".
If you are looking to maximize your savings from data reduction, you should consider real-time compression. It is suitable for all data sets throughout the life cycle, from very active database environments to data being pushed off to an archive or backup set. Furthermore, real-time compression and deduplication should not be looked at as competitors but as complementary technologies that, when used together, deliver maximum storage efficiency and, just as importantly, enhanced system administrator productivity.
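One way compressed data can remain deduplicable is sketched below: if each block is compressed independently and deterministically, identical blocks still produce identical compressed output. This is an assumption made for illustration, using zlib and SHA-256 as stand-ins; the article does not describe how any vendor actually combines the two.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size


def dedup_only(data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Bytes stored when only unique raw blocks are kept."""
    store = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        store[hashlib.sha256(block).digest()] = block
    return sum(len(block) for block in store.values())


def compress_then_dedup(data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Bytes stored when each block is compressed, then unique results are kept."""
    store = {}
    for i in range(0, len(data), block_size):
        compressed = zlib.compress(data[i:i + block_size])
        # Identical input blocks compress to identical bytes, so they still match.
        store[hashlib.sha256(compressed).digest()] = compressed
    return sum(len(c) for c in store.values())


# Hypothetical file, trimmed to a whole number of blocks so that repeated
# copies line up on block boundaries, then retained as five backup copies.
file_data = b"".join(f"record {i:06d} status=ok\n".encode() for i in range(20_000))
file_data = file_data[: len(file_data) // BLOCK_SIZE * BLOCK_SIZE]
copies = file_data * 5

print("logical bytes:      ", len(copies))
print("dedup only:         ", dedup_only(copies))
print("compress then dedup:", compress_then_dedup(copies))
```

In this sketch deduplication removes the four extra copies, and compression then shrinks the one copy that remains, so the combination stores less than either step alone.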