From a data reduction perspective, databases, general file data, and image data are inherently unique, randomly accessed, or both, and therefore see limited storage efficiency gains from deduplication.  The exception is OS images created in virtual server environments. These files are essentially boot images, are accessed infrequently, and are highly redundant.  This is why most of the focus of primary storage deduplication vendors to date has been on OS files in virtual server environments.
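The contrast between redundant OS images and unique data can be illustrated with a minimal sketch. It assumes fixed-size 4 KB chunks identified by SHA-256 hashes, which is one common way to detect duplicates and not necessarily how any particular vendor does it. The function name dedup_ratio and the sample data are hypothetical; random bytes stand in for real files.

```python
import hashlib
import os


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return logical chunks divided by unique chunks (1.0 means no savings)."""
    if not data:
        return 1.0
    # Split the data into fixed-size chunks and fingerprint each one.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(chunk).digest() for chunk in chunks}
    return len(chunks) / len(unique)


# Hypothetical stand-ins: ten clones of one "OS image" versus all-unique data.
base_image = os.urandom(64 * 4096)
os_images = base_image * 10
unique_data = os.urandom(640 * 4096)

print(f"cloned OS images: {dedup_ratio(os_images):.1f}x")
print(f"unique data:      {dedup_ratio(unique_data):.1f}x")
```

Both inputs are the same size, but only the cloned images contain repeated chunks, so only they show a reduction.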

From a storage performance perspective, primary data is updated frequently and requires fast access, which means administrators have to be extremely cautious to avoid the performance impact of deduplication by ensuring that it is applied only to the appropriate data.  Few deduplication solutions automate the process of data selection, so in most cases it is manual, costly, and prone to error. Without automation, the cost of identifying this data typically outweighs much of the benefit that a typical organization would gain from deduplication.

The ROI of deduplication assumes that the investment in processing power and administrative overhead is outweighed by the reduction in storage cost that comes from eliminating highly redundant data.  In backup disk storage, where each weekly full is nearly identical to the last, the payoff from deduplication has proven significant and worth the investment. In archive disk storage the payoff may be less dramatic but may still be justifiable, especially since there is typically less concern about performance impact. In primary storage, the performance and access characteristics, together with the lack of sufficient duplicate data, typically negate the value of deduplication.
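The ROI argument above can be written as simple arithmetic. The sketch below is a rough model and not a costing method from this article: it assumes a single deduplication ratio, a flat cost per terabyte, and one lump-sum overhead figure covering processing power and administration. The function name and every number are illustrative assumptions, not measurements.

```python
def dedup_net_savings(capacity_tb: float, dedup_ratio: float,
                      cost_per_tb: float, overhead_cost: float) -> float:
    """Storage cost avoided by deduplication minus its overhead cost."""
    if dedup_ratio < 1.0:
        raise ValueError("dedup_ratio must be at least 1.0")
    # A ratio of R:1 means only 1/R of the logical capacity is stored.
    capacity_saved_tb = capacity_tb * (1.0 - 1.0 / dedup_ratio)
    return capacity_saved_tb * cost_per_tb - overhead_cost


# Illustrative figures only: same capacity, cost, and overhead, different ratios.
backup = dedup_net_savings(capacity_tb=100, dedup_ratio=10.0,
                           cost_per_tb=1000, overhead_cost=20000)
primary = dedup_net_savings(capacity_tb=100, dedup_ratio=1.1,
                            cost_per_tb=1000, overhead_cost=20000)

print(f"backup-like data (10:1):   net {backup:,.0f}")
print(f"primary-like data (1.1:1): net {primary:,.0f}")
```

With these assumed inputs the highly redundant case comes out positive and the low-redundancy case negative, which is the shape of the trade-off described above.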

The Benefits of Real-Time Compression for Primary Storage

Alternatively, for primary storage, real-time compression from companies like Storwize may deliver a greater return on investment. Real-time compression uses an appliance that sits logically in front of NAS devices and compresses everything going in and out of the system in real time.  Everything means everything: databases, messaging systems, images, user data, and OS images.

From a data reduction perspective, this is very compelling because in almost every test of all data types, even data that is pre-compressed like images and office data, administrators will see measurable data reduction from real-time compression. So, unlike deduplication, real-time compression is well suited to primary storage because it achieves data reduction across every file created, not just those files with duplicate data.
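The point that compression works on each file individually, whether or not a duplicate exists elsewhere, can be sketched as follows. This example uses Python's zlib purely as a stand-in compressor; it is an assumption for illustration and says nothing about the algorithm a real-time compression appliance actually uses. The file names and contents are hypothetical, and the sample uses text-like data only, so it does not test the claim about pre-compressed data.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by zlib-compressed size."""
    if not data:
        return 1.0
    return len(data) / len(zlib.compress(data, level))


# Hypothetical sample files. Each file is different from the other,
# so there is no duplicate file for deduplication to eliminate.
files = {
    "report.txt": b"".join(
        b"Quarterly revenue for region %d was flat.\n" % i for i in range(2000)
    ),
    "orders.csv": b"".join(
        b"%d,widget,%d,shipped\n" % (i, i * 3) for i in range(2000)
    ),
}

for name, data in files.items():
    print(f"{name}: {compression_ratio(data):.1f}x")
```

Each file is reduced on its own because compression exploits redundancy inside the file, not copies of the file elsewhere on the system.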

From a storage performance perspective, real-time compression delivers unique value.  As shown in numerous tests, real-time compression, unlike deduplication, typically enhances the read and write performance of file-based storage.  For more detail on this topic, check out our recent article "Data Reducing Oracle".

Because real-time compression does not degrade performance, it also simplifies implementation and enhances ongoing administrator productivity. Administrators don’t need to waste time and money determining what data can and can’t be optimized, which is a tangible concern with deduplication.  This is a critical but often overlooked side effect of attempting to deduplicate primary storage: administrators need to constantly analyze what data should or should not be deduplicated.

Combining Real-Time Compression and Deduplication

As described above, real-time compression and deduplication can both play a major role in a data reduction strategy.  Most importantly, the two technologies are complementary and should be deployed together.

Real-time compression reduces the data payload of every file throughout its lifecycle.  In addition, depending upon the vendor solution, compressed data can subsequently be deduplicated.  This has significant cost and performance benefits because deduplication can be achieved in less time and with less capacity, further enhancing backup efficiencies.  To learn more about the complementary benefits of real-time compression and deduplication, check out our article "Storage Optimization Deduplication vs. Real Time Compression".

If you are looking to maximize your savings from data reduction, you should consider real-time compression.  It is suitable for all data sets throughout the life cycle, from very active database environments to data being pushed off to an archive or backup set. Furthermore, real-time compression and deduplication should not be looked at as competitors but as complementary technologies that, when used together, deliver maximum storage efficiency and, just as importantly, enhanced system administrator productivity.


Storage Switzerland

Storage Optimization: Deduplication compared to Compression

March 24th, 2009

Storage Switzerland

Instant ROI with Real Time Compression

January 28th, 2009

Storage Switzerland

Data Reducing Oracle

September 1st, 2008
