Real-time compression is an inline storage optimization technology, often implemented on an appliance deployed in NAS environments. Logically the appliance sits in front of the NAS head, processing all data moving into and out of the NAS through the real-time compression engine. This produces a minimum 50% savings in capacity upfront and a ripple effect throughout the other storage tiers that can be worth several times that amount. As a result, it is not necessarily the capacity savings that delivers the cost and competitive benefits, but the overall increase in efficiency. Before efficiency can have value, though, the number-one concern has to be addressed: performance impact.



Performance Impact


Anytime a technology offers to operate inline, there is an immediate question about performance impact. Compression is a well-vetted technology used in almost every aspect of the data center. While most people are familiar with the more typical consumer implementations of compression (like ZIP), there are much more powerful enterprise implementations, like those used by IBM in its Real-Time Compression solution, which leverages advanced algorithms running on purpose-built hardware.


Enterprise-class compression algorithms can now run in real time as data is being accessed or stored, with little to no performance impact. Test after test (including database workloads) and customer interview after customer interview have consistently validated this result. In fact, most cases document a performance improvement of a few percent. In a TPC test performed in conjunction with IBM, the Real-time Compression technology reduced disk I/O, giving CPU cycles back to the array. Additionally, the compression ratio led to better cache utilization, which in turn provided faster response times to the application.


(For an actual copy of the test, please see the IBM Real-time Compression Validation report at: http://www.realtimecompression.com/Library_WP.asp)


While real-time compression certainly introduces some latency, it also increases the performance of everything else. The I/O path from the drives where data is stored becomes at least 50% more efficient, since twice as much logical data can now travel along it in the same amount of time. The storage controller and its internal cache now have to process and store half as much data, making data output to the network 50% more efficient. The net result is that while the compression engine itself adds some latency, every other link in the storage I/O chain becomes more efficient, since those components are effectively processing twice as much data as before with the same amount of effort.
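To make that arithmetic concrete, here is a minimal sketch (a hypothetical illustration, not taken from the IBM test) of how an assumed 2:1 compression ratio translates into effective I/O path bandwidth and cache capacity when measured in logical, uncompressed terms. The bandwidth and cache figures are placeholder values.

    # Hypothetical illustration: effective gains from an assumed compression ratio.
    # None of these numbers come from the source; replace them with measured values.

    def effective_gains(compression_ratio, raw_bandwidth_mb_s, cache_gb):
        """Return effective bandwidth and cache size in logical (uncompressed) terms."""
        effective_bandwidth = raw_bandwidth_mb_s * compression_ratio  # logical MB/s over the same path
        effective_cache = cache_gb * compression_ratio                # logical GB held in the same cache
        return effective_bandwidth, effective_cache

    bw, cache = effective_gains(compression_ratio=2.0,    # assumed 2:1 (minimum 50% savings)
                                raw_bandwidth_mb_s=400,   # placeholder drive-path bandwidth
                                cache_gb=16)              # placeholder controller cache
    print(f"Effective I/O path bandwidth: {bw:.0f} MB/s of logical data")
    print(f"Effective controller cache:   {cache:.0f} GB of logical data")

With a 2:1 ratio the same physical path and the same cache serve twice as much logical data, which is the efficiency gain described above.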



Increase Efficiency, Not Just Capacity


The fact that performance with real-time compression is at least equal to that of data stored uncompressed is a prerequisite to the value of the technology. In fact, we have learned that the most valuable aspect of real-time compression isn't that disk utilization drops by at least half; it is the increase in overall efficiency that real-time compression brings to the entire environment.


Efficiency is a direct result of where the optimization happens with real-time compression: the point of creation. The moment a file is created it is stored in a highly optimized format, and it remains in that format until it is accessed by a user or application. This means that everything else that interacts with that data also becomes more efficient. The available storage controller cache effectively doubles. Snapshot reserve space is reduced by more than 50%. Database dumps and extra data copies are all 50% more efficient. The replication process uses 50% less bandwidth. The backup disk and tape targets use 50% less capacity. The network that backups travel over is 50% more efficient. Even deduplication has been tested to be more efficient when the data is in a pre-compressed format. This ripple effect leads to cost savings throughout the data center and to increased utilization of existing assets.
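As a back-of-the-envelope illustration of that ripple effect, the sketch below tallies the footprint of an active data set plus its snapshot reserve, replica and backup copies, with and without an assumed 2:1 compression ratio. The data set size, snapshot reserve percentage and number of backup copies are assumptions chosen only to show the multiplier, not figures from the source.

    # Hypothetical illustration of the ripple effect of compressing data at creation.
    # Downstream copies (snapshots, replicas, backups) inherit the reduced footprint.

    logical_data_tb = 10.0      # assumed active data set
    compression_ratio = 2.0     # assumed 2:1 (minimum 50% savings)

    primary = logical_data_tb / compression_ratio
    snapshot_reserve = primary * 0.20   # assumed 20% snapshot reserve
    replica = primary                   # replication target mirrors the compressed footprint
    backups = primary * 3               # assumed three retained backup copies

    compressed_total = primary + snapshot_reserve + replica + backups
    uncompressed_total = compressed_total * compression_ratio

    print(f"Footprint across tiers, uncompressed: {uncompressed_total:.1f} TB")
    print(f"Footprint across tiers, compressed:   {compressed_total:.1f} TB")

Because every downstream copy starts from the smaller compressed footprint, the absolute capacity saved across the lifecycle is several times the savings on primary storage alone.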



Real-time Compression vs. Deduplication


As stated in the opening, deduplication captures much of the attention when storage optimization is discussed. The reality is that on the active data set (the data presently being used), compression may deliver a better ROI. First, for deduplication to be effective there must be redundancy in the data set. While this is bound to occur, the incidence on primary storage is considerably lower than in backup, deduplication's pinnacle use case. Then factor in that most primary storage systems already have features that reduce data redundancy, like snapshots and writable clones, and the realistic return on the deduplication investment is greatly reduced.


While compression does offer lower savings on a percentage basis, the fact that it works on all data, not just redundant data, can make the net return significantly higher. As stated earlier, real-time compression makes other data-reducing processes, like snapshots and writable clones, more efficient. The same cannot be said of deduplication, since in many cases the data would have to be 'un-deduplicated' as it moved between processes.


The reality is that this is not an either-or situation. Real-time compression can work in conjunction with deduplication, making the deduplication engines more efficient just as real-time compression does for all the other data services mentioned. In short, compressed data can subsequently be deduplicated. This is especially true with primary storage deduplication, since most of those products do not have compression capabilities of their own. The decision point, then, is which technology to use first, something that depends on the type of data in the environment. The best strategy is to use the one that will bring the highest level of return across the broadest data set. If there is highly redundant data on primary storage, then deduplication may be the place to start. If there is only a small amount of redundant data but plenty of unique data, plus data services (snapshots, replication jobs, writable clones and backups), then compression may be a wiser choice, as the rough comparison below illustrates.
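One rough way to frame that decision, sketched below purely as an illustration: estimate the capacity each technology would reclaim given the share of the data it actually applies to. The redundant-data fraction and the 4:1 and 2:1 ratios are assumptions to be replaced with figures measured from the environment.

    # Hypothetical comparison of where to start; all ratios and fractions are assumptions.

    def dedup_savings(total_tb, redundant_fraction, dedup_ratio):
        """Capacity saved if only the redundant portion deduplicates at the given ratio."""
        redundant = total_tb * redundant_fraction
        return redundant - redundant / dedup_ratio

    def compression_savings(total_tb, compression_ratio):
        """Capacity saved if compression applies to the whole data set."""
        return total_tb - total_tb / compression_ratio

    total_tb = 50.0
    print(f"Deduplication (30% redundant at 4:1): {dedup_savings(total_tb, 0.30, 4.0):.1f} TB saved")
    print(f"Compression (all data at 2:1):        {compression_savings(total_tb, 2.0):.1f} TB saved")

Even at a lower per-object ratio, the technology that touches the broader data set can reclaim more capacity, which is the trade-off described above.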


Real-time data compression is a seamless way to increase not only the capacity of the data center's NAS environment but also its overall storage efficiency throughout the data lifecycle. It has been shown to address performance concerns, so these gains can be had without a negative impact. The result of real-time data compression is reduced storage consumption, increased data efficiency and, potentially, even better performance.

George Crump, Senior Analyst

Storwize is a client of Storage Switzerland