Which Primary Storage Optimization Strategy is Best?
Primary storage is the most expensive, but also the most important, storage in the enterprise. Reducing its cost, or at least curtailing the expense while making sure it performs well, is a key focus in many data centers. Suppliers have responded with several different methods to maximize the use of primary storage, including thin provisioning, snapshots, clones, compression and deduplication. This range of choices has left many IT managers overwhelmed, prompting the question “Which of these primary storage optimization strategies is best for our data center?”
In this article, we will review each of these optimization technologies and compare them to one another. This should arm the IT manager with the information needed to select the right combination of optimizations for their environment.
Thin Provisioning
Thin provisioning is the process of allocating capacity to a volume dynamically, as that capacity is needed. This allows a storage manager to create a volume at the maximum capacity that an application could possibly use, which eliminates the complex process of expanding that allocation in the future, without incurring the expense of having all that storage actually consumed up front. This is ideal for most application rollouts, where the day-one capacity is far different from the capacity needed after three years of production, and where the actual capacity probably never comes close to the originally projected capacity.
Without thin provisioning, the volume is hard-allocated, meaning that its capacity is lost to the rest of the volumes even though it’s essentially unused. Unfortunately, other forms of optimization have limited value on these volumes, because the unused space is essentially all zeros and most of the other optimization technologies need real data to work on. While in theory zeros can be compressed or deduplicated, the extra free space isn’t released to the rest of the storage system. The good news is that most modern storage systems and operating systems support some form of thin provisioning, and it has become a baseline feature when choosing storage software.
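To make the difference between allocated and consumed capacity concrete, here is a minimal Python sketch of a thin-provisioned volume. It is a toy model, not any vendor's implementation: the `ThinVolume` class, its method names and the 4 KiB block size are all assumptions made for illustration.

```python
class ThinVolume:
    """Toy thin-provisioned volume: a block consumes space only once written."""

    def __init__(self, logical_blocks, block_size=4096):
        self.logical_blocks = logical_blocks  # size promised to the application
        self.block_size = block_size          # assumed 4 KiB blocks
        self.blocks = {}                      # block number -> data, written blocks only

    def _check(self, block_no):
        if not 0 <= block_no < self.logical_blocks:
            raise IndexError("block is outside the volume's logical size")

    def write(self, block_no, data):
        self._check(block_no)
        if len(data) > self.block_size:
            raise ValueError("data is larger than one block")
        # Physical space is consumed here, on first write, not at creation time.
        self.blocks[block_no] = data.ljust(self.block_size, b"\0")

    def read(self, block_no):
        self._check(block_no)
        # Blocks that were never written read back as zeros and occupy no space.
        return self.blocks.get(block_no, bytes(self.block_size))

    def logical_bytes(self):
        return self.logical_blocks * self.block_size

    def allocated_bytes(self):
        return len(self.blocks) * self.block_size


vol = ThinVolume(logical_blocks=1_000_000)
vol.write(0, b"boot record")
vol.write(42, b"application data")
print(vol.logical_bytes())    # 4096000000 bytes presented to the application
print(vol.allocated_bytes())  # 8192 bytes actually consumed
```

A hard-allocated volume would reserve all 4,096,000,000 bytes on day one. In this sketch only the two written blocks take up space, and everything else remains available to other volumes.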
Snapshots
Most storage systems today track data at a granular, sub-file level - by the block or sub-block. This not only enables thin provisioning, but it also enables snapshots. Snapshots are primarily used for data protection, but they’re also the foundational capability for the next optimization technology - clones.
When a snapshot is taken, the active blocks on the storage system are essentially frozen in time by being set to “read only”. New blocks or modifications of current blocks are tracked separately. This allows the storage system to present a volume as it looked when a particular snapshot was taken. This is more space efficient than making a bit-by-bit copy of a volume. It’s also more time efficient: most snapshots complete in seconds, because only the metadata needs to be copied, not the blocks themselves.
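The mechanism described above can be sketched in a few lines of Python. This is a simplified model that assumes a redirect-on-write design, in which stored blocks are never overwritten. Real arrays differ in the details, and the `SnapVolume` class and its methods are hypothetical names chosen for the example.

```python
class SnapVolume:
    """Toy volume with metadata-only snapshots (redirect-on-write)."""

    def __init__(self):
        self.store = {}      # block id -> data, shared by the live volume and snapshots
        self.live = {}       # block number -> block id for the active volume
        self.snapshots = {}  # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, block_no, data):
        # Existing blocks are treated as read-only: a change lands in a new
        # block and only the live map is pointed at it.
        self.store[self._next_id] = data
        self.live[block_no] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only the block map (metadata) is copied, never the data blocks.
        self.snapshots[name] = dict(self.live)

    def read(self, block_no, snapshot=None):
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[block_map[block_no]]


vol = SnapVolume()
vol.write(0, b"version 1")
vol.write(1, b"unchanged")
vol.snapshot("monday")
vol.write(0, b"version 2")

print(vol.read(0))            # b'version 2' (live volume)
print(vol.read(0, "monday"))  # b'version 1' (as it looked at the snapshot)
print(len(vol.store))         # 3 blocks stored, not 4: block 1 is shared
```

Taking the snapshot costs one small dictionary copy regardless of how much data the volume holds, which is why the operation finishes so quickly.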
Clones
Clones are an advanced form of snapshot: a writable one. They’re essentially a snapshot volume presented as a ‘real’ volume that can be changed or modified. Clones initially had limited value and were used primarily for test/development applications. With the rise of virtualization, especially desktop virtualization, clones have immense value in reducing the storage footprint that these environments require. They can also help improve performance: because the clones share most of their blocks, the common data behind hundreds of virtual machine images can now fit in cache.
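As a rough illustration of why clones shrink the footprint of virtual desktop environments, the following Python sketch models a clone as a private set of changed blocks layered over a shared, read-only base image. The `Clone` class, the 1,000-block image and the count of 100 desktops are assumptions made for the example.

```python
class Clone:
    """Toy writable clone: private changes layered over a shared base image."""

    def __init__(self, base):
        self.base = base  # shared block map, never modified by the clone
        self.delta = {}   # only the blocks this clone has changed

    def write(self, block_no, data):
        self.delta[block_no] = data

    def read(self, block_no):
        # Prefer the clone's own version, otherwise fall through to the base.
        return self.delta.get(block_no, self.base[block_no])


# One hypothetical 1,000-block desktop image shared by 100 clones.
base_image = {n: b"os-block-%d" % n for n in range(1000)}
desktops = [Clone(base_image) for _ in range(100)]

desktops[0].write(5, b"user profile")

print(desktops[0].read(5))  # b'user profile'
print(desktops[1].read(5))  # b'os-block-5' (other clones are unaffected)

private_blocks = sum(len(d.delta) for d in desktops)
print(len(base_image) + private_blocks)  # 1001 blocks stored for 100 desktops
```

Full copies would need 100,000 blocks. The shared base is also what makes caching effective, since a single cached copy of a base block serves every clone that reads it.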
Deduplication
Deduplication is the process of identifying redundant data segments and storing only one instance of each. Unlike the other optimization strategies, it provides efficiencies across volumes and, in some cases, across storage systems. It rose to popularity as an enabler for backup to disk, but thanks to solutions like Permabit’s Albireo it’s now becoming a ‘must have’ for primary storage as well.
In the Albireo use case, an API set is embedded or integrated into the storage controller’s software. The storage controller uses the same techniques for deduplication that it uses to manage and track blocks in snapshots and thin provisioning. Permabit’s product handles redundancy identification and efficiently manages the indexing required to track identical blocks. Since much of the data handling is something that the storage controller does for snapshots and clones already, embedding deduplication is a natural fit, one that can realistically claim no performance impact.
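The general idea of identifying redundant blocks and tracking them through an index can be sketched as follows. This is emphatically not the Albireo API, which is not described in this article. It is a generic fixed-block, fingerprint-based illustration, and the `DedupStore` class, the `store_file` helper, the SHA-256 fingerprint and the 4 KiB block size are all assumptions.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps a single copy of each unique block."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block data (the index)
        self.refs = {}    # fingerprint -> number of references to the block

    def put(self, data):
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.blocks:
            self.blocks[fingerprint] = data  # first time this content is seen
        self.refs[fingerprint] = self.refs.get(fingerprint, 0) + 1
        return fingerprint

    def get(self, fingerprint):
        return self.blocks[fingerprint]


def store_file(store, payload, block_size=4096):
    """Split a payload into fixed-size blocks and return its list of fingerprints."""
    return [store.put(payload[i:i + block_size])
            for i in range(0, len(payload), block_size)]


store = DedupStore()
original = b"A" * 4096 + b"B" * 4096 + b"A" * 4096

recipe_1 = store_file(store, original)  # the file itself
recipe_2 = store_file(store, original)  # an identical copy, e.g. on another volume

print(len(recipe_1) + len(recipe_2))  # 6 logical blocks written
print(len(store.blocks))              # 2 unique blocks actually stored

# Either file can be rebuilt from its list of fingerprints.
rebuilt = b"".join(store.get(fp) for fp in recipe_2)
```

Because the index is keyed on content rather than on volume or file name, the second copy adds only references, and the same would hold if it lived on a different volume.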
Compression
While each of the above optimization techniques eliminates redundancy, compression is the only technique that actually modifies a file and looks for redundancy within that file. It literally changes the way a file is stored on disk and must have a special “reader” to enable this new version of the file to be used.
Compression does have the advantage of being able to optimize capacity even when there is little redundant information on the system. While it provides less efficiency at the macro, system-wide level, it provides that efficiency across all files, not just those with redundant data.
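A short sketch using Python's standard `zlib` module shows both points: the stored form of the file is different from the original, and a decompression step (the “reader”) is needed to get the original back. The sample document content is made up for the example.

```python
import zlib

# A single hypothetical file with repetition inside it but no duplicate elsewhere.
document = b"quarterly sales report, region east\n" * 500

stored = zlib.compress(document)     # what would actually be written to disk
restored = zlib.decompress(stored)   # the "reader" step needed to use the file

print(len(document))                 # 18000 bytes as the application sees it
print(len(stored) < len(document))   # True: the stored form is smaller
print(restored == document)          # True: nothing is lost
```

No second copy of the file is needed for the savings to appear, which is the situation where deduplication alone would find nothing to remove.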
Which Technique Is Best?
Each storage optimization strategy has its strengths and weaknesses, but deduplication typically provides the broadest level of efficiency without performance loss. It can produce the capacity optimization of the other techniques and when coupled with them makes them more efficient.
While snapshots and clones provide capacity efficiency, they still allow for significant data growth. First, similar files within a volume are all captured as part of the snapshot, and redundant data within those snapshots across volumes is not eliminated. This makes the cloning process work harder, because it has to track more blocks, and it consumes more capacity.
For example, in most virtual server or desktop use cases, clones can limit capacity requirements because of the similarity between images. Each virtual machine, though, will add its own layer of customization. There may be a base Windows image that is cloned for all the Windows servers. Then that image may be customized by the Exchange group as it builds its email infrastructure. Each of these changes is “net new”, but there is redundancy between them that only deduplication can eliminate.
Another disadvantage is that snapshots are volume-specific, meaning they operate on a whole volume, not just a specific file. For example, if a user wants to test a change to a specific file, instead of snapshotting the whole volume they will likely make a copy of just the affected files - and in most cases also forget to delete the copied files when the test completes. Snapshots would not produce a benefit here, but deduplication would identify these redundant files and only store them once. The user could make as many copies as they want and have virtually no impact on capacity.
Compression provides efficiency across a range of files, but does not eliminate redundant data between those files or across storage volumes. It would help with the above example of a user copying data but, depending on the file type, would likely not be as efficient as the total elimination of the redundant data. In the same way, the cloning example would be optimized slightly by the reduction in the individual size of the net new files created or modified, but would not achieve the 100% elimination of redundant data.
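The copied-file scenario can be put into numbers with a small Python sketch. Everything here is an assumption for illustration: the file is 64 KiB of synthetic, partly compressible data, the user is assumed to have left three copies behind, and deduplication is modeled at whole-file granularity for simplicity.

```python
import hashlib
import random
import zlib

rng = random.Random(0)
# Hypothetical file: 64 KiB drawn from a 16-symbol alphabet (partly compressible).
original = bytes(rng.choice(b"0123456789abcdef") for _ in range(64 * 1024))
files = [original] * 4  # the original plus three forgotten test copies

raw_total = sum(len(f) for f in files)

# Compression only: every copy is compressed and stored separately.
compress_only = sum(len(zlib.compress(f)) for f in files)

# Deduplication only: identical content is stored once, uncompressed.
unique = {hashlib.sha256(f).digest(): f for f in files}
dedup_only = sum(len(f) for f in unique.values())

# Both together: the single unique copy is also compressed.
combined = sum(len(zlib.compress(f)) for f in unique.values())

print("raw:", raw_total)
print("compression only:", compress_only)
print("deduplication only:", dedup_only)
print("combined:", combined)
```

With this data, compression alone still pays for all four copies, deduplication alone pays for one uncompressed copy, and the combination pays for one compressed copy. The exact figures depend on the file type, as noted above, and highly compressible data with few copies could tilt the comparison the other way.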
Better Together
Deduplication delivers the most optimization ‘bang for the buck’ in most environments, especially with today’s virtual infrastructures. However, the best solution is to look for storage systems that combine deduplication with one or more of the other strategies. Used together, these technologies can apply all of their capabilities for maximum optimization impact. As stated earlier, this will not only improve storage efficiency but, in most cases, also improve performance by enabling storage resources such as cache and interconnects to hold and transfer more data in the same space.
Permabit Technology is a client of Storage Switzerland
Friday, September 30, 2011
George Crump, Senior Analyst