If deduplication is successfully integrated at the primary storage tier, its efficiency compounds as that data moves around the data center (between different tiers of storage or processes), because less data has to be placed in flight at any given point in time. This overall data reduction represents a fundamental improvement for every other component in the data path, including backup infrastructure, off-site replication and networks.


The challenge for each of these larger companies is that they now have or are planning to have multiple tiers of storage products with different architectures to meet various storage needs within the data center. These storage products can range from high performance or high capacity systems, to backup and/or archive systems. This includes multiple primary storage products that are essentially incompatible with each other from a data management perspective. IT administrators will want to move data through these different storage types as its use profiles change, as it ages or to protect it. They will want to have that capacity optimized as it exists on each system and as it moves between the various systems in the environment.


Currently, if any of the storage systems within a company’s portfolio of products has deduplication, it is ‘siloed’ within a single platform. Moving data between these types of storage requires that the data be re-inflated, or ‘un-deduplicated’, as it’s copied across the network that connects the two storage systems. If the receiving system has its own storage optimization technology, then that data must be re-processed and deduplicated again, wasting processing resources and reducing deduplication efficiency, since deduplication block comparisons cannot be shared across platforms.
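
The cost of a siloed design can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DedupeStore` class, the `migrate_siloed` function, the tiny fixed-size chunks and the use of SHA-256 fingerprints are hypothetical and do not describe any particular vendor's product. It simply shows a volume being re-inflated on the source and fingerprinted again on the target.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only to keep the example readable


class DedupeStore:
    """Hypothetical storage system with its own private chunk index."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes, private to this system
        self.volumes = {}  # volume name -> ordered list of fingerprints
        self.hash_ops = 0  # fingerprint computations performed by this system

    def write(self, name, data):
        """Split data into chunks and store each unique chunk only once."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.hash_ops += 1
            self.chunks.setdefault(fp, chunk)
            recipe.append(fp)
        self.volumes[name] = recipe

    def read(self, name):
        """Re-inflate a volume back to its full, un-deduplicated form."""
        return b"".join(self.chunks[fp] for fp in self.volumes[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


def migrate_siloed(src, dst, name):
    """Move a volume between two systems that cannot share fingerprints."""
    data = src.read(name)  # source must re-inflate the whole volume
    dst.write(name, data)  # target must fingerprint every chunk again
    return len(data)       # bytes that crossed the network


source = DedupeStore()
target = DedupeStore()
source.write("vol1", b"ABCD" * 8 + b"EFGH")  # 36 logical bytes, 2 unique chunks
wire_bytes = migrate_siloed(source, target, "vol1")

print("stored on source:", source.stored_bytes())
print("sent over network:", wire_bytes)
print("fingerprints recomputed on target:", target.hash_ops)
```

In this toy case the source holds only the unique chunks, yet the full logical volume crosses the network and the target repeats all of the fingerprinting work.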


Backup deduplication has already raised performance concerns; now imagine migrating a whole volume of information from one storage system to another. The ingestion of the data and its subsequent deduplication could swamp the receiving storage processor. If the secondary target does not have deduplication, then the original capacity savings are lost, which reduces efficiency and complicates storage management considerably.


In short, customers want a single deduplication method that works across platforms. It’s the only logical way to leverage deduplication so its full advantages can be realized. The challenge for each of these storage companies is figuring out how to deliver that capability to their users. The answer could be a deduplication API set like Permabit’s Albireo engine, which we discuss in our article “Storage Vendors, The Stakes Are Raised”.


While most larger storage or systems companies may have the ability to develop their own deduplication solution for one platform, they likely don’t have the time, or possibly the skill set, required to move that deduplication technology across all their storage platforms. The situation is worse for smaller vendors which, while they have fewer storage platforms, are typically more constrained on development resources.


The advantage of an API-based deduplication engine is that it allows for rapid adoption of deduplication technology across all of a vendor’s storage platforms. Once that adoption has been made, cross-platform storage efficiencies should be achieved, allowing for a ‘dedupe once’ strategy. This also means that future acquisitions could be quickly integrated into the deduplication plan of the buying company. Further, server vendors could benefit in particular by integrating the deduplication API at the server level, allowing deduplication of local storage in addition to networked storage.
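
A ‘dedupe once’ transfer can be sketched in a few lines of Python. This is a hypothetical illustration, not Permabit’s Albireo API: the `SharedFormatStore` class, the `fingerprint` and `migrate_dedupe_once` functions, the fixed chunk size and the SHA-256 fingerprints are all assumptions. The one idea it relies on is that both platforms compute fingerprints the same way, so the target can say which chunks it is missing and the source sends only those.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only to keep the example readable


def fingerprint(chunk):
    # Assumption: every platform shares this one fingerprint function.
    return hashlib.sha256(chunk).hexdigest()


class SharedFormatStore:
    """Hypothetical storage system built on a common deduplication format."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes
        self.volumes = {}  # volume name -> ordered list of fingerprints

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = fingerprint(chunk)
            self.chunks.setdefault(fp, chunk)
            recipe.append(fp)
        self.volumes[name] = recipe

    def read(self, name):
        return b"".join(self.chunks[fp] for fp in self.volumes[name])

    def missing(self, fingerprints):
        """Report which of the offered fingerprints are not stored here yet."""
        return {fp for fp in fingerprints if fp not in self.chunks}

    def receive(self, name, recipe, new_chunks):
        """Accept a volume as a recipe plus only the chunks that were missing."""
        self.chunks.update(new_chunks)
        self.volumes[name] = list(recipe)


def migrate_dedupe_once(src, dst, name):
    """Move a volume without re-inflating it or re-fingerprinting its chunks."""
    recipe = src.volumes[name]
    needed = dst.missing(recipe)
    payload = {fp: src.chunks[fp] for fp in needed}
    dst.receive(name, recipe, payload)
    # Chunk payload only; the recipe of fingerprints also has to travel.
    return sum(len(chunk) for chunk in payload.values())


primary = SharedFormatStore()
backup = SharedFormatStore()
primary.write("vol1", b"ABCD" * 8 + b"EFGH")  # 36 logical bytes, 2 unique chunks
backup.write("old", b"ABCD")                  # target already holds one of them
sent = migrate_dedupe_once(primary, backup, "vol1")

print("chunk bytes sent:", sent)
```

Because the target already holds one of the two unique chunks, only the other one is sent, and neither system has to re-inflate or re-fingerprint the volume.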


Integrating a deduplication API may be even more critical to the smaller company. Not only does this allow them to quickly become competitive with the larger storage and systems vendors, it may also make them more attractive to certain buyers. If a larger company is interested in a smaller company’s storage platform, one that’s already compatible with the larger company’s deduplication strategy, it makes the acquisition even more attractive.


Right now, establishing a deduplication strategy is potentially more important for storage and systems vendors than it is for end users. Being able to articulate a real end-to-end deduplication strategy to current and potential customers gives a vendor an enormous strategic and financial advantage. Being able to further claim that any new acquisition will be rapidly integrated into that technology makes that company’s offering even more compelling. It also has a significant loyalty effect, as an end user is less likely to want to bring in a new storage vendor if doing so breaks their end-to-end deduplication strategy.

Permabit Technology is a client of Storage Switzerland.

George Crump, Senior Analyst