Using Primary Storage Deduplication to Address the Data Affordability Gap
Data is growing; it’s a fact of life in IT organizations. Budgets are growing too, and significant percentages of current and projected IT spending are being earmarked for more storage capacity. However, the details matter. Year-over-year data growth is currently more than 50% (and increasing), while IT budget growth is less than 10% (usually much less). Even with continuous reductions in price per GB, the cost of the new capacity required is growing much faster than the funding available to pay for it. In response, storage manufacturers are using primary deduplication to address this Data Affordability Gap.
While data growth is well accepted in the storage industry, what may be surprising is that the rate of that growth is also growing. In physics, a change in velocity is called acceleration; you could say that data growth is accelerating. Some of the drivers are the rise in social networking (at work), mobile computing, compliance, the increased use of higher-resolution images in everything, data mining and analytics. Again, the point here isn’t just to establish that data is growing, but that data growth itself is growing.
Tuesday, March 1, 2011
While growth is a fact of life for IT, technological improvements in hardware, software and systems are also expected. Moore’s Law has been holding up, and the industry continues to enjoy CPU performance increases that should enable innovation for many years. Storage device capacity has been on a similar improvement curve, pushing drive capacities up from 1TB to the 3-4TB range. This increase in storage efficiency is pushing per-GB costs down by an estimated 25% per year.
IT budgets are also increasing, finally. One projection puts increases in IT spending at 3% for 2011 (TechTarget IT Priorities Survey), an improvement over the past couple of years, but modest nonetheless. However, the impact this ‘new money’ has on alleviating the data growth problem is a little more complicated. Efforts to address data growth will certainly receive a portion of IT budgets, but not all of it. And some studies show that perhaps more spending should go to optimizing existing storage and not just buying raw capacity.
So where does this leave IT managers who are faced with finding half again as much storage capacity each year just to keep up? While budgets are growing and storage costs are dropping, the real question is “Will IT be able to cover the spread between projected budgets and projected costs of needed storage?” For many, the answer is “no”. Trying to cover an estimated 50% growth in data with a 25% decrease in storage costs and a small increase in IT spending leaves a significant gap.
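The arithmetic behind that gap can be sketched in a few lines. This is a rough illustration only, using the round numbers quoted above (50% data growth, 25% price decline, 3% budget growth) and assuming, as a simplification not made in the article, that storage cost scales with capacity needed times price per GB and that those rates hold constant from year to year. The function and constant names are hypothetical.

```python
# Simplified model: cost index = capacity needed x price per GB.
# Rates are the article's round figures; holding them constant is an assumption.
DATA_GROWTH = 0.50    # capacity needed grows 50% per year
PRICE_DECLINE = 0.25  # cost per GB falls 25% per year
BUDGET_GROWTH = 0.03  # IT budget grows 3% per year (2011 projection)


def affordability_gap(years):
    """Return (year, cost_index, budget_index) tuples, both indexed to 1.0 today."""
    rows = []
    cost, budget = 1.0, 1.0
    for year in range(1, years + 1):
        # More capacity at a lower unit price still nets out to a higher bill.
        cost *= (1 + DATA_GROWTH) * (1 - PRICE_DECLINE)
        budget *= 1 + BUDGET_GROWTH
        rows.append((year, cost, budget))
    return rows


for year, cost, budget in affordability_gap(5):
    print(f"year {year}: cost index {cost:.3f}, "
          f"budget index {budget:.3f}, gap {cost - budget:.3f}")
```

Under these assumptions the cost index rises 12.5% in the first year (1.5 × 0.75 = 1.125) against a 3% budget increase, and the difference widens every year because both indexes compound.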
This gap between projected storage capacity needs and the projected ability of businesses to afford that capacity is very real. Essentially, the typical company will be faced with an overtaxed infrastructure and an overstretched staff as it scrambles to find ways to make ends meet. Just keeping up will mean shifting budget dollars to storage and away from investments in expansion, modernization and even people. With these constraints, the data affordability gap will be a drain on short-term profitability and longer-term competitiveness.
To narrow the affordability gap, IT will have to work smarter, not just harder; this means intelligent storage optimization and not just brute-force additions of storage capacity. One way to do this is through primary storage deduplication, using products like Permabit’s Albireo Data Optimization Software, which the company offers on an OEM basis through a select number of suppliers. Technologies like these are being integrated into primary storage systems from established manufacturers. The compounding effect of optimization at the primary storage level, versus simply reducing data at the backup or archive level, can significantly reduce real storage consumption data center-wide.
To understand the compounding effect, think of primary storage as the root of a tree and the other levels as branches. Each data object in the root is propagated into the branches, multiplying the total capacity of the enterprise ‘tree’. For example, the most critical primary storage objects are often duplicated for high availability reasons, doubling the capacity they consume. Similarly, more copies are made as all primary storage is backed up and most is copied (again) for transportation off-site, physically or electronically. In the workflow process, collaboration and revision create many more copies (although slightly different) of these primary storage files, as do the ad hoc backups that users take.
Personal behavior, like saving documents locally, neglecting to delete old projects on shared storage and storing personal media files (audio, video), also duplicates many data objects. Snapshots (and clones) are used by many different applications and, while more efficient than full copies, still represent additional space consumed by primary data. Finally, test, development and support activities create still more copies. The bottom line is that the day-to-day operations of an enterprise generate a lot of duplicate data. When data reduction techniques are applied to one area, like deduplicating backups, the benefits are felt only on that ‘branch’ of the tree. When they’re applied to primary data, the benefits are compounded and affect the entire enterprise.
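The tree analogy can be turned into a small model. The sketch below is illustrative only: the branch names, copy multipliers and reduction ratios are hypothetical values chosen for easy arithmetic, not figures from the article or from any vendor, and it assumes that data reduced at the primary level stays reduced when it is copied into each branch.

```python
# Hypothetical branches of the storage 'tree': TB of copies created per TB
# of primary data. These multipliers are assumptions for illustration.
COPIES_PER_PRIMARY_TB = {
    "high_availability_mirror": 1.0,
    "backup": 1.0,
    "offsite_copy": 1.0,
    "test_dev_and_snapshots": 0.5,
}


def total_footprint(primary_tb, primary_reduction=0.0, backup_reduction=0.0):
    """Total TB across the tree; reductions are fractions from 0.0 to 1.0."""
    # Reducing the root shrinks everything that is copied from it.
    root = primary_tb * (1 - primary_reduction)
    total = root
    for branch, multiplier in COPIES_PER_PRIMARY_TB.items():
        branch_tb = root * multiplier
        if branch == "backup":
            # A backup-only point solution shrinks just this one branch.
            branch_tb *= 1 - backup_reduction
        total += branch_tb
    return total


baseline = total_footprint(100)
backup_only = total_footprint(100, backup_reduction=0.5)
primary = total_footprint(100, primary_reduction=0.5)
print(f"no reduction:        {baseline:.0f} TB")
print(f"backup dedupe only:  {backup_only:.0f} TB")
print(f"primary dedupe:      {primary:.0f} TB")
```

With these made-up numbers, 100 TB of primary data becomes 450 TB across the environment. Halving only the backup branch saves 50 TB, while halving the primary data saves 225 TB because every branch inherits the reduction.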
CAPEX and OPEX savings
These benefits come in the form of reduced storage capacity at all levels: primary, secondary, archive, backup, DR, etc. The result of data reduction is fewer spindles, arrays and controller upgrades, along with fewer switch ports, switches and switch upgrades. This translates into less rack space, less power distribution and cooling infrastructure, and smaller data centers. On the operations side, fewer and smaller storage systems mean less power consumption, less administration time and fewer resources spent on the ‘care and feeding’ of a growing storage infrastructure.
Addressing the problem, not the symptom
Primary data consumption is the problem, and reducing it is the solution. Dedupe technologies, like those found in backup and archive systems, are really just point solutions that mainly address the symptoms by finding a way to put more data into the same sized container. Optimizing primary storage increases efficiency and reduces capacity consumption throughout the storage environment. Like minimizing expenses or maximizing profits, resource optimization is a fundamental business requirement that IT can undertake to address the affordability gap. Technologies like Albireo Data Optimization Software from Permabit are becoming available from select storage suppliers and will make addressing the data affordability gap feasible. In fact, almost all of the major disk vendors are responding to this need with newly announced products, development projects or external technology acquisitions.
Keeping up with growing data has always been a fundamental IT responsibility and it’s not going away. IT budgets are increasing, but not enough to accommodate this projected data growth, even with reductions in storage cost and improvements in storage efficiency. This disparity between capacity required and the funding to purchase that capacity could threaten future profitability and competitiveness.
Reducing primary data storage needs through deduplication is an effective strategy that addresses the problem, not just the symptom. Storage optimization is a key component of the solution, helping to satisfy this appetite for storage without consuming the entire budget in the process. As an integrated part of primary storage systems, this technology can produce a compounded reduction in data throughout the environment and help shrink the Data Affordability Gap.
Permabit Technology is a client of Storage Switzerland
Eric Slack, Senior Analyst