IT Discovery Enables Data Reduction
Tuesday, December 8, 2009
Data is exploding, and storage infrastructure is straining to keep up. This growth is frequently handled in an ad hoc fashion, and as a result, companies often over-buy storage instead of organizing their data, managing existing capacity and buying only what they really need. The practice is inefficient but cost-effective in the short term, driven by the continual drop in price per GB of disk storage (a kind of Moore’s Law for storage). Content indexing with IT Discovery systems provides a better-informed way to organize corporate data so that it can be managed efficiently. This can reduce the overall volume of primary data and its related costs, while also creating a corporate asset: easier access to historical data.
Data growth is a result of several factors. One is organic business expansion: the normal increase in files and information that accompanies rising business levels. Growth can also come from mergers, acquisitions and business unit consolidation. Another source of growth is duplication, coming mostly from backups. Backup systems typically take periodic full copies of data sets, for disaster recovery (DR) or archiving, and save them. Since most of the data in these sets doesn’t change from one copy to the next (the 80/20 rule), the majority of each copy is redundant. Finally, recreated data is another source of duplication: data that has already been saved but is created again because the original can’t be found or accessed.
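To put a rough number on the backup redundancy described above, here is a sketch built on a deliberately simple, hypothetical model: every full backup is the same size, and a fixed share of each one (read here as the 80 percent of static data in the 80/20 rule) is unchanged since the previous full. The function name `redundant_fraction` is illustrative only.

```python
def redundant_fraction(full_backups: int, static_share: float) -> float:
    """Fraction of stored backup bytes that repeat data held in an earlier full backup.

    Hypothetical model: every full backup is the same size, and `static_share`
    of each full is unchanged since the previous one. The first full is all unique.
    """
    if full_backups < 1:
        raise ValueError("need at least one full backup")
    if not 0.0 <= static_share <= 1.0:
        raise ValueError("static_share must be between 0 and 1")
    # Measure in units of "one full backup", so total stored = full_backups.
    # Every full after the first repeats static_share of its contents.
    redundant = (full_backups - 1) * static_share
    return redundant / full_backups


# 13 weekly fulls (about one quarter) with 80 percent static data.
print(f"{redundant_fraction(13, 0.8):.0%}")  # 74%
```

Under those assumptions, keeping a quarter’s worth of weekly full backups means roughly three quarters of the stored backup bytes repeat data already held in an earlier copy. Real backup sets vary in size and change rate, so this illustrates the effect rather than estimating it.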
So Much Data, So Little Time
Keeping up with data growth requires more storage, and adding storage capacity requires planning. These plans usually include examining existing data sets to clean out worthless data, organizing what’s kept and migrating static data from backups to an archive. Clearly, adding storage capacity has a significant administrative component. But with the cost per GB of disk storage dropping and IT workloads increasing, the simple solution is to just buy more storage and skip the planning and organization steps. After all, if it’s cheaper to ‘build a bigger garage’ than to clean out the old one, the seemingly prudent decision would be to keep building garages.
Costs of Data Growth Include Disorganization
Adding storage this way produces disarray in the storage ecosystem. Capacity added without adequate infrastructure or planning leaves a disorganized data system. Even when the storage is acquired at shrinking price points, the effect of this disorganized growth is an increase in Total Cost of Ownership (TCO), in both ‘hard’ and ‘soft’ costs. Disorganization and ad hoc storage growth also make budgeting for storage more difficult.
Hard costs refer to CapEx items like primary storage hardware, software, backup systems and network gear. Another hard cost is contracted services, such as eDiscovery engagements incurred when data needed for legal proceedings can’t be found. OpEx items like administration, power, cooling, facilities and maintenance on CapEx items are also included. Soft costs include lost productivity due to poor data access and applications affected by storage availability problems. Recreating data that’s inaccessible also fits in this category, as do the costs associated with missed market opportunities.
Risk is the other major soft cost, such as the risk of data loss or downtime when growth outpaces data protection systems. Another significant concern is the security risk of unstructured data scattered across the network, unaccounted for. This can include data that’s sensitive to the business or Personally Identifiable Information (PII) that may be governed by privacy laws. Legal exposure from the inability to produce needed data objects is also a real risk in disorganized systems.
The only real solution to the organization problem involves reducing the amount of data that’s actually on the primary storage system. This means eliminating duplicate data, moving lower-value data to appropriate storage tiers, and archiving static and backup data. Technologies like deduplication can offer temporary relief, but they don’t address the long-term problem: the issue is too much data on primary storage systems, not an inability to store more backups. Real data reduction requires knowing what this data contains and organizing it accordingly, and, unfortunately, most IT organizations don’t have the tools to do either.
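The first step named above, eliminating duplicate data, starts with finding it. A minimal sketch, assuming exact file-level duplicates on a locally mounted file tree: group files by size, then confirm matches with a SHA-256 hash of their contents. This is a generic, swapped-in technique for illustration, not how Index Engines or any particular deduplication product works, and `find_duplicates` and `content_hash` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict


def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> dict[str, list[str]]:
    """Group regular files under `root` by content; keep only groups of 2+ files."""
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the same file is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups: dict[str, list[str]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        for path in paths:
            groups[content_hash(path)].append(path)
    return {digest: sorted(ps) for digest, ps in groups.items() if len(ps) > 1}
```

Grouping by size first means only files that could possibly match get hashed, which limits I/O on large shares. The output is a list for data owners to review; nothing is deleted.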
Content Knowledge
Companies do have an option. They can leverage the content indexing ability of IT Discovery tools, like those from Index Engines, to learn what data they have, organize existing data sets and keep them managed as they grow. IT Discovery systems are purpose-built hardware and software tools that integrate into an organization’s infrastructure and create a full content and metadata index of all enterprise data assets. They can access files on network file servers, Network Attached Storage (NAS) devices, desktop computers and hard drives, in all major file formats, including email. They can also search data stored in proprietary backup formats on network shares, backup servers, Virtual Tape Libraries and tapes. Legacy backup tapes can be processed and their data indexed for future reference, without a backup server or the original backup software application.
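The core of a content and metadata index is an inverted index: a map from each term to the documents that contain it. This toy sketch indexes document text plus one metadata field (owner), stored as a prefixed term so content and metadata can be queried together. It assumes the text has already been extracted from files, email or backup sets, which is the hard part a real IT Discovery system handles. The `Document` and `ContentIndex` names are hypothetical and do not represent Index Engines’ implementation.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    location: str  # hypothetical: a NAS path, laptop path or backup tape label
    owner: str
    text: str      # assumed already extracted from the file, email or backup set


TOKEN = re.compile(r"[a-z0-9]+")


class ContentIndex:
    """Toy inverted index over document text plus an owner metadata field."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.docs: dict[str, Document] = {}

    def add(self, doc: Document) -> None:
        self.docs[doc.doc_id] = doc
        for term in TOKEN.findall(doc.text.lower()):
            self.postings[term].add(doc.doc_id)
        # Metadata is indexed as a prefixed term so it can be combined with content terms.
        self.postings[f"owner:{doc.owner.lower()}"].add(doc.doc_id)

    def search(self, *terms: str) -> list[str]:
        """Return sorted ids of documents matching every term (an AND query)."""
        if not terms:
            return []
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*sets))
```

For example, after indexing a spreadsheet on a NAS share and an old forecast restored from tape, `search("revenue", "owner:alice")` narrows results by content and owner at once, regardless of where each document physically lives.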
Once an index is created, it can be used to finally reduce the amount of data that’s been choking the primary storage infrastructure. Data owners can identify redundant or worthless data and have it eliminated, then classify the remaining data according to its business value. This classification allows IT to migrate data to the appropriate storage tier and enforce existing data storage policies. Finally, archives can be created and populated with the growing set of data that’s rarely accessed but must remain available, moving it out of the backup system. The result is a reduction in primary and backup storage volume and in the costs associated with each.
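Once data owners have classified data by business value, a tiering policy can be expressed as a simple rule. This sketch assumes hypothetical value labels (`"high"`, `"low"`, `"none"`) and placeholder age thresholds of 90 days and two years; a real policy would come from the organization’s own data storage rules, and `FileRecord` and `assign_tier` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    last_access: datetime
    business_value: str  # hypothetical labels set by data owners: "high", "low" or "none"


def assign_tier(rec: FileRecord, now: datetime) -> str:
    """Map a file to a storage action under an illustrative, hypothetical policy."""
    if rec.business_value == "none":
        return "delete-candidate"  # flagged for owner review, never deleted automatically
    age = now - rec.last_access
    if age <= timedelta(days=90):
        return "primary"  # recently used data stays on primary storage
    if rec.business_value == "high" and age <= timedelta(days=730):
        return "secondary"  # valuable but cooling data moves to a cheaper tier
    return "archive"  # rarely accessed but must remain retrievable
```

Files labeled `"none"` are only flagged, reflecting the point that data owners decide what is eliminated; everything else lands on a tier, with the archive catching the static data that would otherwise sit in backups.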
Indexing Creates a Business Asset
In addition to helping solve the problem of data growth driving inefficient storage growth, content indexing offers another benefit: it can turn what was formerly expensive, long-term storage of static data into an asset. After an organization’s legacy data stores have been processed and indexed, they’re available for data mining and other research. Besides eliminating the need to recreate data that can’t be found, this makes historical information available for analysis in strategic planning, tactical decision support or risk profiling.
With an accurate index, data owners can stay on top of their data sets and prevent the future buildup of duplicate and low-value data on primary storage systems. IT can look forward to adding storage more responsibly, with the required planning and organization. Compliance officers can be confident in their ability to find and produce data in response to legal and regulatory actions. Management gains better decision support by leveraging historical data that’s now readily available. Ultimately, content indexing of corporate data through IT Discovery systems enables real primary data reduction, along with the reduction in hard and soft infrastructure costs that it brings.
Eric Slack, Senior Analyst
This Article Sponsored by Index Engines