The problem is that IT still has a responsibility to make these data discoverable (eDiscovery), regardless of the state of its storage infrastructure. Unfortunately, when basic storage system architecture was created, eDiscovery wasn’t even a word. Storage systems were primarily built for throughput performance, storage density, capacity expansion and management of these processes, not for random access of specific data objects. They weren’t built to locate and access specific files from within the vast containers these objects are stored in.


Since the December 2006 change in the Federal Rules of Civil Procedure, electronic discoverability has been a legal requirement. But most IT infrastructures still can’t do it efficiently - and the problem is just getting worse.


More Data, Less Organization


Moore’s Law (as it’s applied to storage) assures that capacity will increase regularly as cost decreases. This drives the ‘buy more, don’t organize’ philosophy that’s so common. It’s largely an economic issue. It IS cheaper to buy more storage than to get the data owners and IT together to organize these data and manage the storage infrastructure. 


In addition to the legal discovery challenges already mentioned, this results in a number of other fundamental IT issues, a few of which are:


  1. Redundant data as applications make temporary copies and users save the same files
  2. Choked backups as multiple copies of data are saved and re-saved with time
  3. Increased CapEx costs as technologies like deduplication are implemented to handle this inefficient backup process
  4. Increased OpEx costs as IT personnel manage this burgeoning infrastructure
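
To make the redundant-data point concrete, here is a minimal sketch of how duplicate files could be spotted by hashing their contents. It is an illustration only, not how any particular product works: the function names `file_digest` and `find_duplicates` are hypothetical, and the sketch assumes the data sits in an ordinary file tree that can be read directly.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content hash.

    Returns a list of groups; each group is a sorted list of paths
    whose contents are byte-for-byte identical.
    """
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only groups with more than one member are redundant copies.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates` on a directory returns the sets of identical files, which gives a rough picture of how much capacity and backup time the copies consume. A real environment would also have to reach into backup images and email stores, which this sketch does not attempt.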


There’s also the cost to the organization of regenerated data sets, as business units re-create data they can’t find or access. That duplication adds further costs in capacity, management and backup.


IT Discovery - not eDiscovery


Historically, the need to search and retrieve data in response to a legal action has been addressed by eDiscovery tools, applications built mainly for legal departments. These tools weren’t built to handle the scope of data sets typical in an IT environment and can’t interface with the applications or hardware that store that data - email and tape backup, for example. There’s a new concept called IT Discovery which refers to a category of searching and indexing tools that ARE built for IT. They have the architecture and raw horsepower for the amount of data IT encounters and knowledge of the applications that have access to that data. IT Discovery is a process that creates a content-searchable index of corporate electronic data and documents. 
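
As a rough illustration of what a "content-searchable index" means, the sketch below builds a tiny inverted index that maps each word to the documents containing it. It is a toy under stated assumptions, not a description of any vendor's implementation: the class name `ContentIndex`, the `tokenize` helper and the document identifiers are all hypothetical, and the text is assumed to be already extracted from its container.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ContentIndex:
    """Toy inverted index: term -> set of document identifiers."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index the text of one document under the given identifier."""
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the identifiers of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


# Hypothetical documents drawn from different storage containers.
index = ContentIndex()
index.add("mail/0001.eml", "Q3 contract renewal with Acme")
index.add("nas/share/plan.doc", "Contract draft for Acme, revision 2")
index.add("tape/2008-01/notes.txt", "Lunch menu")
matches = index.search("acme contract")
```

Here `matches` holds the email and the NAS document but not the tape file. The point of the design is that the index is built once, ahead of time, so a later search never has to reopen the underlying containers.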


Companies like Index Engines have designed these IT Discovery systems that can be implemented as an overlay application to search the corporate data systems, without being integrated into the infrastructure. They create the indexes that are used to make corporate data ‘litigation ready’ and support other regulatory compliance. But they also facilitate the data organization that can address the core IT issues listed above.


Ideally, IT Discovery is run as a proactive process to produce the indexes before they’re needed. Since a discovery motion can come up at any time, data needs to be in a litigation-ready state all the time. In order to be effective in the IT environment, IT Discovery must be able to:


  1. Scale to the enterprise - handle the scope of data sets typical of the IT environment (billions of data objects) and expand to keep up with IT storage growth (hundreds of TBs)
  2. Run fast enough to be non-disruptive - be architected to process enterprise IT data sets in a reasonable timeframe
  3. Handle all file and backup formats - be compatible with IT file formats and work efficiently with backup applications
  4. Streamline across containers - be able to access data in all common storage platforms - NAS, SAN, VTL, backup tapes, PCs, etc.
  5. Provide cost-effective results - be efficient so as to keep usage costs at reasonable levels


Other Benefits of IT Discovery

         

Creating the ability to find any piece of data in your infrastructure is a powerful thing. It certainly makes the stored data litigation-ready, but it also provides the ability to address some fundamental IT data storage issues. With this comprehensive index, organizations can set up an archive with effective data retrieval. This can reduce the data stored in backups and on primary storage and reduce overall network traffic. Companies can eliminate the wasteful re-creation of data that can’t be found, better support business processes and implement data mining. With this organization, OpEx costs can also come down, as IT personnel spend less time managing data.


Summary


Corporations have been required to produce electronic documents on demand for legal compliance for almost three years. This obligation has been made more difficult by the growth of data and the resulting disorganization of that data. But IT is still ‘on the hook’. Unfortunately, the eDiscovery tools available to search and index IT infrastructures are inadequate given the volumes of data and the containers in which it’s stored. 


IT Discovery is a new category of products, such as those from Index Engines, designed to address this issue, with the capability to efficiently access current storage infrastructures and create content-searchable indexes. When conducted proactively, IT Discovery can enable IT to meet data-specific storage, retention and recovery objectives, as well as support legal requirements when they come up. It can also provide other significant business benefits related to improved access to historical data sets.


IT Discovery is not eDiscovery, but a comprehensive process to find and organize the data corporations accumulate to facilitate better protection, better growth management and better compliance. IT Discovery can help organizations turn their vast amounts of stored data from a potential liability into a business asset.

Eric Slack, Senior Analyst