Since the early days of mainframe computing, data archiving has been a key function for migrating infrequently accessed data from high-value, expensive DASD (direct access storage device) resources to lower cost tape. What was true for mainframe environments 30 years ago remains true today, whether the platform is mainframe, midrange or open systems: approximately 90% of any organization’s business data is static or very infrequently accessed, while the remaining 10% is used constantly.


Without a viable strategy for moving data across storage tiers, IT business planners face not only escalating storage costs but also higher energy bills, increased data center footprint/real estate costs and higher operational/people overhead expenses. These issues threaten not only an organization’s fiscal bottom line but also its competitiveness in an increasingly cut-throat global economy. The challenge for IT planners and storage architects is how to efficiently manage costly disk resources without sacrificing data availability and manageability.


While mainframe environments have enjoyed sophisticated hierarchical storage management (HSM) tools and logic for decades, open systems platforms have only come into their own with similar tools and products within the last ten years. Many of the open systems software tools, however, come at a steep premium to end users.


Generally, only larger enterprises or mid-sized companies with specific retention requirements can afford or cost-justify the archiving systems required to segregate infrequently accessed data sets from repeatedly accessed files.


This poses a dilemma for the small to medium-sized enterprise (SME) business or data center. While SMEs have generally been described as “small” shops, even these environments are now scaling from terabytes to tens of terabytes of data in production. In short, SMEs now have to deal with many of the same big data center issues without the corresponding budgets to solve them. This article examines a concept called “Archiving In Place,” which some SMEs may find a practical way to maintain access to aging data without incurring the costs associated with a traditional archive strategy, while at the same time avoiding the costs of doing nothing (power, cooling, etc.).


With the advent of ATA and SATA disk drive technology earlier this decade, data center environments, large and small, could cost-effectively leverage spinning media as a faster nearline alternative to tape. Major storage players and new disk suppliers alike began aggressively marketing SATA technology and were fairly effective at eroding a portion of the tape market. Once initial concerns around MTBF were addressed through RAID-6 (dual-parity) striping, storage managers could confidently use SATA as a safe target for efficiently storing and protecting infrequently used data sets as part of a software-driven HSM Tier-2/Tier-3 model.


Additionally, as SAS-based drives become more prevalent, storage managers can add very reliable, high-performance drive technology to complement SATA-based systems while still realizing much of the cost savings.


Advanced storage arrays could further integrate lower cost SATA storage by enabling IT users to employ snapshot software to maintain multiple point-in-time copies of data on lower cost disk. However, while SATA technology has helped bring down the cost of disk storage, it obviously cannot by itself mitigate the issues around the 90/10 rule of how data is accessed. Even storing 90% of infrequently accessed data on a midrange storage array built entirely on SATA disk can be a very expensive proposition, particularly when environments start scaling into the hundreds of terabytes.


To further leverage the economies of scale provided by SAS/SATA storage, IT users began adopting SAS/SATA-only storage solutions from new suppliers. These offerings typically provided a very low cost of entry compared to traditional or big-name storage vendors. At the outset, these systems could be purchased with a single controller and later upgraded to dual controllers, giving cost-conscious customers a very affordable entry into SAS/SATA technology. The tradeoff in adopting these early SATA offerings from new suppliers was a near-total lack of management software from the vendor itself. As a result, features and functionality were sacrificed in exchange for good, very low cost general-purpose storage.


Fast forward to the present day, and some SAS/SATA-only storage suppliers now provide very advanced storage management software at a significant discount to name-brand offerings. For example, Nexsan’s DATABeast product offers features like thin provisioning, virtualization, snapshots, mirroring and replication.


For years, these types of feature sets were only available on upper mid-range to enterprise class storage systems. Now SME users can viably deploy a SATA based storage array to handle both production (Tier-2) online data and nearline (Tier-3) data simultaneously.


In addition to advanced software functionality, the DATABeast has the native ability to spin down inactive disks (MAID, or massive array of idle disks) to save on power consumption using either SATA or SAS technology. This presents a very interesting opportunity for SME customers with a strong need for an effective archiving or data reduction strategy that incorporates advanced power saving technology to drive down energy costs.



Digging Through the Archives


End users are scouring the vendor landscape to determine which technologies will help drive down physical infrastructure requirements, lower management overhead and reduce power consumption. Server virtualization, data deduplication and file system archiving are all valuable approaches to helping data center managers meet business mandates to enhance service levels while reducing capital and operational outlays. Of these three cost reduction methods, however, archiving is perhaps the most difficult to quantify from a savings perspective. Server virtualization and data deduplication cost savings are relatively black and white. Through virtualization, an IT manager can show hard dollar savings by reducing the need to procure and maintain physical servers. Likewise with deduplication, an IT manager can build a compelling case around tape elimination and all the associated hard dollar capital and operational cost savings.


Archiving, on the other hand, is not quite so easy to justify. Unless an organization is compelled to archive to satisfy a regulatory mandate, it is difficult to rationalize on the basis of reduced costs or greater efficiencies, particularly in an economy where the bottom line is heavily scrutinized. SME customers certainly fall into this camp.



The Practice of Archiving in Place


IT users can generally justify acquiring more storage when the current array is at or near full capacity. After all, business management understands the concept of “We’re down! No more storage…”.


Archiving In Place means that rather than deploying a costly archiving application or file system archiving appliance to de-stage inactive data to a lower cost storage medium, SME users simply procure another low cost SATA array and manually migrate the active datasets from the old array over to the new one. The inactive data stays where it is; the existing array effectively becomes the archive tier.
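As a rough illustration of the migration step, the sketch below walks the old array’s file system and copies only files accessed within a chosen activity window to the new array, leaving everything else in place. The mount points, the 90-day threshold and the reliance on file access times (atime) are all illustrative assumptions rather than any vendor’s tooling; a real migration would also need to account for mount options such as noatime and verify the copied data.

```python
import os
import shutil
import time

# Illustrative assumptions: mount points and the activity window are hypothetical.
OLD_ARRAY = "/mnt/old_array"      # existing SATA array, left in place as the archive tier
NEW_ARRAY = "/mnt/new_array"      # new SAS/SATA array for the active working set
ACTIVE_WINDOW_DAYS = 90           # files touched within this window count as "active"

def migrate_active_files(src_root: str, dst_root: str, window_days: int) -> None:
    """Copy recently accessed files to the new array; inactive data stays put."""
    cutoff = time.time() - window_days * 86400
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                if os.stat(src).st_atime < cutoff:
                    continue  # inactive: leave it on the old array (archive in place)
            except OSError:
                continue      # unreadable entry: skip it
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)   # copy2 preserves timestamps and permission bits

if __name__ == "__main__":
    migrate_active_files(OLD_ARRAY, NEW_ARRAY, ACTIVE_WINDOW_DAYS)
```

After a migration along these lines, applications are repointed at the new array and the old array can be exposed read-only as the archive share described later in this article.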


The first advantage of this approach is that only a small percentage of the overall dataset needs to be migrated to the new array, so a much smaller complement of disk has to be purchased. Moving the active 10% simply makes more sense than moving the inactive 90%. As previously discussed, the entry point for a new SAS/SATA array is pennies on the dollar compared to traditional storage systems, so from a budgetary outlay perspective it should be much more palatable to the financial decision makers.


Additionally, if the new system is SAS-based, the upgrade from SATA should also deliver a performance increase. Since the next array only has to manage the active 10% of the dataset, less capacity needs to be procured and the investment can be focused on performance rather than high-capacity spindles, for example 15K RPM drives.


The second advantage of this strategy is that the inactive datasets can be managed much more efficiently from a power consumption standpoint. For example, the AutoMAID feature on the DATABeast automatically spins down disks based on access times and progressively reduces power consumption the longer the datasets remain inactive. Research also indicates that power-managed drives experience a significant improvement in life expectancy.
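To make the power argument concrete, a back-of-the-envelope estimate follows. Every figure in it (drive count, per-drive wattage, idle fraction, electricity price) is an illustrative assumption, not a measurement or vendor specification; typical 3.5-inch nearline drives draw on the order of a few watts to roughly 10 W while spinning idle and considerably less when spun down.

```python
# Back-of-the-envelope energy savings from spinning down idle archive drives.
# All figures below are illustrative assumptions, not vendor specifications.
DRIVES          = 42     # hypothetical drive count in the archive array
IDLE_WATTS      = 8.0    # assumed per-drive draw while spinning but idle
SPUN_DOWN_WATTS = 1.0    # assumed per-drive draw when spun down
IDLE_FRACTION   = 0.95   # assumed share of the time the archive sits untouched
KWH_PRICE       = 0.12   # assumed electricity price, USD per kWh

hours_per_year = 24 * 365
saved_watts = DRIVES * (IDLE_WATTS - SPUN_DOWN_WATTS) * IDLE_FRACTION
saved_kwh_per_year = saved_watts * hours_per_year / 1000.0
print(f"Estimated savings: {saved_kwh_per_year:.0f} kWh/year "
      f"(~${saved_kwh_per_year * KWH_PRICE:.0f}/year at ${KWH_PRICE}/kWh)")
```

Under these assumed figures the spin-down alone saves on the order of a couple of thousand kWh per year, before counting the corresponding reduction in cooling load.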


Finally, with Archiving In Place, storage managers do not have to spend time integrating an archiving tool and learning how to use it.


From an end user’s perspective, using the archived storage can be as simple as pointing to an “/archive” folder to retrieve inactive data sets. Initial access will be slower than the online array because the drives need to come out of power savings mode and spin up, but this delay is measured in seconds, not minutes. Subsequent accesses following the initial retrieval will run at the same speed as the primary array.


Archiving In Place is a simple, cost-effective strategy that lets IT decision makers in the SME community leverage low cost, highly functional storage to manage online and nearline data cohesively, with all the bells and whistles of name-brand storage, yet without introducing additional management complexity and cost. Furthermore, by adopting affordable storage technologies that provide an attractive entry point into green data center computing, SME planners can also begin to realize the associated savings in power and cooling.

An Alternative Approach to Right Sizing Data Storage