The Evolution of Data Archiving
The most common way to achieve this goal is archiving, which, unlike backup, moves inactive data from primary disk-based storage to an easily accessible, less expensive secondary storage tier and then deletes it from the source disk locations. This helps cut costs by freeing up expensive primary storage, shortens backup windows (thereby increasing operational efficiency) and provides reliable long-term protection for the data. A viable and effective data archive should provide:
•Scalability
•Cost effectiveness
•Availability
•Secure long term protection for the data
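The move-then-delete flow described above can be illustrated with a minimal Python sketch. The function name `archive_inactive_files`, the directory arguments and the use of last-modification time as the test for "inactive" are all assumptions made for illustration; real archiving products apply their own policies. The sketch also assumes the archive directory sits outside the primary directory.

```python
import hashlib
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_inactive_files(
    primary: Path,
    archive: Path,
    max_idle_days: float,
    now: Optional[float] = None,
) -> List[Path]:
    """Move files not modified within max_idle_days from primary to archive.

    Hypothetical helper: "inactive" is approximated by modification time.
    Each file is copied, the copy is verified, and only then is the
    source deleted, which is what distinguishes archiving from backup.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    moved = []
    # Materialize the file list first, since the loop deletes files.
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        if source.stat().st_mtime > cutoff:
            continue  # still active, leave it on primary storage
        target = archive / source.relative_to(primary)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy data and timestamps
        if sha256_of(source) != sha256_of(target):
            target.unlink()
            raise OSError(f"verification failed for {source}")
        source.unlink()  # free the primary storage
        moved.append(target)
    return moved
```

Verifying the copy before deleting the source is a deliberate choice in this sketch: once the primary copy is gone, the archive holds the only remaining instance of the data.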
In this article we take a closer look at the evolution of the archiving process and the tier it uses for storage with a focus on the differences and advantages of the available archiving methods. Those methods are:
•Legacy or traditional tape based archiving
•On-premise disk based archiving
•Cloud Storage archiving
Legacy Archiving
This type of archiving is the traditional tape based method where data is saved from disk to tape as part of a backup process using backup software or system utilities to write the data to tapes in stand-alone tape drives or automated tape libraries. These tapes and the data they contain are segregated from normal backups and assigned a lengthy retention period, usually from 10 years to infinite. These archive tapes are then sent to an off-site storage facility for permanent storage while the data backed up to them is deleted from the server’s disk drives.
The primary advantages to this method are that tapes are relatively inexpensive, easy to handle, last a long time and provide very scalable storage, which makes storing large quantities of data very cost effective. To increase your storage capacity you simply add more tapes. You could also provide additional data redundancy by creating copies or clones of the data sets and/or primary data tapes with certain types of backup applications.
On the downside, though, you would have to wait for tapes to be brought back from off-site storage before you could retrieve data from them, and then spend additional time scanning the tapes, locating the required data and restoring it using the application that created the tape. Additionally, explosive data growth results in ever shrinking backup windows, retention capabilities are limited, and there is no practical means to verify the integrity of the tape media and its data as they age in storage.
On-premise Disk Based Archiving
The last few years have seen significant growth in the amount of data businesses hold and in the requirement to store and access ever increasing amounts of archive data. Much of this is the result of new legal compliance requirements such as SOX, along with the globalization and decentralization of traditional company structures, which now have multiple offices scattered across large geographical areas instead of having one or two centralized locations as in the past. The need to quickly and easily access large quantities of archive data for collaboration, research and other business processes can be best addressed by disk based archive systems.
In order to address these new requirements to access archived data more efficiently and store ever increasing amounts of new data, such as email, databases, etc., businesses began looking at other storage alternatives that would meet these needs in a cost effective manner.
The first step in the evolution and modernization of archiving was the deployment of disk-based solutions using less expensive, standard off-the-shelf hardware and SATA drives as well as inexpensive NAS devices. This allowed companies to maintain their data archives on site where they could access them easily and quickly whenever necessary. However, these early implementations did not adequately address the unique requirements of an archive, such as the ability to scale to petabytes to accommodate explosive data growth and lengthy retention periods, or provide a means to protect and ensure the integrity of this data beyond basic RAID 6 levels in order to meet legal and corporate governance requirements. They also lacked the means to manage the archive process automatically and to impose specific retention policies on the data.
This led to the next step, which was the introduction of specialized archive systems designed to manage the data archiving process. These systems provided fast, well managed storage that could scale in capacity easily and that came with the necessary tools and software to manage the archiving process. These systems also provided data protection features beyond basic RAID 6 as well as retention policy inclusion, data integrity verification and WORM (Write Once Read Many) capabilities.
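To make retention policies, integrity verification and WORM behavior concrete, here is a minimal Python sketch. It is not how any particular archive system implements these features; the `ArchiveRecord` class and the helper names are hypothetical, and WORM is only imitated here by an immutable in-memory record rather than by storage-level enforcement.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ArchiveRecord:
    """Hypothetical catalog entry for one archived object."""
    name: str
    sha256: str          # fingerprint taken at archive time
    archived_on: date
    retention_days: int  # how long the object must be kept

    def retention_ends(self) -> date:
        return self.archived_on + timedelta(days=self.retention_days)


def make_record(name: str, payload: bytes, archived_on: date,
                retention_days: int) -> ArchiveRecord:
    """Record the object's fingerprint and retention period at write time."""
    fingerprint = hashlib.sha256(payload).hexdigest()
    return ArchiveRecord(name, fingerprint, archived_on, retention_days)


def is_intact(record: ArchiveRecord, payload: bytes) -> bool:
    """Integrity check: does the stored data still match its fingerprint?"""
    return hashlib.sha256(payload).hexdigest() == record.sha256


def may_delete(record: ArchiveRecord, today: date) -> bool:
    """Retention check: deletion is allowed only once the period has ended."""
    return today >= record.retention_ends()
```

A system built along these lines could rerun `is_intact` on a schedule to detect silent corruption as data ages, which is the kind of check the article notes is impractical for tapes sitting in off-site storage.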
The advantages to this method of archiving were that the data copies were stored on-site and could be accessed quickly and easily. There was no more waiting for tapes to be brought back from off-site facilities and there was no need for special hardware or backup software in order to restore the data from tapes. It was also very easy to index and search for specific data on disk. You could move data easily from one location to another on the network by simple copy commands and you could scale the capacity of these systems easily to accommodate data growth. The major benefit was in reducing the primary storage requirements, thus avoiding the need to purchase additional expensive primary storage for quite some time.
The main disadvantage of these disk based archive systems for businesses, however, was the initial buy-in cost for systems that frequently started out with approximately 50 TB of disk. For many businesses, it was neither practical nor cost effective to buy so much disk space up front when they might only need one or two terabytes initially. In effect you would be paying up front for a lot of storage that you might not fully utilize for years. There are also the ongoing costs of powering, cooling, managing, maintaining and upgrading these systems along with their support infrastructure.
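The arithmetic behind this point can be shown with a small Python sketch using only the figures mentioned above (a 50 TB starting system and an initial need of one or two terabytes). The function names are hypothetical, and no prices are assumed; the result is expressed as a ratio.

```python
def idle_fraction(capacity_tb: float, used_tb: float) -> float:
    """Fraction of purchased capacity that sits unused."""
    if not 0 < used_tb <= capacity_tb:
        raise ValueError("used_tb must be positive and at most capacity_tb")
    return 1 - used_tb / capacity_tb


def effective_cost_multiplier(capacity_tb: float, used_tb: float) -> float:
    """How many times the per-TB purchase price each used TB really costs."""
    if not 0 < used_tb <= capacity_tb:
        raise ValueError("used_tb must be positive and at most capacity_tb")
    return capacity_tb / used_tb


# Figures from the text: a 50 TB system with 2 TB actually needed.
print(f"idle: {idle_fraction(50, 2):.0%}")
print(f"multiplier: {effective_cost_multiplier(50, 2):.0f}x")
```

With 2 TB in use on a 50 TB system, 96% of the purchased capacity is idle and each terabyte actually used carries 25 times the per-terabyte purchase price, before power, cooling and management costs are counted.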
Cloud Storage Archiving
Faced with frozen or slashed budgets and minimal staffing levels but ever growing data storage demands, businesses began to look for other means to expand their storage capacity in the most cost effective manner possible. Consequently businesses are now looking at the latest development known as cloud computing, more specifically at cloud Storage-as-a-Service as a possible means to meet their growing storage needs while minimizing their costs and need for additional personnel, hardware, infrastructure, etc.
Service providers in this new area of cloud computing provide almost infinitely scalable storage to businesses as a service with a fixed cost based on usage. This allows businesses to expand their storage on an as-needed basis without having to concern themselves with the usual costs normally associated with expanding a disk environment such as building more infrastructure, hiring and training more personnel to manage the additional storage, increased cooling and power costs for the additional storage, etc. Among other advantages, this service model also provides a geographically aware infrastructure with multiple locations. This allows businesses with multiple locations in different geographical areas to access their data from any point where they have some type of Internet access. All of this access is provided transparently to the user and appears as a simple mount point within their LAN or WAN. These solutions usually integrate easily with a business’s existing infrastructure and applications while providing secure connections for all data transfers with the data secured in-flight and at rest.
These services also usually provide the ability to impose data retention policies on the stored data. Among the major providers in cloud storage archiving are Iron Mountain’s Virtual File Store (VFS), Nirvanix, Rackspace Hosting Inc.’s Mosso Cloud Division, Vaultscape and Amazon.com’s S3 service. Iron Mountain has the most experience in storing, protecting and archiving information.
Among the advantages offered by this type of service is the ability to scale your storage almost instantly without incurring large up front capital expenditures for hardware, expanding your network infrastructure, or hiring and training more personnel to manage the additional storage. You also avoid downstream costs to upgrade and refresh the storage hardware as it ages.
The only disadvantage to this type of service is that the company’s data now resides on someone else’s systems rather than on the company’s local systems. You therefore want to carefully analyze what types of data would be archived to the cloud and how the data is protected before it is moved there. In this use case experience matters: working with an organization that has a proven track record of archiving and storing data securely is a clear and distinct advantage.
Because of improved access and reliability, the evolution of data archiving has allowed customers to be more aggressive about moving data off primary storage. Doing so reduces costs not only in primary storage but also in data protection and disaster recovery. Cloud Storage Archive, as we detail in our article "Can Cloud Storage be the Solution to the Data Explosion?", is the next step in the evolution of data archiving.
Monday, April 6, 2009
Key Takeaways
•Basic difference between backup and archiving
•Types of archiving with pros and cons
•Advantages of on-line vs. on-premise archiving
•How cloud archiving has potential to save more IT dollars
Related Articles
Backups: Band-aides or Solutions
Cloud Storage is About more than Price
Can Cloud Storage be the Solution to Data Explosion?
Improving the Backup Process with Cloud Archiving
The Importance of a Cloud Storage API
Joseph Ortiz, Senior Analyst, Storage Switzerland, LLC
Over the last few years there has been an increasing amount of interest in data archiving. This interest is driven by the explosive growth of data on corporate networks, the need to retain more and more of this data for longer periods in order to meet various legal and corporate governance requirements, and the need to reduce costs wherever possible. Alongside it we have seen an evolution of data archiving that has now culminated in cloud based archive solutions.
As most of us know, approximately 20-30% of the data on most networks is active, while the remaining 70-80% is static or inactive: unchanging and infrequently accessed. Keeping this inactive data on primary Tier 1 storage is expensive and inefficient. Nevertheless, it is often necessary and/or desirable to preserve this inactive data for future reference and to comply with various legal and governance requirements. It makes sense to store it on the least expensive media available while preserving the security of this data and providing access to it in a reasonable amount of time should it be required.