Improving the Backup Process with Cloud Archiving
The primary traditional role of tape has been to hold backup copies of an organization’s data so that, after a catastrophic event or other Disaster Recovery (DR) emergency, the company could retrieve the data critical to its ability to continue doing business. Almost every large and mid-sized company today has at least one dedicated backup server and a tape unit or library of some type. Larger companies usually have extensive backup infrastructures consisting of multiple backup servers and tape libraries to help protect the ever-increasing amounts of data on their networks.
Even when companies began to deploy disk to handle backups, it was frequently for speeding up the backup process in order to deal with shrinking backup windows caused by explosive data growth. However, the data saved to the disk cache was still copied or migrated to tape by the backup application, once the disk backup process completed.
Moreover, until recently, a secondary role for tape was the long-term storage or archiving of static data stores. In most cases, this was for historical purposes as well as protecting against possible future litigation needs. It also allowed IT to free up valuable primary storage and reduce the backup window by removing large quantities of static data from the backup path.
Tape was a good choice for these tasks at the time because it was inexpensive, compared to disk, and easy to handle. It was also a well established, proven technology as it still is today.
However, there were some challenges in using tape for archiving through the backup application. In many cases, the archiving function simply consisted of segregating different types of data onto tapes with moderate to long retention periods, compared with the normal retention periods used for active data sets. Backups of current, active data were primarily for DR purposes. Inactive data was split into two tiers: data that was accessed from time to time for research and reference purposes, and data that was accessed rarely or not at all. The backup applications were also limited in the granularity of information they could provide on the data stored in the archive. To search for and retrieve specific data sets, you usually needed to know the time frame in which the required data existed as well as the name of the system that contained the data during that time frame. You generally needed special third-party applications to gain greater ease and granularity in tracking and searching for specific data in tape archives.
However, significant changes in business models and requirements made companies look at their data in new ways that changed their requirements for storing and accessing their data. It was no longer sufficient to protect the data against some potential future DR need alone.
Among the changes affecting companies’ storage requirements were:
•Decentralization of companies across large geographical areas
•The rise of collaboration on documents, databases, spreadsheets, etc., among decentralized offices
•The need to access various types of historical static data for research, reference, data mining, etc. from decentralized offices
•New legal and government compliance requirements such as SOX, HIPAA, etc., which lengthen retention periods while shortening the time allowed to comply with data requests from courts or government entities
•The need to quickly locate specific data in response to e-Discovery requests from legal staff, courts and government entities, and to protect that data from deletion or modification
•The need to protect data generated at remote offices in the most cost effective manner possible
•The need to protect ever increasing amounts of data while containing storage and infrastructure costs
Early methods of addressing these new data protection and storage needs were on-premises solutions utilizing various technologies such as inexpensive disk arrays, NAS appliances, replication technology, de-duplication appliances, additional tape servers and tape libraries, etc. However, these solutions usually required significant up-front investments in infrastructure, new hardware and trained personnel, along with associated costs for power, cooling, rack space, floor space, cabling, etc. As the economic environment deteriorated and IT budgets became more and more constrained, it became necessary to find more cost-effective methods of storing and protecting ever-increasing amounts of data while retaining a very high degree of rapid data accessibility.
So how do you store and protect ever increasing amounts of data and make it readily accessible across large geographical areas while containing your costs? Many companies are now taking a hard look at cloud storage services as a potential solution to this problem.
Ideally, such a service would provide features such as:
•Ability to quickly and easily extend your storage capacity without incurring capital costs
•Solution integrates seamlessly with your existing infrastructure
•Ability to set retention policies on specific data sets and/or directories
•Ability to access your data from any geographic location
•Data is protected by multiple copies stored at different secure sites
•Data transfers are secured in transit and at rest
•Provides fully automated backup and recovery services for remote offices with little or no IT involvement
•Tracking and audit features that provide chain of custody information
•Data stored in a manner that complies with legal and government requirements for data integrity and auditability
•Ability to quickly and easily retrieve data
•Ability to easily and quickly perform e-Discovery searches for specific types of data
•Ability to extend retention periods for data flagged by e-Discovery searches
•Means to handle large scale data movements
•The provider is a solid company with a track record of providing IT service
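To make the retention and e-Discovery items in this list more concrete, here is a minimal sketch of how a retention policy and a legal hold might interact. It is not the API of any particular cloud archive service; the `ArchivedItem` class, the `flag_for_ediscovery` function and the name-matching rule are all hypothetical, and a hold is modeled simply as suspending expiry for as long as it is in place.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ArchivedItem:
    """Hypothetical record for one archived data set."""
    name: str
    archived_on: date
    retention_days: int
    legal_hold: bool = False

    def expires_on(self) -> Optional[date]:
        # Assumption: an item under legal hold has no expiry date.
        if self.legal_hold:
            return None
        return self.archived_on + timedelta(days=self.retention_days)

    def may_delete(self, today: date) -> bool:
        # Deletion is allowed only once the retention period has passed
        # and no hold is in place.
        expiry = self.expires_on()
        return expiry is not None and today >= expiry


def flag_for_ediscovery(items: List[ArchivedItem], keyword: str) -> List[ArchivedItem]:
    """Place a legal hold on every item whose name contains the keyword."""
    flagged = [i for i in items if keyword.lower() in i.name.lower()]
    for item in flagged:
        item.legal_hold = True
    return flagged
```

A real service would search content and metadata rather than file names, and would record each hold in an audit trail to preserve chain of custody.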
In looking at the various companies now providing cloud storage, backup and archiving services, it makes sense to look for one that has a comprehensive set of services to complement the backup process in a one-stop manner. This is because backup and archive are complementary processes. With a reliable archive, the size and scope of the backup data set is greatly reduced. This actually broadens the case for using an online backup service since less data needs to be managed by that service.
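The way archiving shrinks the backup data set can be sketched in a few lines. The example below assumes that "inactive" simply means "not modified in the last 180 days"; that threshold, the `FileRecord` type and both function names are illustrative assumptions, not part of any backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical catalog entry for one file on primary storage."""
    path: str
    size_bytes: int
    last_modified: datetime


def split_backup_set(
    files: List[FileRecord], now: datetime, inactive_after_days: int = 180
) -> Tuple[List[FileRecord], List[FileRecord]]:
    """Partition files into (active, archive_candidates) by last-modified age."""
    cutoff = now - timedelta(days=inactive_after_days)
    active = [f for f in files if f.last_modified >= cutoff]
    archive = [f for f in files if f.last_modified < cutoff]
    return active, archive


def backup_reduction(
    files: List[FileRecord], now: datetime, inactive_after_days: int = 180
) -> float:
    """Fraction of total bytes that would leave the backup path if archived."""
    _, archive = split_backup_set(files, now, inactive_after_days)
    total = sum(f.size_bytes for f in files)
    if total == 0:
        return 0.0
    return sum(f.size_bytes for f in archive) / total
```

Running `backup_reduction` over a file catalog gives a rough estimate of how much smaller the recurring backup would be once the inactive files move to the archive.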
Traditionally, online backup has been considered for protecting remote branch offices and for small businesses that may not have the technical expertise and resources to maintain a traditional tape backup environment. Those uses are of course still viable, but archiving also enables larger companies to consider online backup services.
Even without archiving, larger organizations should consider online backup, as it might prove more cost-effective for protecting data at remote offices than managing and maintaining remote backup servers or transferring the data back to the main data center to be backed up. These services provide features such as scheduled, automated backups that run continuously, centralized management accessible over the Internet, cluster and VMware support, and rapid recoveries using block-level restore capabilities.
With aggressive archiving, larger organizations can now consider bringing the above advantages of online backup into the data center as well, leveraging a cloud archive to migrate data and thus reduce the size of the data set that must be protected.
Deploying the Cloud Archive at an organization’s main offices provides companies with the ability to rapidly and transparently expand their data storage infrastructure without incurring capital costs for new hardware, personnel or infrastructure. It also allows them to properly protect ever increasing amounts of data that need to be shared across large geographical areas while preventing loss or unauthorized access of that data.
For example, Iron Mountain’s VFS service lets you define retention policies while providing tracking and auditing capabilities that provide full chain of custody information as well as protecting data from accidental or deliberate modification or deletion. The data sets are replicated to different geographical locations, which are completely secured and equipped with their own emergency power generators capable of handling an extended power failure situation.
Organizations will, and should, continue to count on tape. Tape provides that last line of defense that some organizations will want to maintain. Finding a provider that can manage and securely store traditional tape-based backups and archives alongside digital backups and archives is an ideal strategy for managing data assets.
In the final analysis, organizations will still want to keep using tape for some backups and even data archiving as a complement to cloud-based archive and backup. Tape provides that last line of defense against a major problem with the cloud service, Internet access or anything else that may keep a company from being able to access its data online. Cloud or online services improve the backup and archive process, extending users’ ability to access their data in new and more cost-effective ways.
Monday, April 27, 2009
Joseph Ortiz, Senior Analyst
With all of the attention and discussion focused on disk-based backups and archives lately, one might wonder whatever happened to tape and where it fits in the new, evolving order of things. The answer is that tape is still alive and well, and it continues to fulfill an important, traditional role in the data backup strategies of most businesses. In this article we examine the idea of improving the backup process with cloud archiving, the latest evolution in data storage services, such as those provided by the Iron Mountain LiveVault and VFS services. We will also see how tape still has an important role to play in helping to manage an organization’s data assets.