For the Small to Medium-sized Company Backup is All About Recovery TIME
Recent surveys show that most small to medium-sized companies are backing up their data. But while some are testing their ability to restore data from those backups, many of these companies are unaware of how long the recovery process actually takes. The old adage that recovery is what’s important in the data protection process is only partially true. The amount of time it takes to bring an application back into production can now significantly impact a small or mid-sized company’s users and customers as well. Backup is not about backup, and it’s not even about recovery; it is about time to recover.
Restoration Requires Backup
Restoration is more than just copying data back from a disk backup appliance; it’s the time required to bring all of a company’s users or customers back online. All restoration must start with a perfect and secure backup, which means a backup that’s sent to a location that can survive a disaster, not just stored locally. This leads many organizations to consider cloud-based backup solutions, since these by their nature move data off-site automatically.
What Can Go Wrong
Of course, the reason that IT administrators are tasked with backing data up is that someday something might go wrong. The possibilities range from accidental user deletion of key data, to application corruption, to server hardware failure, to server hard disk failure or even a total site loss. While the loss of an entire facility captures the headlines, more common sources of failure are user errors (accidental deletion) or an application corrupting its own data. Hard drive and server failures, while rarer, are so severe that specific protection must be provided against their occurrence, regardless of how improbable that may be.
Each of these server-specific failures is a disaster in its own right, and the time to recover from them becomes more critical as the business grows. Each directly impacts user productivity and, potentially, the customer experience. In most cases they also impact the company’s ability to generate revenue.
While a site disaster is the most severe, mid-market companies will often find a more patient user and customer community as they work to recover from these more dramatic disasters. As long as the data is still accessible and can be recovered in a few days, most businesses will survive a total site disaster. Again, a cloud-based copy of data is ideal for this.
The Recovery Challenge
There are three key phases to bringing an application back online after one of the server-specific disasters. The first is the time it takes to replace the failed component if there was a server or hard drive failure. This means obtaining the replacement, connecting it to the network and, more than likely, loading the core operating system onto the new server or hard drive. These steps can take days in some cases and can mean a significant loss in productivity.
The second phase is the time it takes to copy the data from the backup target to the primary hard drive. If the company chose to use a cloud-only backup solution this can also take days as data is transported across the internet to the replacement server. While some cloud backup providers can copy data to a hard drive and ship it to the customer, this process can still take several days to receive the hard drive and copy the data to the new server.
As a result, a business with servers to protect needs to consider a cloud solution where, at a minimum, the most recent copy of data is stored locally. This is typically an appliance that acts as both the backup disk target and a gateway to the cloud. However, the cloud appliance approach still has the challenge of copying data back to the new server, which, depending on the size of the data set, can still take hours.
The copy process across the network consumes more than just the time it takes to move a large data set across the network. There is also the time it takes to write data to the local hard drive or the SAN/NAS attached to the server.
The final phase is to actually bring the application back online. This can be complex depending on the state of data when it was backed up. If the application was not shut down correctly, recovery may take hours. There may also be steps required to reconfigure the application to work on the new server or correctly access the new storage device.
The net result is that recovery takes time, often much longer than expected. It’s not uncommon to hear IT personnel complain that a recovery of the failed application server took days longer than they expected.
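To make the arithmetic behind these three phases concrete, here is a minimal sketch that adds them up. The class, field names and all of the figures are hypothetical and chosen only for illustration. It also assumes the copy runs at the slower of the network rate and the disk write rate, which is a simplification of the copy process described above.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Hypothetical model of the three recovery phases; all values are illustrative."""
    hardware_replacement_hours: float  # phase 1: obtain, connect and load the OS
    dataset_gb: float                  # amount of data to copy back
    network_mbps: float                # link speed in megabits per second
    disk_write_mb_per_s: float         # target disk write rate in megabytes per second
    application_restart_hours: float   # phase 3: bring the application back online

    def transfer_hours(self) -> float:
        # Phase 2: assume the copy is limited by the slower of network and disk.
        network_mb_per_s = self.network_mbps / 8
        effective_mb_per_s = min(network_mb_per_s, self.disk_write_mb_per_s)
        seconds = self.dataset_gb * 1000 / effective_mb_per_s
        return seconds / 3600

    def total_hours(self) -> float:
        return (
            self.hardware_replacement_hours
            + self.transfer_hours()
            + self.application_restart_hours
        )


# Assumed example: 500 GB restored over a 20 Mbps internet link (cloud-only)
# versus a 1000 Mbps local network (on-site appliance).
cloud_only = RecoveryPlan(48, 500, 20, 100, 4)
local_appliance = RecoveryPlan(48, 500, 1000, 100, 4)

for name, plan in (("cloud-only", cloud_only), ("local appliance", local_appliance)):
    print(f"{name}: copy {plan.transfer_hours():.1f} h, total {plan.total_hours():.1f} h")
```

Plugging in a company’s own data set size, link speed and hardware lead time gives a rough estimate of its real time to recover, and shows which phase dominates it.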
The Virtual Recovery Solution
The solution to the recovery challenge is to always be in an “instant recovery” state by leveraging more powerful local appliances that can actually run virtual versions of a server when the original server fails. Products like the QuorumLabs onQ Site Recovery Appliance allow the small to medium-sized company to bring an application back online in a matter of minutes by leveraging on-site appliances and virtualization.
The recovery solution starts by performing physical-to-virtual (P2V) backups of servers to a data protection appliance. Since the appliance has a hypervisor built into it, those backups, which are stored as virtual images, can be run in a virtual machine directly from the backup appliance. The application can run in this state for days or weeks if necessary until the failed server hardware can be replaced.
Initially, no data needs to be transferred across the network, nor do new servers or hard disks need to be purchased under duress. This avoids all the problems that can arise when finding and configuring new hardware, as well as the time involved in copying data across the network. All that’s required is to start a virtual machine, and users can begin running their applications again. For recoveries less severe than a server or hard drive failure, the backup images can be read and individual files pulled out for restoration.
Easy Testing
The biggest cause of problems in the recovery process is not testing that process before a real failure occurs. Testing has to be more than just occasionally copying a few files back and forth; it should mean bringing the whole application online. But doing that the traditional way is expensive and time consuming. The value in a virtual recovery solution is that complete application recovery can be tested without the need to purchase standby hardware or software, or to copy data. Testing can become a regular event that takes only a few minutes. Regular testing also improves the skill level of IT personnel, reducing response time in an actual recovery scenario.
Most small to medium-sized businesses tend to focus on the backup task and assume the data protection job is done when their backups are successfully copied somewhere else. That “somewhere else” used to be a tape drive. Today it’s commonly a disk backup appliance, and increasingly it is some form of cloud backup. A growing number of mid-sized companies understand that recovery of that data and the associated applications is just as important, and they occasionally test their ability to restore data. Very few, though, appreciate the value of time to recovery. Virtual recovery solutions like QuorumLabs onQ appliances can help solve the time-to-recover problem that most small to medium-sized businesses are ill-prepared to face.
“QuorumLabs” is a registered trademark and “onQ” is a trademark of QuorumLabs
QuorumLabs is a client of Storage Switzerland
Wednesday, September 28, 2011
George Crump, Senior Analyst