When it comes to disaster recovery, a key source of cost reduction is server consolidation. Because you no longer need a one-to-one relationship between physical servers at the primary site and physical servers at the second site, the cost to develop a DR site can be greatly reduced. This only helps with the cost issues, however; the key challenge remains getting the data to the remote site.


If you can afford it, there is the traditional high-end method of using SAN-based replication to get the data to the DR site, and with the addition of Site Recovery Manager (SRM) from VMware, the workflow that surrounds site recovery can be somewhat automated. While this method certainly works, for many organizations it is out of reach because it requires a redundant investment in nearly the exact same storage hardware at the remote location. This can be particularly frustrating because the storage in the DR site will seldom be used for production. Ideally, the storage at the remote site would be reliable but more cost effective than the primary storage at the main data center.


One option to drive down the cost of remotely replicating virtual environments is to install a replication software application inside the guest OS of each virtual machine. While this is often less expensive than the SAN replication method, the number of clients that need to be installed and managed quickly becomes unwieldy. The sheer effort involved in checking that each replication job is running properly can be very time consuming.


A potential option that may strike a better balance is a host-level replication tool like Vizioncore's vReplicator, which leverages the fact that server virtualization does much of the work already. The server virtualization software already encapsulates the thousands of files that make up a server into just a few files, which makes replicating them much easier. Host-level replication tools take advantage of this encapsulation, so they require far less development effort to build and, as a result, are typically significantly less expensive as well as simpler to use and maintain.


The cost savings come from three areas. First, the host-level replication software itself is significantly less expensive than SAN-based replication modules or software-based replication applications. Second, these tools replicate across the standard IP network, eliminating the need for special hardware to bridge a Fibre Channel SAN to the WAN. Finally, the storage at the remote site does not have to be identical to the storage at the primary site, allowing you to select an alternate brand that may be less expensive and higher capacity yet still deliver the reliability you are looking for.


Something to be aware of with host-level replication software is that it is not typically designed for real-time replication, although some solutions are beginning to perform continuous replication. These solutions are best thought of as something that captures your environment every four to six hours and then pushes the changes to the remote site. At the remote site you essentially have a powered-off virtual machine (VM), which saves on secondary licensing costs.
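To make the effect of a scheduled capture concrete, the following is a minimal sketch of the worst-case data loss for interval-based replication. It assumes that a replica only becomes usable once its transfer to the remote site has finished; the function name and the 30-minute transfer time are illustrative assumptions, not figures from any particular product.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Upper bound on data loss for scheduled replication (illustrative model).

    If the primary site fails just before the newest capture finishes
    transferring, the last usable copy is the previous capture, which is
    one full interval plus one transfer time old.
    """
    return interval + transfer_time


# Hypothetical example: capture every 4 hours, 30 minutes to push the changes.
loss = worst_case_data_loss(timedelta(hours=4), timedelta(minutes=30))
print(loss)  # 4:30:00
```

Under this simple model, shortening the interval or the transfer time are the only two ways to tighten the bound.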


This powered-off VM state is also important because no recovery step needs to happen. Simply turn on the VM and it is online and ready; this is roughly the equivalent of powering on a server. By contrast, if you were using a backup application with replication capabilities, even one where deduplication is enabled, the data at the remote site would be stored in a backup format. Before the DR site could be used, that data would need to be recovered, which potentially means moving thousands of server images from backup storage to the active DR storage. The hours or days it may take to reposition these server images may be unacceptable to the organization.


Another key factor is to make sure the replication is performed in a consistent state, ideally one that is application consistent rather than merely crash consistent. It is important that these host-level replication tools can interface with both the hypervisor's snapshot tools and the snapshot tools of the guest OS; Microsoft VSS is a good example. This ensures that the data at the remote site is consistent with the primary site, preventing time-consuming database re-indexing when a failover occurs.


Testing is always a challenge when developing and maintaining a disaster recovery strategy. Virtualization again makes that easier, and host-level replication applications can tap into that ability. With these applications in place it is straightforward to stop replication, simulate a severed connection, and remotely start the VMs at the remote location. Testing can all be done with the click of a button.


Testing the impact on application performance is potentially more important in a virtualized DR site than in a non-virtualized one. A virtualized DR site will often have a significantly greater density of VMs per physical host than the primary site, so it is important to ascertain how that greater density affects performance.


Host-level replication enables a DR strategy with very few servers at the DR site. These servers essentially act as a holding area for the powered-off VM targets and are purposely oversubscribed, which further keeps DR costs down. If a disaster occurs, start up only the critical systems first; additional virtualization hosts can then be ordered, loaded with the hypervisor, and brought online. With these hosts online, the remaining VMs can be started and, using live migration, moved to the new hosts. The value of server virtualization here is that the powered-off VMs can be started and moved to the new systems without impacting overall performance.
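The staged startup described above can be sketched as a simple planning step: sort the replicated VMs by criticality and power on only what fits on the few hosts already at the DR site. This is a hypothetical illustration; the `VM` fields, the priority scale, and the use of memory as the only capacity measure are assumptions, not the behavior of any replication product.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    priority: int   # 1 = most critical (hypothetical scale)
    memory_gb: int  # memory is the only resource modeled in this sketch


def plan_startup(vms, capacity_gb):
    """Return (start_now, deferred) VM names for the hosts already at the DR site.

    VMs are considered in priority order. A VM that does not fit is deferred
    until extra hosts arrive; a smaller, lower-priority VM may still start
    if it fits in the leftover capacity.
    """
    start_now, deferred = [], []
    remaining = capacity_gb
    for vm in sorted(vms, key=lambda v: v.priority):
        if vm.memory_gb <= remaining:
            start_now.append(vm.name)
            remaining -= vm.memory_gb
        else:
            deferred.append(vm.name)
    return start_now, deferred


# Hypothetical inventory and a DR site with 64 GB of usable host memory.
vms = [VM("mail", 1, 16), VM("erp-db", 1, 32), VM("intranet", 3, 8), VM("file", 2, 24)]
now, later = plan_startup(vms, capacity_gb=64)
print(now)    # ['mail', 'erp-db', 'intranet']
print(later)  # ['file']
```

The deferred list is what would be started, and then live migrated, once the additional hosts are online.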


In an actual disaster, connectivity and systems at the primary site are often still online and functional; people simply had to be moved out of harm's way "just in case". In this situation, the remote site can be promoted as part of the failover process, with replication jobs sent back to the original primary site to keep it in sync. This makes the often forgotten return part of the disaster recovery process less painful. Even when the primary site is offline for a few days, during a power outage for example, but the physical equipment survives, the software can be set up to send just the changes to the original primary in preparation for the return.


As the virtual server environment grows, it may make sense to enable SAN replication and tools like SRM to achieve a lower data loss objective: almost real time as opposed to four hours. Even then, host-level replication should still be used, because not all servers need this high level of protection. Limiting the number of virtual machines that must be replicated to like storage hardware at the DR site means less of this expensive storage needs to be purchased. The remaining systems can replicate via host-level replication software to secondary, less expensive storage.
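As a rough illustration of this tiering, the sketch below assigns each VM to SAN replication or host-level replication based on how much data loss it can tolerate. The four-hour threshold comes from the interval mentioned above; the VM names and their tolerances are made-up examples.

```python
from datetime import timedelta

# Assumption: host-level replication captures changes about every four hours.
HOST_LEVEL_INTERVAL = timedelta(hours=4)


def choose_tier(tolerable_loss: timedelta) -> str:
    """Pick the cheapest replication tier that meets a VM's data loss tolerance."""
    if tolerable_loss >= HOST_LEVEL_INTERVAL:
        return "host-level"  # cheaper, replicates to secondary storage
    return "san"             # near real time, needs like storage at the DR site


# Hypothetical per-VM data loss tolerances.
requirements = {
    "order-db": timedelta(minutes=5),
    "mail": timedelta(hours=4),
    "intranet": timedelta(hours=24),
}
tiers = {name: choose_tier(loss) for name, loss in requirements.items()}
print(tiers)  # {'order-db': 'san', 'mail': 'host-level', 'intranet': 'host-level'}
```

Only the VMs that land in the `san` tier drive the amount of matching storage that has to be purchased for the DR site.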


As is always the case, DR costs can be kept under control and service levels maintained by using the right tool for the right job. Host-level replication leverages the investment in virtualization to simplify, and drive down the cost of, establishing a virtualized DR site.