Integrating Deduplication
One of the major benefits of data deduplication is the reduction or elimination of tape as the primary mechanism for safeguarding data. By sending only deduplicated data segments over the wire to a secure location, IT users can forgo tape while using network resources very efficiently to achieve greater DR reliability. To date, many enterprises have adopted this strategy and realized significant recurring hard-dollar savings.
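To make the idea of “sending only deduplicated segments” concrete, here is a minimal Python sketch. It assumes fixed-size segments and SHA-256 fingerprints, and the function names (`segments`, `replicate`, `restore`) and the dictionary standing in for the remote site are hypothetical illustrations, not how Data Domain or any other product is implemented. Many production systems use variable-size, content-defined segments instead.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes per segment; deliberately tiny so the example is easy to follow


def segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def replicate(data: bytes, remote_index: dict) -> tuple[list[str], int]:
    """Return the fingerprint "recipe" for `data` and the number of bytes sent.

    Only segments whose fingerprint is not already in `remote_index` are
    transmitted; storing them in the dict stands in for the WAN transfer.
    """
    recipe = []
    bytes_sent = 0
    for seg in segments(data):
        fp = hashlib.sha256(seg).hexdigest()
        if fp not in remote_index:
            remote_index[fp] = seg  # new segment: ship it to the remote site
            bytes_sent += len(seg)
        recipe.append(fp)  # known segment: only the fingerprint is recorded
    return recipe, bytes_sent


def restore(recipe: list[str], remote_index: dict) -> bytes:
    """Rebuild a backup image at the remote site from its fingerprint recipe."""
    return b"".join(remote_index[fp] for fp in recipe)


remote = {}
monday = b"AAAAAAAABBBBBBBBCCCCCCCC"
tuesday = b"AAAAAAAABBBBBBBBDDDDDDDD"  # only the last segment changed

_, sent_mon = replicate(monday, remote)
recipe_tue, sent_tue = replicate(tuesday, remote)
print(sent_mon, sent_tue)  # 24 8
```

In this toy run the second night’s backup sends only the one segment that changed, which is the effect that lets deduplicated replication stand in for shipping tapes.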
While deduplication systems are very promising solutions for helping enterprise customers deliver on mandated cost reductions, there are several ways to introduce deduplication into the backup process. Some approaches are very disruptive and can introduce significant management trade-offs that erode the overall value proposition. Other forms of deduplication are much less intrusive, integrate well with legacy technologies and deliver a strong ROI.
In short, IT planners need to closely examine whether their current backup environment can absorb the additional effort of a complete replacement, and whether a replacement solution can support higher levels of DR automation to extend operational efficiencies across the board. If not, they should consider a solution that extends their current backup investment.
R.I.P. & Replace?
Some backup software vendors offer deduplication as an integrated feature of their enterprise backup software. One drawback of this approach is that if you are not a current user of the vendor’s backup software, you need to replace your existing backup product and integrate the proposed solution. IT users are generally loath to rip out their existing backup software, since doing so requires a major investment of time to implement the new product, retrain IT staff to manage it, and justify the funds required to procure and integrate the software.
Furthermore, by adopting an “all-in-one” solution, IT planners are placing a high-stakes bet that the vendor’s single offering will deliver equivalent or “good enough” functionality for data protection, data reduction and DR automation. Closed-end architectures are by their very nature limiting and tend to be less innovative. Moreover, closed-end solutions have historically lost out to offerings that provide open integration with best-of-breed solutions.
For example, many backup vendors were slow to bring VMware support to market. As a result, alternative point solutions emerged to meet demand. Many customers who selected these solutions decided that the need to properly protect the new environment outweighed the downside of managing another point product. Likewise, a decision to implement data deduplication should be made on its full merits and capabilities, not merely as an add-on to an existing product set.
Another approach is to add client-based deduplication software as a separate silo that coexists alongside the existing tape-based backup environment. While slightly less disruptive than a rip-and-replace strategy, “source-based” deduplication adds complexity to the environment: it is another software tool to manage and yet another agent to maintain on each host.
Adding a backup-to-disk appliance is a final method for integrating deduplication into the backup environment. While this approach is the most straightforward and the least disruptive, it does require adding more hardware to the infrastructure and introduces another point of management.
To further complicate the equation, most deduplication vendors offer optional replication software, separate from and outside the control path of the backup software, to perform electronic vaulting of backup data. One major challenge of assigning the backup replication process to the deduplication system is that the backup software has no awareness of the replicated backup copy. IT administrators need to manually create an entry in the backup software catalogue to make the deduplicated replica available for local or remote restores. In addition, this operation must be performed twice if there is an instance of the master catalogue in both the primary and secondary data centers. While this may not seem like a major administrative issue, it can quickly become very burdensome when there are dozens of daily replicated copies to manage. Furthermore, when a DR event does occur, the complexity and time required for these manual operations will only be magnified.
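The arithmetic behind “quickly becomes very burdensome” is simple: every replicated copy needs one manual entry per catalogue instance, every day. The sketch below assumes hypothetical figures (24 copies per day, two catalogue instances, five minutes of hands-on time per entry); substitute your own numbers.

```python
# Hypothetical workload figures, for illustration only.
COPIES_PER_DAY = 24        # "dozens" of replicated copies per day
CATALOGUE_INSTANCES = 2    # master catalogue at both primary and secondary sites
MINUTES_PER_ENTRY = 5      # assumed hands-on time per manual catalogue entry


def manual_catalogue_entries(copies_per_day: int, catalogue_instances: int) -> int:
    """Each replicated copy must be entered by hand into every catalogue instance."""
    return copies_per_day * catalogue_instances


def admin_hours_per_month(copies_per_day: int, catalogue_instances: int,
                          minutes_per_entry: float, days: int = 30) -> float:
    """Administrator hours spent on manual catalogue entries over `days` days."""
    entries = manual_catalogue_entries(copies_per_day, catalogue_instances) * days
    return entries * minutes_per_entry / 60


print(manual_catalogue_entries(COPIES_PER_DAY, CATALOGUE_INSTANCES))  # 48
print(admin_hours_per_month(COPIES_PER_DAY, CATALOGUE_INSTANCES, MINUTES_PER_ENTRY))  # 120.0
```

Under these assumptions, a second catalogue instance doubles the work to 48 entries a day, or roughly 120 administrator hours a month, before any DR event adds pressure.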
Clearly there are many things to carefully consider before embarking on a deduplication initiative. Adopting an approach that is minimally disruptive and introduces little management complexity or overhead while delivering key efficiencies is the order of the day. It is also important to be able to leverage best-of-breed technologies through a common management framework. In fact, there is enormous value in automating and simplifying the integration between backup software and data deduplication systems to manage a true end-to-end enterprise backup and electronic vaulting/replication process.
Is it possible to have the best of both worlds, keeping the existing investment in backup software while extending its functionality with optional capabilities that integrate seamlessly? One such example is deploying a Data Domain deduplication appliance in conjunction with Symantec’s OST.
This article will examine Symantec OST (Open Storage Technology).
OST provides an advanced integration layer between Symantec NetBackup and data deduplication platforms, such as those from Data Domain, that significantly improves the automation of backup and DR processes.
Just as a peripheral software market emerged to address some of the management shortcomings inherent in server virtualization, OST is helping to address the limitations of managing “virtualized” backup resources in the data center. To be clear, OST is a technology written specifically for Symantec backup software products; it cannot be used to manage non-Symantec solutions. To date, however, there is no competitive equivalent to OST, and it is in many respects a game changer for enhancing backup processes, especially when implemented in a deduplication framework.
There are multiple key functional elements to OST:
1. Media server “virtualization” and load balancing
2. Integrated awareness of deduplicated data systems or “intelligent storage” resources
3. Consolidated management of all backup and backup replication processes
4. Sharing of disk storage resources across all media servers
5. Backup throughput performance enhancements
Even in the absence of a deduplicated environment, OST delivers significant improvements in managing, sharing and allocating disk and tape resources within a centralized backup environment. For example, disk backup devices can be shared among multiple backup media servers for improved utilization and “load balancing” between media servers, helping to ensure quality of service and effective use of all available resources.
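One simple way to picture load balancing across media servers that share a disk pool is a “least-loaded server with spare capacity” policy. The Python sketch below is a hypothetical illustration of that idea only; the `MediaServer` class, the `max_jobs` limit and the selection rule are assumptions, not NetBackup’s actual scheduling algorithm.

```python
from dataclasses import dataclass


@dataclass
class MediaServer:
    name: str
    active_jobs: int = 0
    max_jobs: int = 4  # hypothetical per-server concurrency limit


def pick_media_server(servers):
    """Return the least-loaded server that still has capacity, or None if all are busy.

    Ties go to the server listed first.
    """
    candidates = [s for s in servers if s.active_jobs < s.max_jobs]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.active_jobs)


def dispatch(jobs, servers):
    """Assign each backup job to a media server; all servers write to one shared disk pool."""
    assignment = {}
    for job in jobs:
        server = pick_media_server(servers)
        if server is None:
            assignment[job] = None  # no capacity anywhere: job waits in the queue
            continue
        server.active_jobs += 1
        assignment[job] = server.name
    return assignment


servers = [MediaServer("ms1", active_jobs=2), MediaServer("ms2"), MediaServer("ms3", active_jobs=1)]
result = dispatch(["j1", "j2", "j3", "j4"], servers)
print(result)  # {'j1': 'ms2', 'j2': 'ms2', 'j3': 'ms3', 'j4': 'ms1'}
```

Because the disk pool is shared, any media server can take the next job, so new work drifts toward idle servers instead of piling up behind one that owns a dedicated device.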
The benefits of OST are enhanced even further, however, when it is fully integrated with “intelligent storage” or deduplication disk systems. Through an open API, disk vendors can write a plug-in to OST that gives NetBackup awareness of duplicate backup images wherever they reside. In addition, backup data replicated by intelligent storage devices can be managed and monitored directly from the NetBackup console, providing a consolidated view of all backup processes and enabling administrators to restore any backup data, whether it is local or in a remote location.
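The plug-in architecture described above can be sketched as an abstract interface that a storage vendor implements and the backup software calls. This is a hypothetical Python illustration of the pattern only: `StoragePlugin`, `InMemoryDedupPlugin`, `BackupCatalogue` and their methods are invented names, not the real OST SDK. The point it shows is that when replication runs through the plug-in, the resulting copy is catalogued automatically instead of by hand.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageCopy:
    image_id: str
    location: str  # e.g. "primary-dc" or "dr-site"


class StoragePlugin(ABC):
    """Hypothetical vendor plug-in interface (illustrative, NOT the real OST API)."""

    @abstractmethod
    def write_image(self, image_id: str, data: bytes) -> ImageCopy: ...

    @abstractmethod
    def replicate_image(self, image_id: str, target: str) -> ImageCopy: ...


class InMemoryDedupPlugin(StoragePlugin):
    """Toy vendor implementation that keeps images in memory."""

    def __init__(self, location: str):
        self.location = location
        self.images: dict[str, bytes] = {}

    def write_image(self, image_id: str, data: bytes) -> ImageCopy:
        self.images[image_id] = data
        return ImageCopy(image_id, self.location)

    def replicate_image(self, image_id: str, target: str) -> ImageCopy:
        # A real appliance would ship only new segments to the target site.
        if image_id not in self.images:
            raise KeyError(image_id)
        return ImageCopy(image_id, target)


class BackupCatalogue:
    """Backup-software side: every copy made through the plug-in is recorded."""

    def __init__(self, plugin: StoragePlugin):
        self.plugin = plugin
        self.copies: dict[str, list[ImageCopy]] = {}

    def backup(self, image_id: str, data: bytes) -> None:
        copy = self.plugin.write_image(image_id, data)
        self.copies.setdefault(image_id, []).append(copy)

    def vault(self, image_id: str, target: str) -> None:
        copy = self.plugin.replicate_image(image_id, target)
        self.copies.setdefault(image_id, []).append(copy)  # no manual catalogue entry

    def restore_locations(self, image_id: str) -> list[str]:
        return [c.location for c in self.copies.get(image_id, [])]


cat = BackupCatalogue(InMemoryDedupPlugin("primary-dc"))
cat.backup("db01-full", b"backup image bytes")
cat.vault("db01-full", "dr-site")
print(cat.restore_locations("db01-full"))  # ['primary-dc', 'dr-site']
```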
As an example, Data Domain provides an OST plug-in as an optional software product sold with its deduplication appliance. When it is integrated with NetBackup, end users can manage and track all deduplicated backup images and their replication pairs directly from the NetBackup console.
Another big advantage of OST is a major boost in backup throughput. OST in effect behaves like a network file protocol such as NFS or CIFS, but without much of the overhead those general-purpose protocols carry. Data Domain/OST users generally see a 1.5x to 2x performance boost, dramatically reducing backup windows and the time required to complete offsite replication. In fact, Data Domain sees OST as an enabling technology that will eventually allow it to provide in-line deduplication systems that are faster than VTL platforms.
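To see what a 1.5x to 2x throughput gain means for a backup window, the short Python sketch below computes transfer time for a fixed amount of data. The 9 TB nightly backup and 250 MB/s baseline are hypothetical figures chosen for round numbers, and the model ignores deduplication ratios, job overlap and other real-world effects.

```python
def backup_window_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move `data_tb` terabytes at `throughput_mb_s` MB/s (decimal units)."""
    data_mb = data_tb * 1_000_000
    return data_mb / throughput_mb_s / 3600


BASELINE_MB_S = 250  # hypothetical throughput without OST
NIGHTLY_TB = 9       # hypothetical nightly backup volume

for speedup in (1.0, 1.5, 2.0):
    hours = backup_window_hours(NIGHTLY_TB, BASELINE_MB_S * speedup)
    print(f"{speedup}x: {hours:.2f} h")
# 1.0x: 10.00 h
# 1.5x: 6.67 h
# 2.0x: 5.00 h
```

Because the window scales inversely with throughput, a 2x gain halves it and a 1.5x gain cuts it by a third.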
Finally, when OST is combined with a globally clustered master server environment, enterprises can enjoy the benefits of a fully synchronized catalogue database across multiple locations, delivering even higher levels of backup and DR automation and functionality.
As IT users plot their strategies for reducing the costs and operational complexity of their backup and DR environments, it is important to consider technologies that deliver high levels of automation, simplicity, performance and resource efficiency. Like server virtualization, data deduplication is helping IT enterprises reduce inefficiency and management overhead in the data center. These benefits can be enhanced even further when deduplication is combined with an intelligent framework like OST.
Tuesday, June 16, 2009
As IT planners increasingly adopt data deduplication as a strategy for containing their physical infrastructure footprint and the related capital and operating costs, decision makers should not overlook how best to integrate the technology into their environment to enhance overall DR capabilities. Without proper integration, the cost of managing the deduplication environment could outweigh its capital cost savings.