ILM as a Data Protection Aid

 

A case for ILM


I admit it, I like beating up on ILM (Information Lifecycle Management). It is easy, but as I mentioned in my article about Building a Disk Based Archive, it does have its uses when properly deployed. Most of the hype around ILM centers on using it to maximize your investment in expensive high-speed disk, and while that is effective, it does not bring about the ROI that many are looking for when deploying ILM. The relatively low cost and simplicity of just buying more tier one disk is too tempting.


ILM as a data protection aid


The biggest challenge to data protection is the ever-growing amount of data vs. the ever-shrinking backup window. Most indications are that data continues to double year on year, and there is a need to keep more of it online and available because of regulatory or legal requirements. Much of what ails backup is the constant hauling of information across the network to the backup server. By the time 10Gb Ethernet is fully implemented and put into use for a utility network, data will have doubled three or four more times. The backbone itself is not the only problem; the speed at which the client can put data on that backbone is a much bigger one.
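To put rough numbers on that race, here is a small back-of-the-envelope sketch. It assumes the starting point is a 1Gb network, so the move to 10Gb Ethernet is at best a 10x gain in raw bandwidth; that baseline is my assumption for illustration, not a measurement.

```python
# Back-of-the-envelope comparison of data growth vs. network upgrade.
# Assumption (illustrative only): upgrading from 1Gb to 10Gb Ethernet
# yields at most a 10x gain in raw bandwidth.
NETWORK_GAIN = 10


def growth_factor(doublings: int) -> int:
    """Return how many times larger the data set is after N doublings."""
    return 2 ** doublings


for doublings in (3, 4):
    data = growth_factor(doublings)
    # A ratio near or above 1.0 means the upgrade bought little or nothing.
    ratio = data / NETWORK_GAIN
    print(f"{doublings} doublings: data x{data}, network x{NETWORK_GAIN}, "
          f"relative backup time x{ratio:.1f}")
```

Even in this best case, which ignores how fast the client can feed the wire, the network upgrade is mostly or entirely consumed by data growth.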


Most enterprise backup applications have you install an agent on each server that needs to be backed up. The agent walks the file system and identifies files that need to be backed up. In the case of a full backup, that is every file on the system; with an incremental, it is every file that has changed since the last backup. Either way the file system must be walked. The data that is going to be backed up is then typically packaged in your backup application's own format so that this work does not have to be done at the backup server, and that package is sent across the network to the backup server. Notice how much of that work has to be done on the local server. ILM can help in all three areas: the walk, the packaging and the transfer. By keeping less data on the primary server there is less file system to walk, resulting in the creation of fewer and smaller packages and thus making it easier to put that data on the network.
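The walk described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular backup product works: the function name files_to_back_up and the use of modification time to detect changes are my assumptions.

```python
import os
from typing import Iterator, Optional


def files_to_back_up(root: str, last_backup: Optional[float] = None) -> Iterator[str]:
    """Yield the paths an agent would select under root.

    last_backup=None models a full backup (every file is selected).
    Otherwise only files modified after that timestamp are selected,
    which models an incremental. Hypothetical helper for illustration.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if last_backup is None or mtime > last_backup:
                yield path
```

Note that the incremental case selects fewer files but still visits and examines every file on the server. Only removing files from the primary server shortens the walk itself.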


A point to be careful with here is how you move this data off the primary host. The Data Mover function of ILM often uses stub or pointer files to map to where the original copy of the file is now stored. The Data Mover then provides an auto de-migration capability for when those files are accessed. Any time a user or application requests to open one of those stub files, it is either pointed to the new location or, more commonly, the file is recalled to the original location. If you use a data mover that leaves behind pointer or stub files, this does not dramatically help the file-system walk. One of the largest challenges in backup systems is dealing with servers with millions of small files. An ILM solution that leaves stub files still requires the backup application to walk and examine all of those stub files. Additionally, doing so requires either integration with the backup application or turning off the auto de-migration facility to make sure that those files are not recalled when the backup application does its file-system walk. Failure to do so could be very messy, to say the least.
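A toy model makes the difference visible. The sketch below is purely illustrative: the Entry type, the migrate function, the 4096-byte stub size and the sample files are all made up for the example, and no real data mover is being described.

```python
from dataclasses import dataclass
from typing import List, Tuple

STUB_SIZE = 4096  # assumed size of a stub file in bytes (illustrative only)


@dataclass
class Entry:
    name: str
    size: int       # bytes stored on the primary server
    days_idle: int  # days since the file was last accessed


def migrate(entries: List[Entry], max_idle_days: int, leave_stub: bool) -> List[Entry]:
    """Return what remains on the primary server after migration."""
    remaining = []
    for e in entries:
        if e.days_idle <= max_idle_days:
            remaining.append(e)  # active file stays where it is
        elif leave_stub:
            # The data moves, but a small placeholder stays behind.
            remaining.append(Entry(e.name, min(e.size, STUB_SIZE), e.days_idle))
        # Without stubs, the migrated file leaves the primary server entirely.
    return remaining


def summarize(entries: List[Entry]) -> Tuple[int, int]:
    """Return (files the backup must walk, bytes on the primary server)."""
    return len(entries), sum(e.size for e in entries)


# Ten hypothetical 100,000-byte files, idle for 0, 40, 80, ... 360 days.
primary = [Entry(f"doc{i}.txt", 100_000, days_idle=i) for i in range(0, 400, 40)]

print("with stubs:   ", summarize(migrate(primary, 90, leave_stub=True)))
print("without stubs:", summarize(migrate(primary, 90, leave_stub=False)))
```

In this model the stub approach shrinks the bytes on the server but leaves the file count unchanged, which is exactly the problem on servers with millions of small files.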


That being the case, I tend to recommend ILM data movers that do not require a stub file to be left behind. This is typically achieved with global file system implementations. Think of a global file system as a DNS server for files. A global file system sits between the user and the data; when a user requests a file, it maps that user to that data. It is similar to going to Yahoo.com: I don’t really want to know what the IP address is, I just type in yahoo.com and there it is. With files, I simply request george.doc and the global file system handles which source to pull it from. Be careful, this is NOT Microsoft’s DFS; to make this part of an ILM or archive strategy we need the global file system to make decisions about the data based on rules that you pre-define. For example, any file on server "A" that has not been accessed in 90 days moves to server "B", which may be our disk based archive. Microsoft DFS will provide the DNS-like functionality for files, but today it cannot make decisions based on the attributes of that data. For example, it cannot move a specific file based on its age. There is a product recently purchased by Brocade that can provide some decision-making functionality to DFS, but that is limited to folder-level moves, meaning that all the files in a folder would have to reach a certain age for the data to move to an archive. Look for specific multi-platform global file system products, typically appliances, that can interact with the discrete files and make decisions based on certain attributes like age.
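The DNS analogy and the 90 day rule can be sketched together as a small in-memory model. Everything here is hypothetical: the GlobalNamespace class, its method names and the second file, budget.xls, are invented for illustration and do not describe any vendor's product. The model only updates the name-to-server mapping; a real appliance would also move the data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Location:
    server: str
    last_access: datetime


class GlobalNamespace:
    """Maps a logical file name to the server that currently holds it."""

    def __init__(self) -> None:
        self._files: Dict[str, Location] = {}

    def add(self, name: str, server: str, last_access: datetime) -> None:
        self._files[name] = Location(server, last_access)

    def resolve(self, name: str) -> str:
        """Like a DNS lookup: the user asks by name, not by location."""
        return self._files[name].server

    def apply_age_rule(self, source: str, target: str,
                       max_idle: timedelta, now: datetime) -> List[str]:
        """Move each file on source that has been idle longer than max_idle."""
        moved = []
        for name, loc in self._files.items():
            if loc.server == source and now - loc.last_access > max_idle:
                loc.server = target  # a real product would also copy the data
                moved.append(name)
        return moved


now = datetime(2007, 6, 3)
ns = GlobalNamespace()
ns.add("george.doc", "A", now - timedelta(days=120))
ns.add("budget.xls", "A", now - timedelta(days=10))

moved = ns.apply_age_rule("A", "B", timedelta(days=90), now)
print(moved, ns.resolve("george.doc"), ns.resolve("budget.xls"))
```

The point of the design is that the decision is made per file, and the user keeps asking for george.doc by name regardless of which server ends up holding it. No stub is left on server "A", so the backup agent there has one less file to walk.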


Once a data mover like this is in place, your backup systems can reap the benefit. You can reduce the number of files that need to be walked, reduce the amount of data that needs to be backed up and lessen the amount of data that has to go across the wire. The result is faster, more reliable backups. With a disk based archive in place like the one I outlined in my article last week, you can also be much more aggressive in terms of how quickly you archive data off of your production environment. In the “old” days of tape or even optical based archive you would typically only migrate after a year or so of inactivity. Now a 90 day rule is much more common. The disk based archive can serve the data to the user almost as fast as the traditional file server can. You can also now benefit from the other traditional ILM concept of improved utilization.


Not to be overlooked is the fact that in the event of a recovery, much less data has to be restored. You have essentially created a working set of data that is the most active and, at that point in time, the most important. Look at ILM not only as a way to save on disk capacity purchases but also as a method to improve the speed and reliability of your backup and recovery efforts.

 

Sunday, June 3, 2007

 
 