The Value of a Global File System
As mentioned in my article "A Case for ILM", a tiered storage solution needs a data mover to be effective. Global File Systems, also called File Virtualization, are an excellent choice for that role. These solutions can provide value beyond just moving data, such as server migration, load balancing and, in some cases, replication. A Global File System is to data what a DNS server is to IP addresses. When I go to a web site I don't need, or want, to know the IP address. In the same vein, most users don't care where their file is physically stored; when they access "george.doc" they simply want the file. A Global File System accomplishes just that, but without leaving stub files like some ILM applications.
Stub files are a problem, no matter how you look at it. Traditional data movement applications leave them behind to track the new location of a file. As I mentioned in the "A Case for ILM" article, stub files still slow down the backup process. They have to be tracked, and even the process of ignoring them takes effort from the backup application. One of the biggest backup challenges is dealing with servers that have millions of files; leaving stub files does not reduce the file count, so it perpetuates the problem.
In addition, there are issues with the second move of a file. For example, when a traditional ILM application moves a file from tier one storage to tier two storage, it leaves a stub file in the tier one file system, yet the actual file is now on tier two storage. Migrating the file again, from tier two storage to slower (and cheaper) tier three storage, becomes very complex, because it requires leaving another stub in tier two and updating the stub file on the tier one system. Some of these traditional data movers cannot support this multi-hop move at all, and there have been countless cases of users deleting the stub file. This strands the actual file and, in some cases, no utilities are available to find it.
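The multi-hop problem can be sketched with a small model. This is only an illustration, not how any particular ILM product works: the tier names, the `("stub", next_tier)` representation, and the functions `new_tiers`, `move_with_stub` and `read` are all hypothetical. It models the simpler variant in which each move leaves a stub pointing at the next tier, without going back to update the tier one stub.

```python
# Hypothetical model: each tier maps a file name to either
# ("data", contents) or ("stub", name_of_tier_holding_the_next_hop).

def new_tiers():
    """Create three empty storage tiers."""
    return {"tier1": {}, "tier2": {}, "tier3": {}}

def move_with_stub(tiers, path, src, dst):
    """Move an entry from src to dst and leave a stub behind in src."""
    tiers[dst][path] = tiers[src][path]
    tiers[src][path] = ("stub", dst)

def read(tiers, path, start="tier1"):
    """Follow stubs from the starting tier until real data is found.

    Returns (contents, number_of_stubs_followed).
    """
    tier, hops = start, 0
    while True:
        entry = tiers[tier].get(path)
        if entry is None:
            raise FileNotFoundError(f"{path} not reachable from {start}")
        kind, value = entry
        if kind == "data":
            return value, hops
        tier, hops = value, hops + 1

tiers = new_tiers()
tiers["tier1"]["george.doc"] = ("data", "report contents")
move_with_stub(tiers, "george.doc", "tier1", "tier2")
move_with_stub(tiers, "george.doc", "tier2", "tier3")
print(read(tiers, "george.doc"))       # ('report contents', 2)

# A user deletes the tier one stub: the data still sits on tier three,
# but nothing on tier one points to it any more.
del tiers["tier1"]["george.doc"]
print("george.doc" in tiers["tier3"])  # True
```

After the stub is deleted, a call to `read(tiers, "george.doc")` raises `FileNotFoundError` even though the contents are intact on tier three, which is the stranded-file situation described above.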
Global File Systems do not leave stubs, so they are less complex and there is less risk associated with their use. Typically they come in two flavors: software that runs at the OS level, and appliances. Red Hat's GFS or Microsoft DFS with Brocade's File Data Services are good examples of an OS-based solution. Acopia is an excellent example of an appliance or switch-based implementation. With either solution, all file request traffic is directed to an intelligent device that holds metadata about where the file actually is. The actual file location is provided and traffic is re-routed to the physical file location.
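The lookup-and-redirect idea can be sketched as a simple mapping from a logical path to a physical location. This is a minimal illustration under stated assumptions, not the API of GFS, DFS or Acopia: the class `GlobalNamespace`, its methods, and the server names are hypothetical.

```python
class GlobalNamespace:
    """Hypothetical metadata service: logical path -> (server, physical path)."""

    def __init__(self):
        self._locations = {}

    def place(self, logical, server, physical):
        """Record where a file physically lives."""
        self._locations[logical] = (server, physical)

    def resolve(self, logical):
        """Return the current physical location for a logical path."""
        return self._locations[logical]

    def migrate(self, logical, new_server, new_physical):
        """Point the logical path at a new location.

        Only the metadata entry changes; no stub is left on the old server.
        (Copying the actual bytes is outside the scope of this sketch.)
        """
        if logical not in self._locations:
            raise KeyError(logical)
        self._locations[logical] = (new_server, new_physical)

ns = GlobalNamespace()
ns.place("/docs/george.doc", "nas1", "/vol0/docs/george.doc")
print(ns.resolve("/docs/george.doc"))  # ('nas1', '/vol0/docs/george.doc')

ns.migrate("/docs/george.doc", "nas2", "/archive/george.doc")
print(ns.resolve("/docs/george.doc"))  # ('nas2', '/archive/george.doc')
```

The user keeps asking for "/docs/george.doc" before and after the move, much as a host name stays the same when the IP address behind it changes.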
The challenge with the software-based implementations is that they are weak when it comes to file migrations. For example, to move a file based on age, they typically need an entire folder or directory to meet the age policy before they perform a move. They are not granular to the file level. They also tend to be very OS specific, so if you have multiple operating systems in your environment you may end up with multiple GFS implementations. Their primary advantage is that they are "semi" out-of-band. By that I mean that they are in the data path only for the metadata lookup. File delivery is typically done by the file server directly to the requesting user. With this architecture, the requirements placed on the metadata server are somewhat modest.
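The difference between folder-level and file-level granularity can be made concrete with a small age policy. This is a sketch under assumptions: the function names, the sample paths and dates, and the 180-day threshold are all made up for illustration and do not come from any product.

```python
import posixpath
from datetime import date, timedelta

def files_to_move(files, today, max_age_days):
    """File-level policy: select every file last used before the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(path for path, last_used in files.items() if last_used < cutoff)

def folders_to_move(files, today, max_age_days):
    """Folder-level policy: select a folder only if ALL its files are old."""
    cutoff = today - timedelta(days=max_age_days)
    by_folder = {}
    for path, last_used in files.items():
        by_folder.setdefault(posixpath.dirname(path), []).append(last_used)
    return sorted(
        folder for folder, dates in by_folder.items()
        if all(d < cutoff for d in dates)
    )

# Hypothetical sample data: path -> date the file was last used.
files = {
    "/projects/old_plan.doc": date(2006, 1, 1),
    "/projects/current.doc": date(2007, 6, 1),
    "/archive/2005_budget.xls": date(2005, 1, 1),
}
today = date(2007, 6, 19)

print(files_to_move(files, today, 180))
# ['/archive/2005_budget.xls', '/projects/old_plan.doc']
print(folders_to_move(files, today, 180))
# ['/archive']
```

With the folder-level policy, "/projects/old_plan.doc" stays on primary storage because one recently used file in the same folder keeps the whole folder from qualifying.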
Appliance-based systems, like Acopia, look more like switches than file servers. All data access, both metadata lookups and actual data delivery, flows through them. These in-band units therefore need switch-like performance. Being in-band gives them much finer granularity, allowing them to operate at the file level instead of only at the folder level. Despite the supposed negative of being in-band, these appliance-based systems are very adept at delivering a tiered storage solution.
A Global File System can be a critical driver for managing data and taking advantage of Tiered Storage or the Disk Based Archive that I outlined in my Disk Based Archive article. It can also provide key capabilities like seamless migration to new file servers or NAS devices, load balancing between NAS devices and file servers, and replication of data to create an entirely redundant server environment.
Tuesday, June 19, 2007