Think of file virtualization as a technology similar to the Domain Name System (DNS), which removes the need to know a website's IP address; you simply type in the domain name. In similar fashion, file virtualization eliminates the need to know exactly where a file is physically stored. Users simply look to a single mount point that presents all their files.
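To make the analogy concrete, here is a minimal sketch of the kind of lookup a virtualization layer performs. The paths, server names, and dictionary-backed namespace are purely illustrative assumptions, not any product's implementation.

```python
# Hypothetical namespace entries; a real product keeps this mapping in
# its own metadata store rather than a Python dictionary.
NAMESPACE = {
    "/shared/reports/q3.xlsx": "//nas-01/vol2/reports/q3.xlsx",
    "/shared/media/logo.png":  "//filer-02/assets/logo.png",
}

def resolve(logical_path: str) -> str:
    """Translate the path a user sees into the file's physical location,
    much as DNS translates a domain name into an IP address."""
    return NAMESPACE[logical_path]

print(resolve("/shared/reports/q3.xlsx"))  # -> //nas-01/vol2/reports/q3.xlsx
```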


The value of abstracting the physical location from the user or the application is that the storage manager is now free to move data around as needed. For example, if a new file server or NAS is purchased, the data can be migrated from the old storage system without any changes to users' logins or mount points. Some file virtualization solutions can also migrate that data in the background, so a new server can be brought online in the middle of the day without disrupting users or their applications.
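Building on the sketch above, a migration then amounts to copying the data and repointing the namespace entry; the server names and dictionary again stand in, by assumption, for a real product's metadata store.

```python
# Hypothetical mapping of a logical path to its current physical home.
namespace = {"/shared/reports/q3.xlsx": "//old-nas/vol1/reports/q3.xlsx"}

def migrate(logical_path: str, new_location: str) -> None:
    """Move a file's data, then repoint its namespace entry."""
    # Step 1: copy the data to the new system (elided in this sketch).
    # Step 2: update the mapping; the logical path never changes.
    namespace[logical_path] = new_location

migrate("/shared/reports/q3.xlsx", "//new-nas/vol1/reports/q3.xlsx")
# Users still open /shared/reports/q3.xlsx; no login or mount changes.
```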


This flexibility also leads to (finally!) being able to implement a tiered storage or information lifecycle management (ILM) strategy. Data can be moved automatically from a tier-one NAS to a tier-two or tier-three NAS or network file server. In a common scenario, the primary NAS is a high-speed, high-performance system storing production data. As files are accessed less frequently, they can be moved to lower-cost, more power-efficient systems that reduce both capital and operational costs.
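A hedged sketch of what such an age-based policy might look like follows; the 90-day threshold, mount points, and scan loop are assumptions for illustration, not a vendor's actual policy engine.

```python
import os
import time

TIER_ONE = "/mnt/tier1-nas"     # assumed mount of the production NAS
TIER_TWO = "/mnt/tier2-nas"     # assumed mount of the lower-cost system
AGE_THRESHOLD = 90 * 24 * 3600  # 90 days in seconds; an assumed policy

def files_to_demote(root: str = TIER_ONE):
    """Yield tier-one files whose last access is older than the threshold."""
    now = time.time()
    for dirpath, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > AGE_THRESHOLD:
                # A real engine would copy the file to TIER_TWO and then
                # repoint its namespace entry, invisibly to the user.
                yield path
```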


File virtualization resolves another major challenge with tiered storage: how users access older data. With traditional migration software, the user either has to navigate manually to the file's new path or the software has to create a small stub file that links to the file's new physical location. The problem with the manual method is that users must be trained to find the alternate location, especially when there are several possible locations to check. The challenge with the stub-file method is that stubs eventually get deleted, corrupted, or mismanaged, leaving files inaccessible. Stubs also burden the storage system: they are still files, and they represent overhead for both the file system and the backup application.
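To see why stubs are fragile, consider a toy version of the mechanism; the stub format below is hypothetical, not any vendor's actual on-disk layout.

```python
def write_stub(old_path: str, new_location: str) -> None:
    """Replace the migrated file with a tiny pointer to its new home."""
    with open(old_path, "w") as stub:
        stub.write("STUB-> " + new_location + "\n")

def read_through_stub(old_path: str) -> str:
    """Follow the pointer; if the stub is gone or damaged, so is access."""
    with open(old_path) as stub:
        header = stub.readline()
    if not header.startswith("STUB-> "):
        raise IOError("stub missing or corrupted; the file is orphaned")
    return header[len("STUB-> "):].strip()
```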


File virtualization removes all of that complexity by managing the metadata internally, with no stub files required. This does mean the solution must perform a lookup on every file access. However, today's file virtualization systems can be built with enough performance that the impact on overall access times is little to none.
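One plausible reason the lookup cost stays low is caching. The sketch below fakes the metadata store with a dictionary and assumes an in-memory cache in front of it; real engines use their own store and cache designs.

```python
from functools import lru_cache

# Stand-in for the engine's metadata store (an assumption; real systems
# keep this in a persistent, often replicated, database).
METADATA_STORE = {"/shared/reports/q3.xlsx": "//nas-01/vol2/reports/q3.xlsx"}

@lru_cache(maxsize=100_000)
def lookup(logical_path: str) -> str:
    """First access pays the metadata round trip; repeats come from RAM."""
    return METADATA_STORE[logical_path]

# A real engine would invalidate cached entries when a file migrates;
# in this toy version that is lookup.cache_clear().
```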


There are two types of file virtualization methods available today. The first is built into the storage system or the NAS itself, often called a "global" file system. There can be an advantage here: the file system itself manages the metadata that contains the file locations, which addresses the stub-file and manual-lookup problems mentioned earlier. The challenge, however, is that this metadata is unique to that specific file system, meaning the same hardware, often available from only one manufacturer, must be used to expand the system. This eliminates one of the key capabilities of a file virtualization system: the flexibility to mix hardware. With this solution you could lose the ability to store older data on a less expensive storage system, a situation equivalent to not being able to mix server brands in a server virtualization project.


Global file systems are also often not granular to the file level. They require that the entire folder meet the policy setting before any migration activity can take place. While this is fine in some environments, most will want the flexibility of file-level granularity.
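The difference is easy to see in code. Both checks below are illustrative sketches that assume last-access age as the policy attribute and a 90-day threshold.

```python
import os
import time

AGE_THRESHOLD = 90 * 24 * 3600  # 90 days in seconds; an assumed policy

def folder_qualifies(folder: str) -> bool:
    """Folder-level: nothing moves until every file in the folder is cold."""
    now = time.time()
    return all(
        now - os.stat(os.path.join(folder, name)).st_atime > AGE_THRESHOLD
        for name in os.listdir(folder)
    )

def file_qualifies(path: str) -> bool:
    """File-level: each file is evaluated, and can be moved, on its own."""
    return time.time() - os.stat(path).st_atime > AGE_THRESHOLD
```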


The second file virtualization method uses a standalone virtualization engine, typically an appliance. These systems can sit either in the data path (in-band) or outside it (out-of-band) and can move files to alternate locations based on a variety of attribute-based policies. In-band solutions typically offer better performance, while out-of-band solutions are simpler to implement. Both offer file-level granularity, but most importantly, standalone file virtualization systems do not require uniform hardware. High-performance NAS systems from one vendor can serve the active production data, while cost-effective systems with power management capabilities from another vendor hold the less active data. This gives IT staff greater flexibility in delivering both performance and cost savings to the organization they support.
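A minimal sketch of how such attribute-based placement might look appears below; the vendors, rules, and attribute names are invented for illustration and are not drawn from any particular product.

```python
def choose_tier(attrs: dict) -> str:
    """Route a file to a storage system based on its attributes.

    The rules and system names here are illustrative assumptions.
    """
    if attrs.get("under_legal_review"):
        return "vendor-B-retention-archive"     # held for legal review
    if attrs.get("age_days", 0) <= 30:
        return "vendor-A-high-performance-nas"  # active production data
    return "vendor-C-power-managed-nas"         # cold data, drives spun down

print(choose_tier({"age_days": 400}))  # vendor-C-power-managed-nas
```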


File virtualization solves many of the challenges IT administrators are dealing with. Policies can be set to manage data based on its age, its type, and the level of legal review it requires, for example. It makes logical sense that as applications become abstracted from the servers they run on, users should likewise be abstracted from the location of their data.

George Crump, Senior Analyst