Metadata is often described as ‘data about data’: the information a storage system generates and keeps in order to organize, search, identify and perform higher-level management functions on that data. It includes file attributes, permissions, access history, modification information, file system structures, indexes and so on. Uses of metadata in file operations are called metadata operations. They include file system searches and the policy functions used for tiering and security, such as comparing access histories to build a move list or verifying permissions.
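To make the idea of a metadata operation concrete, here is a minimal Python sketch of one of the tiering policy functions mentioned above: building a move list by comparing each file's last-access time against a cutoff. The function name `build_move_list` and the idle-days policy are hypothetical illustrations, not BlueArc's implementation. Note also that many systems mount file systems with `noatime` or `relatime`, so access times may be coarse or missing.

```python
import os
import time

SECONDS_PER_DAY = 86400


def build_move_list(root: str, max_idle_days: float) -> list[str]:
    """Return files under `root` not accessed within `max_idle_days`.

    Hypothetical tiering policy for illustration. Each os.stat() call reads
    only file attributes (metadata); no file contents are read.
    """
    cutoff = time.time() - max_idle_days * SECONDS_PER_DAY
    move_list = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # metadata read: attributes only
            if st.st_atime < cutoff:  # compare access history to the policy
                move_list.append(path)
    return sorted(move_list)
```

Every file in the tree costs at least one metadata lookup, even if none of them ends up on the move list.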


Data management and protection features, such as replication, backup and snapshots, also involve a lot of metadata operations. One example is the index search for changed blocks that occurs as part of the replication process. Another is the way a backup application walks the file system looking for files that have changed since the last backup.
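The backup walk described above can be sketched in a few lines of Python. This is a simplified, hypothetical model of an incremental backup scan, not any particular backup product. It counts how many `stat` calls (metadata operations) the walk makes compared with how many files actually need their data copied.

```python
import os


def find_changed_files(root: str, last_backup_time: float) -> tuple[list[str], int]:
    """Walk `root` the way a simple incremental backup might.

    Returns (files modified after `last_backup_time`, number of stat calls).
    Every file is stat'ed (a metadata operation), even though only the
    changed ones will later have their data read and copied.
    """
    changed = []
    stat_calls = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat_calls += 1
            if os.stat(path).st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed), stat_calls
```

In a tree where one file in ten has changed, the walk still performs ten metadata lookups to find that single file, which illustrates how metadata work can outnumber data work.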


The point is that a lot goes on ‘under the covers’ to handle all the metadata associated with traditional file operations. In fact, metadata operations typically outnumber ‘regular’ data operations, often by many times, and all this overhead can have a significant impact on file system I/O. Yet metadata makes up only a small percentage of total file system capacity, typically in the low single digits, and most file systems store it together with regular data on the same storage devices. File metadata would therefore seem to be a good candidate for persistent tiering, that is, locking it into faster storage, but this requires a file system that can support two tiers of storage.
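A back-of-the-envelope model shows why this overhead matters. The sketch below treats one file request as a serial sequence of data and metadata I/Os and adds up their service times. The operation counts and latencies are hypothetical round numbers chosen for illustration, not measurements, and the model ignores caching, queuing and parallelism.

```python
def request_time_ms(data_ops: int, meta_ops: int,
                    data_latency_ms: float, meta_latency_ms: float) -> float:
    """Serial service time for one file request (simplified model)."""
    return data_ops * data_latency_ms + meta_ops * meta_latency_ms


# Hypothetical workload: 5 metadata I/Os for every data I/O.
HDD_MS = 8.0  # assumed random-I/O latency of a spinning disk
SSD_MS = 0.2  # assumed random-I/O latency of an SSD

all_on_hdd = request_time_ms(1, 5, HDD_MS, HDD_MS)       # 48.0 ms
metadata_on_ssd = request_time_ms(1, 5, HDD_MS, SSD_MS)  # 9.0 ms
print(f"{all_on_hdd:.1f} ms vs {metadata_on_ssd:.1f} ms")
```

Under these assumptions, 40 of the 48 ms are spent on metadata when everything lives on disk, which is the intuition behind giving metadata its own fast tier.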


BlueArc’s SiliconFS™ supports a two-tiered storage architecture. It can differentiate metadata from regular file data and automatically place the metadata on the faster storage. This lets the file system speed up metadata operations without having to use this more expensive storage to hold large portions of the file system as well. Because metadata makes up a small percentage of total file system capacity, it is a natural fit for this type of tiering. For this purpose, BlueArc’s Mercury and Titan network storage systems now support multiple storage combinations, including SSD with SAS disks, SAS with SATA disks, and SSD with SATA disks.
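Conceptually, the placement rule is simple: classify each allocation as metadata or user data and route it to the corresponding tier. The Python sketch below models that rule with two fixed-capacity tiers. The class and method names are hypothetical, and the sketch says nothing about how SiliconFS actually classifies or lays out blocks.

```python
from dataclasses import dataclass
from enum import Enum


class BlockKind(Enum):
    METADATA = "metadata"
    USER_DATA = "user_data"


@dataclass
class Tier:
    name: str
    capacity_blocks: int
    used_blocks: int = 0

    def allocate(self, n_blocks: int) -> None:
        # Refuse the allocation rather than silently overfilling the tier.
        if self.used_blocks + n_blocks > self.capacity_blocks:
            raise RuntimeError(f"tier {self.name!r} is full")
        self.used_blocks += n_blocks


@dataclass
class TieredAllocator:
    fast: Tier  # e.g. SSD, reserved for metadata
    slow: Tier  # e.g. SAS or SATA, holds user data

    def place(self, kind: BlockKind, n_blocks: int = 1) -> str:
        """Route metadata to the fast tier and everything else to the slow tier."""
        tier = self.fast if kind is BlockKind.METADATA else self.slow
        tier.allocate(n_blocks)
        return tier.name
```

For example, `TieredAllocator(Tier("ssd", 1_000), Tier("sas", 16_000)).place(BlockKind.METADATA)` returns `"ssd"`. A real file system also needs a policy for when the fast tier fills up (raise an error, as here, or spill metadata to the slow tier); this sketch takes the simplest option.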



The cost of disk performance


NAS is becoming the storage platform of choice in many virtual server environments, as well as in demanding verticals like Media and Entertainment, eDiscovery and Life Sciences. In these environments there’s a ‘standing order’ for more storage performance, which has traditionally meant expensive and often inefficient solutions. Wide striping, or spreading LUNs across more physical spindles, can increase performance but can also lower utilization and increase complexity. “Short stroking”, the practice of writing data only to the fastest, outermost cylinders of each disk drive, can also improve raw disk array performance, but at an even higher cost in spindle count.
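The spindle cost of short stroking is easy to quantify. If only a fraction of each drive's capacity is used, the number of drives needed for a given usable capacity scales inversely with that fraction. The helper names below are hypothetical, and the sketch ignores RAID overhead and spares.

```python
import math


def drives_needed(usable_tb: float, drive_tb: float, used_fraction: float = 1.0) -> int:
    """Drives required when only `used_fraction` of each drive holds data."""
    return math.ceil(usable_tb / (drive_tb * used_fraction))


def utilization(usable_tb: float, drives: int, drive_tb: float) -> float:
    """Fraction of purchased raw capacity that actually holds data."""
    return usable_tb / (drives * drive_tb)


full = drives_needed(10, 1.0)           # 10 drives, whole surface used
stroked = drives_needed(10, 1.0, 0.25)  # 40 drives, outer quarter only
print(full, stroked, utilization(10, stroked, 1.0))  # 10 40 0.25
```

Storing 10 TB on the outer quarter of 1 TB drives takes four times the spindles, and three quarters of the purchased capacity sits idle.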



Spindle efficiency


One benefit of an approach like metadata optimization is that it’s an ‘organic’ improvement in storage process efficiency, so its benefits are independent of other performance methods, which can still be used alongside it. And because metadata tiering is a file system process, it can provide these benefits without adding cost to the system. Compared with the per-spindle costs of traditional high performance data layouts such as wide striping and short stroking, gaining spindle efficiency through metadata optimization would be the most logical first step toward improving performance, and perhaps the most cost-effective one.



SSD efficiency


Solid state storage is another technology that can improve the performance of an existing storage array. But at a per-GB cost many times that of spinning disk, it must be applied judiciously. The challenge is to get the most active data onto the fastest storage without letting the amount of SSD capacity required ruin the economics of the solution. Metadata optimization with a tiered file system can get the maximum impact out of the minimum SSD investment by using the SSD exclusively for metadata. And because it’s part of the file system, not a ‘for cost’ option, the economics improve further.
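A rough sizing calculation shows why dedicating SSD to metadata keeps the investment small. Assuming metadata occupies a few percent of file system capacity (the low single digits cited earlier), the SSD tier only needs to hold that fraction plus some growth headroom. The 3% metadata share and 25% headroom used below are hypothetical placeholders, not BlueArc sizing guidance.

```python
def metadata_tier_size(fs_capacity_gb: float, metadata_fraction: float,
                       headroom: float = 1.25) -> float:
    """SSD capacity (GB) needed to hold all metadata, with growth headroom."""
    return fs_capacity_gb * metadata_fraction * headroom


fs_gb = 16_000
ssd_gb = metadata_tier_size(fs_gb, 0.03)  # assume 3% of capacity is metadata
print(ssd_gb, ssd_gb / fs_gb)  # 600.0 0.0375
```

Under these assumptions, less than 4% of the capacity has to be bought as SSD, whereas putting the entire file system on SSD would multiply the cost of all 16,000 GB.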


In performance testing, BlueArc’s Metadata Optimization has produced some impressive numbers. For example, in a VMware environment running a single file system on Fibre Channel arrays with SSD and SAS, both IOPS and latency were about 300% better with Metadata Optimization. The results are below:

Briefing Report

Eric Slack, Senior Analyst

BlueArc is a client of Storage Switzerland

Multiple VM IOMeter Testing

The tests were run using a BlueArc Mercury 100 Network Storage System connected to 24 x 1TB SAS drives in BlueArc’s RC12 Fibre Channel controller, configured in RAID6 (4+2), and 6 x 200GB SSDs in an HDS AMS2500 Fibre Channel controller, configured in RAID5 (5+1). There were 35 Windows 2008 VMs on three ESX 4.1 hosts with 10GbE connectivity running IOMeter against the Mercury 100. Each VM ran 1 and 2 threads of 4KB 100% random I/O in a 75% read / 25% write mix. The Mercury had a single file system with a single export. The “With Metadata Optimization” tests used the SSDs for metadata optimization in a tiered file system with the SAS drives. The other test used only the SAS drives in a non-tiered file system.
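As a rough check on the tier sizes in this test, the usable capacity of each RAID set is (number of groups) × (data disks per group) × (drive size). The sketch below is a simplification that ignores formatting overhead, hot spares and decimal-versus-binary units; the function name is hypothetical.

```python
def raid_usable_tb(drive_count: int, drive_tb: float,
                   data_disks: int, parity_disks: int) -> float:
    """Usable capacity of `drive_count` drives split into N+P RAID groups."""
    group_size = data_disks + parity_disks
    groups = drive_count // group_size  # leftover drives are not counted
    return groups * data_disks * drive_tb


sas_tb = raid_usable_tb(24, 1.0, 4, 2)  # RAID6 (4+2): 4 groups x 4 x 1 TB
ssd_tb = raid_usable_tb(6, 0.2, 5, 1)   # RAID5 (5+1): 1 group x 5 x 0.2 TB
print(sas_tb, round(ssd_tb, 2), round(ssd_tb / sas_tb, 4))  # 16.0 1.0 0.0625
```

By this estimate the SSD tier is about 1 TB against 16 TB on SAS, roughly 6% of the SAS capacity, in line with the small share of capacity that metadata is said to occupy.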



Storage Switzerland’s Take


Focusing on metadata as a strategy to improve file system I/O performance makes so much sense that one wonders why it hasn’t been done more often. It recalls the answer a gangster was purported to have given a reporter who asked why he robbed banks: “That’s where the money is.” In file systems, metadata is where the activity is. Metadata represents the best opportunity to speed up many of the steps involved in file operations and improve performance across the board. BlueArc’s SiliconFS with metadata optimization represents a finite, one-time process for leveraging tiered storage: metadata is simply locked into the high speed tier. Other strategies involve continual decisions to identify candidate data and seemingly endless file movement in an effort to put the right data onto the fastest tier at the right time. They can also require significantly more high speed storage to achieve similar results, increasing costs. Metadata optimization provides improved file system performance that’s transparent to the user and doesn’t cost anything extra. Compared with complex tiering schemes, expensive SSD implementations and inefficient hard disk drive configurations (wide striping and short stroking), this seems like a ‘no-brainer’ decision.