Metadata Management
The attraction of Solid State Disk (SSD) and high performance serial attached SCSI (SAS) has led IT managers on a quest to find the best way to leverage these technologies to improve storage performance. This quest has evolved from blindly adding faster storage media (SSD or SAS) to the more sophisticated automated tiering technologies that move data between mechanical disk and solid state storage. Unfortunately, as often happens, this rush to fix the problem has overlooked an obvious cause of storage I/O performance issues: how the file system deals with metadata, or the data about the data. While challenging for the storage system manufacturer, improving storage I/O performance through metadata management can have a significant pay-off in faster storage performance for the users of those systems.
Thursday, August 12, 2010
The logical first step in improving storage efficiency and I/O performance is to make the underlying file system itself faster and more responsive. Instead of throwing more high-performance disks or other hardware at the problem, pressing file system suppliers to improve their file systems may yield greater gains. Making the access and updating of metadata more efficient, and using the right mix of high performance and low cost (near line SAS or SATA) disks, is a logical next step in making those improvements. It may also lead to increased performance, or to reduced spindle cost with no reduction in I/O performance, without any investment in additional expensive storage devices.
Metadata is the most universally accessed data type, and the speed at which it can be read affects the broadest set of applications. Every time a file is opened, saved, closed, searched, backed up or replicated, some portion of its metadata is read or updated. As a result, metadata operations occur whenever a file is handled in any way, and some storage processes include many metadata operations for every ‘regular’ data operation. So it follows that improving the efficiency of how a file system handles metadata can directly affect its overall performance across a wide variety of applications. The other value of better management and placement of metadata is that it can provide this performance boost at little to no additional system cost. These cost savings can be seen in reduced spindle count or in eliminating the need to move prematurely to solid state storage.
When it comes to NAS architectures, storage managers are trying to maintain or, better, reduce costs while at the same time keeping up with an ever growing demand for greater performance. Without the option of improving metadata efficiency to address storage I/O performance problems, storage managers today are limited to looking at solid state storage or high performance (15K RPM) SAS hard drives for an answer to the performance problem. Those options certainly will not help contain costs.
As stated earlier, metadata is information about ‘regular’ data that’s used to organize, search and identify that data: things like file attributes, permissions, access histories, etc. Metadata operations include tasks such as file system scanning, searches for particular content, and policy functions. Most individual file operations also include a number of metadata operations, often more than the data operations themselves. As an example, several metadata operations occur every time a file is considered for movement in the automated tiering process discussed earlier.
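To make the term concrete, here is a minimal Python sketch of a single metadata operation as seen from a client. It uses the standard library's `os.stat`, which returns attributes such as size, permissions and timestamps without reading any of the file's contents. The helper name `describe_metadata` is hypothetical and chosen only for illustration; it is not part of any file system or vendor API.

```python
import os
import stat
import tempfile
import time


def describe_metadata(path):
    """Return a few metadata fields for `path` without reading its contents.

    Hypothetical helper for illustration only.
    """
    st = os.stat(path)  # one metadata lookup; no file data is read
    return {
        "size_bytes": st.st_size,
        "permissions": stat.filemode(st.st_mode),
        "modified": time.ctime(st.st_mtime),
        "accessed": time.ctime(st.st_atime),
    }


if __name__ == "__main__":
    # Create a small throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello")
        name = f.name
    print(describe_metadata(name))
    os.remove(name)
```

Every field in the returned dictionary is served from the file system's metadata, so the speed of a call like this depends on how quickly that metadata can be reached rather than on how large the file is.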
At a higher level, software like context level search, backup, snapshots and replication are prime examples of applications that make heavy use of metadata. The search and compare processes that occur in these applications to identify changed data blocks include extensive metadata operations. With any I/O latency in file metadata, the task of identifying what needs to be backed up or replicated can be as time consuming as the actual transfer itself. Moreover, latency occurring in the I/O process for any of these metadata operations can have a disproportionate effect on overall performance. For this reason, file metadata I/O management is an area which promises to improve storage performance.
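As a rough illustration of why this identification step is metadata bound, the Python sketch below walks a directory tree and decides which files changed after a cutoff time using only directory entries and modification timestamps. It is a simplified file-level stand-in, not how any particular backup or replication product works; real products may track changes at the block level or through snapshots. The function name `files_changed_since` is hypothetical.

```python
import os


def files_changed_since(root, cutoff):
    """Return (changed_paths, metadata_lookups) for regular files under `root`.

    A file counts as changed when its modification time is later than
    `cutoff` (seconds since the epoch). No file contents are read.
    Hypothetical helper for illustration only.
    """
    changed = []
    lookups = 0
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    lookups += 1  # one timestamp check per file
                    if entry.stat(follow_symlinks=False).st_mtime > cutoff:
                        changed.append(entry.path)
    return changed, lookups
```

Under these assumptions the scan issues one metadata lookup per file no matter how few files have changed, which is why metadata latency can dominate when most of the data is unchanged.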
We should expect to see storage system vendors developing file systems that can segregate file metadata and facilitate its placement on faster tiers of storage. For example, BlueArc tests have shown that, with tiered metadata, I/O performance can be maintained while reducing spindle count by using the right combination of SAS and near line SAS drives, or of SSD and near line SAS drives. Depending on disk configuration and metadata, estimated total spindle cost savings typically range from 5% to 20%.
The most important advantage that improved metadata performance should deliver is that performance improves with little to no change to the underlying hardware. Instead of going to the expense of adding an SSD or cache tier or, worse, adding to drive spindle count just to improve performance, many organizations may find that enhanced metadata performance delivers all the improvement that’s needed at the moment. Metadata management is a software-only (file system) change to the environment, and the impact on the user is minimal: simply a planned upgrade. Efficient metadata operations don’t reduce the effectiveness of other performance enhancements. In fact, steps like inserting SSD or fast SAS can actually be made more effective by improved metadata efficiency.
The ability to manage file metadata and place it on a faster tier of storage is a perfect complement to automated tiering and may be a better first step than using a caching approach. As a fundamental improvement in file system operation, metadata management can optimize storage investments and increase spindle efficiency independent of the benefits provided by other storage technologies. It allows high performance storage (like SSD) to be leveraged efficiently across a wide range of applications that don’t typically benefit from the use of such technologies. Putting metadata into SSD, for example, can improve the performance of file operations across the board, without requiring the extensive high-cost SSD capacity needed to store entire files.
George Crump, Senior Analyst
BlueArc is a client of Storage Switzerland
- Improving Storage Cost Efficiency and I/O Performance by Tiering Metadata