The Advantages Of Storage System Based Caching
Solid-state storage has been available as a performance option to the data center for over a decade. However, widespread adoption of the technology did not occur until the introduction of data-center-quality flash memory. Now, thanks in large part to flash-based storage, data centers can choose this performance option at a lower cost per IOPS (I/Os per second) while generating more IOPS per watt.
Although solid-state storage is attractive when cost is measured on a per-IOPS basis, its raw cost is still 10x to 15x the per-gigabyte cost of hard drive storage. Additionally, not all data sets can take advantage of the performance that solid-state storage offers. As a result, storage vendors have tried to deliver a variety of options to use solid-state storage more effectively, with the goal of delivering as much performance as possible with as little solid-state capacity as possible.
One of the more popular options is caching, which leverages a smaller amount of solid-state storage to hold the most active data set and uses hard disk technology to store less active or even inactive data. As the data center starts the long journey toward becoming solid-state-only, caching is potentially the most viable gateway to that destination.
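The economics behind caching can be illustrated with simple arithmetic. The sketch below is only a rough model: it assumes the hard drive tier still holds every byte, that the cache adds solid-state capacity on top, and that flash carries the 10x to 15x per-gigabyte premium cited above. The 5% cache fraction and the function name are hypothetical choices for illustration, not figures from any vendor.

```python
def relative_cost(cache_fraction: float, flash_premium: float) -> float:
    """Cost of a cached hybrid system relative to an all-hard-drive system.

    cache_fraction: solid-state capacity as a fraction of total capacity
                    (hypothetical, chosen by the reader).
    flash_premium:  per-gigabyte price of flash relative to hard drives.
    Assumes the hard drives hold all data and the cache holds copies.
    """
    if not 0.0 <= cache_fraction <= 1.0:
        raise ValueError("cache_fraction must be between 0 and 1")
    return 1.0 + cache_fraction * flash_premium


if __name__ == "__main__":
    for premium in (10.0, 15.0):
        hybrid = relative_cost(0.05, premium)
        print(f"premium {premium:.0f}x: hybrid costs {hybrid:.2f}x all-HDD, "
              f"all-flash costs {premium:.0f}x")
```

Under these assumptions, a small cache adds a modest amount to the cost of the hard drive tier, whereas replacing the whole tier with flash multiplies it by the full premium.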
Not all caching is created equal
Caching can occur in a variety of locations in the storage infrastructure. Data can be cached locally on the server, globally on the storage network, or on the storage system itself. Each has its advantages depending on the customer’s need.
Caching on the storage system, for example, brings the advantage of the cache being integrated into the storage software stack instead of being bolted on as an afterthought. Vendors, though, need to take advantage of this integration to better leverage the cache’s location.
For example, Nexsan has integrated caching into its new E5000 systems and written the storage software layer to take advantage of the cache’s existence. Systems that integrate cache in this manner not only improve potential overall system performance but also increase the life expectancy of the flash-based storage, while still being more efficient with the amount of solid-state storage required.
Leveraging DRAM and Flash
An ideal way to integrate cache onto the storage system is to have a multilevel caching tier that uses a combination of DRAM and flash-based storage. DRAM has better write performance than flash and doesn’t wear out over time as flash does. While flash’s write performance is not as good as its read performance, it handles large write I/O better than small writes and, of course, is less expensive on a capacity basis.
With a multilevel caching tier in place, the file system could be designed to cache all initial writes to the DRAM-based tier and then send an acknowledgment to the application or user, increasing application performance. Those writes could then be coalesced into larger writes that would be sent to both the flash-based cache and the hard drive tier at the same time.
This process is called “write journaling” and is common in database environments as a technique to accelerate performance. Applying it to a file system with multiple tiers of solid-state and mechanical storage is a logical next step. As stated above, the impact of this architecture would be better response time for the user application when new data is added to the storage system. Thanks to the performance of DRAM-based storage, it could also improve performance when existing data is updated, since an update is itself a write. The DRAM tier would also lead to better write performance when updating the flash-based cache tier and to better flash life expectancy, since redundant writes may be eliminated by the coalescing process.
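The DRAM-first write path described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s implementation: the class name, the flush threshold and the dictionaries standing in for the flash and hard drive tiers are all hypothetical, and the sketch ignores the protection a real system would need for data still held in DRAM.

```python
from dataclasses import dataclass, field


@dataclass
class CoalescingWriteBuffer:
    """Hypothetical DRAM write buffer; names and thresholds are illustrative."""
    flush_threshold: int = 4                    # distinct dirty blocks before a flush
    dirty: dict = field(default_factory=dict)   # block number -> latest data (DRAM)
    flash: dict = field(default_factory=dict)   # stand-in for the flash cache
    disk: dict = field(default_factory=dict)    # stand-in for the hard drive tier
    flushes: int = 0                            # number of large writes issued
    blocks_flushed: int = 0                     # blocks actually sent downstream

    def write(self, block: int, data: bytes) -> str:
        # A rewrite of a block already in DRAM replaces the old copy,
        # so the redundant write never reaches flash.
        self.dirty[block] = data
        if len(self.dirty) >= self.flush_threshold:
            self.flush()
        return "ack"  # acknowledged as soon as the data is in DRAM

    def flush(self) -> None:
        if not self.dirty:
            return
        # One batch, ordered by block number, sent to both tiers together.
        batch = dict(sorted(self.dirty.items()))
        self.flash.update(batch)
        self.disk.update(batch)
        self.flushes += 1
        self.blocks_flushed += len(batch)
        self.dirty.clear()


if __name__ == "__main__":
    buf = CoalescingWriteBuffer(flush_threshold=4)
    for block, data in [(7, b"a"), (7, b"b"), (3, b"c"), (7, b"d"), (9, b"e"), (1, b"f")]:
        buf.write(block, data)
    print(buf.flushes, buf.blocks_flushed)
```

In this example six application writes touch only four distinct blocks, so the flash tier receives a single batch of four blocks rather than six small writes, which is the effect on flash wear that the text describes.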
Integrated cache and the storage system
Besides the capabilities of storage software integration, having cache integrated directly on the storage system has other advantages. First, there is the value of having the management software understand that cache is available. This would allow you, for example, to assign specific volumes to be cached and to leave volumes that do not need flash-based performance outside of the cache. A volume designated for backup files is a good example of the latter.
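A per-volume cache policy of this kind might look like the following sketch. The volume names, the `cached` flag and both helper functions are hypothetical and are not taken from any product’s management interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    cached: bool  # hypothetical per-volume setting exposed by management software


# Illustrative volumes only; the backup volume is deliberately left uncached.
VOLUMES = {
    v.name: v
    for v in (
        Volume("oltp_db", cached=True),
        Volume("vm_images", cached=True),
        Volume("backups", cached=False),
    )
}


def admit_to_cache(volume_name: str) -> bool:
    """Only data from volumes marked as cached may enter the cache."""
    return VOLUMES[volume_name].cached


def tier_for_read(volume_name: str, block_in_cache: bool) -> str:
    """Serve from cache only when the volume is cached and the block is present."""
    if admit_to_cache(volume_name) and block_in_cache:
        return "cache"
    return "disk"
```

Keeping the backup volume out of the cache means large sequential backup streams cannot evict the active database or virtual machine data that benefits from solid-state performance.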
With the cache integrated into the storage system, the memory-based tier (both DRAM and flash) could take advantage of the fault tolerance built into the storage system. In the event of a storage device failure, or even a storage controller or NAS head failure, data availability and cache consistency could be maintained.
There is also the potential to eliminate more expensive, higher-speed SAS drives by using only the caching tier and SATA drives. Using SATA instead of SAS hard drives would free up budget that could be invested in a larger cache tier. Because the cache is fast and can de-stage data to SATA quickly enough, the mid-tier SAS role can probably be eliminated.
Finally, the cache could leverage the same expansion capabilities that the storage system provides to mechanical storage. For example, Nexsan’s system can expand both the DRAM and the SSD flash storage to 12x its starting capacity. The larger the cache tier, the lower the chance of a cache miss.
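The relationship between cache size and cache misses can be explored with a small simulation. The sketch below assumes a least-recently-used (LRU) eviction policy and a synthetic workload in which 80% of requests go to 20% of the blocks; both are assumptions made for illustration, since the article does not say which algorithm or workload any particular system uses.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay a list of block numbers through an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(trace)


def skewed_trace(n_requests, n_blocks, seed=0):
    """Synthetic workload: 80% of requests target the hottest 20% of blocks."""
    rng = random.Random(seed)
    hot = max(1, n_blocks // 5)
    trace = []
    for _ in range(n_requests):
        if rng.random() < 0.8:
            trace.append(rng.randrange(hot))
        else:
            trace.append(rng.randrange(hot, n_blocks))
    return trace


if __name__ == "__main__":
    trace = skewed_trace(n_requests=20000, n_blocks=1000)
    for capacity in (50, 600):  # a starting size and 12x that size
        print(capacity, round(lru_hit_rate(trace, capacity), 3))
```

With LRU, a larger cache never produces fewer hits than a smaller one on the same sequence of requests, so expanding the cache tier can only hold or raise the hit rate. How much it helps depends on how skewed the real workload is.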
Ideal use cases
There are several ideal use cases for large storage-based caching deployments. The first is the traditional use case of an OLTP database application. Unlike legacy SSD implementations, where data must be moved manually, the cache algorithm automatically accelerates the right components of the database at the right time.
A new opportunity also exists. IP-based storage systems like NAS and iSCSI are becoming increasingly popular platforms for providing storage to virtual server environments. The NAS use case for virtualization is particularly interesting: since virtual machines are merely disk files, having a system that’s designed for managing files also store those images makes sense. It also allows the virtual environment to take advantage of built-in capabilities like snapshots and replication.
The challenge faced by any storage system attempting to support a virtual server environment is that these environments generate a significant amount of random I/O because of the number of VMs per physical host. Mechanical hard drives struggle with this type of I/O, but it is ideal for memory-based storage. The difficulty, of course, is knowing which VMs will become active at any moment, since it may be cost-prohibitive to store the entire virtual environment on solid-state storage.
NAS-based caching may be the ideal solution to this problem. The virtual machine files stored on the file system are automatically identified and moved to memory-based storage so that the random nature of their I/O is not bottlenecked by mechanical drives.
Another example of caching benefiting the virtual server use case is recovery. When a recovery program that can open the backup file directly accesses a VMDK, having the whole VMDK staged to cache greatly accelerates the recovery.
A final important capability of storage system based caching, as it relates to virtual server environments, is that it doesn’t break the migration function that many users count on to move virtual machines from one physical host to another. If the cache were inside the physical server, or potentially even in the network, critical data might not be flushed during a migration, potentially causing problems for the virtual machine when it lands on the new host.
Since a storage system based cache resides on the physical storage system, moving a virtual machine from one physical host to another is not affected by the cache. Just like mechanical hard drive storage, the cache is globally and seamlessly available.
Summary
Caching is an ideal bridge on the journey to a solid-state-only data center since it automatically provides optimal use of a premium-priced, high-performance tier of storage. However, caching cannot simply be added to an environment arbitrarily and be expected to deliver maximum benefit. To get the most out of the solid-state investment, the cache should be integrated into the file system so that multiple tiers of memory can be leveraged and the capabilities of the storage system itself can be applied to the caching tier just as they are to the mechanical tier.
Nexsan is a client of Storage Switzerland
Tuesday, January 3, 2012
George Crump, Senior Analyst