Requirements For Enterprise Server Side Caching
Caching has been touted as an ideal way to begin leveraging solid state storage in the enterprise. It allows for modest deployments of this high performance technology by automatically accelerating the most active data sets. This keeps costs down and saves already overworked IT administrators from having to go through a data classification exercise.
Lately, there has been a push to move that caching to the host server, even in a shared storage environment. The goal is to lessen the burden on the storage network and the storage system controller, and to bring the most performance-sensitive data closer to the CPU. The eventual goal could be to turn the SAN into a medium-term storage repository filled with cost-effective, capacity-centric HDDs, and have the server storage tier be used for the most active data.
In the enterprise, implementing this strategy requires more consideration than simply buying a PCIe SSD card from a storage supplier. These suppliers differ in philosophy, and it’s important that the storage manager select the server-side caching strategy that meets the needs of the enterprise. Some of these strategies, even though they are provided by enterprise storage systems companies, cannot truly be considered “enterprise”.
Server Side Cache Goal: Eliminate the Storage Mainframe
The shared storage system today is similar to what the mainframe was years ago. While there are more vendors providing their versions of this ‘storage mainframe’, the concept is similar: a single large storage system that’s responsible for all data. Servers and users that access data must go through the mainframe. This means that the storage mainframe must deliver extremely high performance and high availability since all the “eggs” are in its basket. Finally, when more storage or performance is needed, a newer, faster, more expansive storage mainframe is required, which has led to the storage refresh cycle that IT dreads.
Server computing is, of course, heading in a different direction: toward a highly distributed set of servers that each run a finite number of virtual applications. When more server compute power is needed, current servers are not replaced. Instead, new servers are added to the infrastructure and workloads are rebalanced across the aggregate resources.
Server caching solutions have the opportunity to break the stranglehold of the storage mainframe. By installing rather large, PCIe-based solid state storage devices inside servers, the performance demand placed on the storage infrastructure is lessened and the data is moved closer to the application that needs it. Even in a read-only cache environment, where writes still need to go to the shared storage system, performance improves because that system no longer has to service most read I/O, leaving its full bandwidth available for write I/O.
When used in this manner, server side caching allows the current mechanical hard drive based shared storage investment to be extended well beyond its original implementation, since its role is now primarily data storage and data protection. If the PCIe SSD can support multiple brands of shared storage systems, the storage administrator can select a different type of storage system altogether, one that’s focused on providing cost-effective capacity instead of trying to balance performance and capacity.
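The read-only cache behavior described above can be illustrated with a minimal sketch. It assumes a simple least-recently-used eviction policy and uses a Python dictionary as a stand-in for the shared storage system; the class and counter names are hypothetical and do not represent any vendor's implementation.

```python
from collections import OrderedDict


class ReadOnlyCache:
    """Hypothetical server-side read cache in front of shared storage (LRU eviction)."""

    def __init__(self, backend, capacity_blocks):
        self.backend = backend             # dict standing in for the shared storage system
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()        # block id -> data, least recently used first
        self.backend_reads = 0
        self.backend_writes = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)    # hit: mark as most recently used
            return self.blocks[block_id]
        self.backend_reads += 1                  # miss: fetch from shared storage
        data = self.backend[block_id]
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used block
        return data

    def write(self, block_id, data):
        self.backend_writes += 1                 # every write goes to shared storage
        self.backend[block_id] = data
        if block_id in self.blocks:              # keep any cached copy consistent
            self.blocks[block_id] = data
            self.blocks.move_to_end(block_id)


backend = {i: f"block-{i}" for i in range(100)}
cache = ReadOnlyCache(backend, capacity_blocks=10)
for _ in range(50):
    for block_id in range(5):                    # a small, hot working set
        cache.read(block_id)
cache.write(2, "updated")
print("backend reads:", cache.backend_reads, "backend writes:", cache.backend_writes)
```

In this sketch the 250 application reads of the hot working set produce only five reads against the backend, while the single write still reaches it, which is the division of labor the read-only model relies on.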
Requirement #1: Multiple devices per server
If the goal of the server side cache is to eliminate the storage mainframe, then one of the first requirements of the PCIe solid state manufacturer is support for multiple devices per server. Multiple devices can provide both high availability and more total raw capacity.
It may seem like high availability shouldn’t be needed in a cache, especially if that cache is read-only. If the cache fails, data can still be safely read off the hard drive. While this is technically accurate, there’s another factor that should be considered. Once this performance boost is provided, users of those applications begin to expect that performance will be maintained. Further, applications that are developed after server side caching is introduced will be written to expect server side caching as well, allowing developers to accomplish more with less server hardware. But if that cache goes down and access has to fall back to HDD performance levels, users will be disappointed or, worse, the application may become unresponsive.
Having multiple devices on a server solves these problems. Multiple cards can be configured in a RAID or mirrored configuration to ensure cache uptime. These configurations also allow the cards to be spanned so that capacities beyond the limits of a single card can be realized, thereby reducing the likelihood of a cache failure. Fusion-io, for example, supports multiple devices in a server with its ioTurbine software solution.
Requirement #2: Capacity Per Device
In addition to supporting multiple devices in a server, the capacity per device is also important since some servers won’t have the internal PCIe real estate for more than a few cards at a time. As with the multiple devices per server requirement, a high capacity PCIe card allows for larger caches. If the goal is to eliminate the storage mainframe, then the capacity of the cache has to be great enough to hold all active and even near-active data. A cache miss to a shared storage tier that’s now been relegated to a capacity option can have a significant performance impact; an 80% cache accuracy (hit) rate may not be acceptable. With a larger cache size, cache accuracy can be raised to 99% or greater, which means the application will rarely see the performance loss of a cache miss.
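To see why the gap between 80% and 99% cache accuracy matters, a rough weighted-average calculation helps. The sketch below treats accuracy as the fraction of reads served from the cache, and the two latency figures are illustrative assumptions only, not measurements of any product.

```python
def average_read_latency_us(hit_rate, cache_latency_us, backend_latency_us):
    """Weighted average of cache-hit and cache-miss read latency, in microseconds."""
    return hit_rate * cache_latency_us + (1.0 - hit_rate) * backend_latency_us


# Assumed, illustrative latencies (not vendor figures):
CACHE_US = 100.0        # read served from the server-side flash cache
BACKEND_US = 10_000.0   # read served from capacity-centric HDD shared storage

for hit_rate in (0.80, 0.99):
    avg = average_read_latency_us(hit_rate, CACHE_US, BACKEND_US)
    print(f"hit rate {hit_rate:.0%}: average read latency of about {avg:.0f} us")
```

Under these assumptions the average read at an 80% hit rate is roughly ten times slower than at 99%, because the rare trips to the HDD tier dominate the average.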
Requirement #3: vMotion Support
VMware is the core architecture in many, if not most, data centers. Direct support for this platform has to be more than just being able to run within the environment. The cache software has to coordinate flushing the cache in conjunction with the VMware hypervisor. While the concern with a write cache is obvious, there is a concern with read caches as well. They too need to be flushed in order for vMotion to execute properly.
Without this support, vMotion of a cached VM could lead to corruption. The ‘solution’ from other vendors is to not support any vMotion with their caching software. Obviously, forgoing one of the most valuable features of VMware is not an option for IT. But the other solution, a manual cache flush, migration and manual re-enablement of the cache, is untenable as well.
Requirement #4: Dynamic Allocation of Cache Memory
The ability to support vMotion leads to the next requirement: the ability to dynamically allocate cache memory. If a cache-dependent virtual machine is migrated from one physical host to another, policies should be followed that allocate the appropriate level of cache to the newly arriving VM so that its performance can be maintained on the new host.
Without this capability, VM performance would suffer on the new host and, more than likely, the virtual administrator would have additional tasks to complete. What was once a seamless migration of VMs becomes a manual process as described above: manual cache flush, manual migration, manual re-enablement and now manual memory reallocation as well.
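One way to picture policy-driven allocation is a scheme in which each VM carries a guaranteed minimum plus a weight for sharing whatever cache is left over. The sketch below is a hypothetical illustration of that idea; the `CachePolicy` fields, VM names and capacities are assumptions and do not describe how any particular caching product works.

```python
from dataclasses import dataclass


@dataclass
class CachePolicy:
    vm_name: str
    min_gb: float   # cache capacity guaranteed to this VM
    weight: int     # relative share of any spare capacity


def allocate_cache(host_capacity_gb, policies):
    """Give every VM its minimum, then split the spare capacity by weight."""
    reserved = sum(p.min_gb for p in policies)
    if reserved > host_capacity_gb:
        raise ValueError("host cache cannot satisfy the guaranteed minimums")
    spare = host_capacity_gb - reserved
    total_weight = sum(p.weight for p in policies)
    allocation = {}
    for p in policies:
        share = spare * p.weight / total_weight if total_weight else 0.0
        allocation[p.vm_name] = p.min_gb + share
    return allocation


# VMs already running on the destination host (hypothetical values).
resident = [CachePolicy("web01", min_gb=50, weight=1),
            CachePolicy("db01", min_gb=200, weight=3)]
before = allocate_cache(640, resident)

# A cache-dependent VM arrives via migration; the host rebalances automatically.
arriving = CachePolicy("erp01", min_gb=100, weight=2)
after = allocate_cache(640, resident + [arriving])

for name, gb in after.items():
    print(f"{name}: {gb:.1f} GB of cache")
```

The point of the design is that the arriving VM receives at least its guaranteed minimum without an administrator resizing anything by hand, and a host that cannot honor the minimums rejects the placement up front instead of silently degrading performance.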
Requirement #5: Bypass of Storage Stacks
An advantage that native PCIe solid state devices have over other implementations is a direct line of communication with the host CPU. This reduces latency, increases overall performance and reduces the resource consumption caused by data transfers. All native PCIe solid state devices, though, require a driver in order to be recognized by the host. Many manufacturers take the approach of using the operating system storage protocol stack to achieve this recognition. While potentially a more rapid development path, this method re-introduces latency and lowers performance.
A more efficient approach may be to bypass the guest OS and hypervisor storage stacks, re-opening that direct line of communication for the PCIe card. This is more than just a case of semantics. Server side caching is a performance-focused initiative, so anything that detracts from that goal matters.
Summary
Server side caching is an ideal way to extend the life expectancy of a current storage investment. It also has the potential to radically change what type of storage is purchased for the shared storage infrastructure by shifting to more capacity-centric selection criteria. With this model, performance can be accelerated and significant cost driven out of the storage infrastructure, both in terms of storage system selection and storage network design.
Fusion-io is a client of Storage Switzerland
Thursday, March 15, 2012
George Crump, Senior Analyst