NFS caching is most critical in compute- and data-intensive (e.g., build, render, or search) application server environments that have scaled front-end performance by adopting a scale-out server architecture. NFS is a natural backend storage architecture for many of these scale-out environments, since all of the nodes in the cluster need access to the same data area at the same time. The challenge is that they are being constrained by the performance limits of typical NFS-based Network Attached Storage (NAS) systems.


An NFS share served through a typical NAS delivers only about 100 read IOPS per disk. These systems also have to handle a large amount of metadata access, which steals some of those limited IOPS. While internal read caching can help a little, these caches are usually too small to hold the entire data set. The overall impact is read latency greater than 10ms, which leads to significant idle time on the application server CPU as it waits for storage to respond to its requests.
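To make these constraints concrete, here is a back-of-the-envelope model in Python; the disk count and metadata fraction are illustrative assumptions, not measurements from any particular NAS:

    # Rough model of a disk-backed NFS share; all figures are assumptions.
    disks = 24                  # spindles behind the share (assumed)
    read_iops_per_disk = 100    # per-disk random-read ceiling cited above
    metadata_fraction = 0.3     # assumed share of IOPS lost to metadata

    usable_iops = disks * read_iops_per_disk * (1 - metadata_fraction)
    print(f"usable data IOPS: {usable_iops:.0f}")  # 1680

    # At >10ms per read, one synchronous request stream tops out at:
    latency_s = 0.010
    print(f"reads/sec per synchronous stream: {1 / latency_s:.0f}")  # 100

Everything beyond those 10ms per request shows up as idle time on the application server CPU.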


The typical fix for this is to deploy faster and significantly more expensive NAS heads with higher drive counts and faster rotation speeds. Those drives are then formatted only on the faster outer edge of the disk platter (called short stroking) to reduce latency. In parallel, the application is further segmented (sometimes called sharding) so that more servers can issue I/O requests at the same time, building enough queue depth to justify the ever-larger RAID groups. Half-formatted drives, dozens of servers that are highly underutilized from a CPU perspective, and large-drive-count RAID groups, all to respond to requests for information as quickly as possible, add up to a very expensive investment in capital, time, and resources such as power and data center floor space.
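A quick sketch shows what short stroking alone does to the cost per usable terabyte; the drive size, price, and formatted fraction below are hypothetical:

    # Short stroking: use only the fast outer portion of each platter.
    drive_tb = 0.6            # raw capacity per drive (assumed)
    drive_cost = 800.0        # assumed price per drive, USD
    formatted_fraction = 0.5  # only the outer edge is formatted

    usable_tb = drive_tb * formatted_fraction
    print(f"cost per raw TB:    ${drive_cost / drive_tb:,.0f}")   # $1,333
    print(f"cost per usable TB: ${drive_cost / usable_tb:,.0f}")  # $2,667

Halving the formatted capacity doubles the effective cost per terabyte before a single extra server or RAID group is purchased.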


These issues led to the creation of the NFS caching market. An NFS cache is a device that uses RAM or solid state storage to front-end an NFS NAS storage system. While these systems address the issues of IOPS and latency, they have not, at this point, been widely deployed. The biggest problem with these systems is a lack of memory capacity, or in the case of the legacy Gear6 solution, a lack of dense capacity. As a result, the entire data set cannot be stored in cache, so there is a good chance of a cache miss; on a miss, the application server suffers all of the performance problems listed above, plus the added delay of checking the cache for the data.
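The cost of an undersized cache falls directly out of the standard average-latency calculation; the hit ratios and latencies below are assumptions chosen only for illustration:

    # Effective read latency under a read cache.
    # A miss pays the cache lookup *and* the back-end disk read.
    def effective_latency_ms(hit_ratio, cache_ms=0.2, disk_ms=10.0):
        return hit_ratio * cache_ms + (1 - hit_ratio) * (cache_ms + disk_ms)

    for hit in (0.99, 0.90, 0.60):
        print(f"hit ratio {hit:.0%}: {effective_latency_ms(hit):.2f} ms")
    # 99% -> 0.30 ms, 90% -> 1.20 ms, 60% -> 4.20 ms

Once the working set no longer fits in cache, the hit ratio drops and the average latency converges back toward the uncached disk latency.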


The second problem with these small cache sizes is that, in a highly active data set, data has to be constantly reloaded into the cache memory area. This limits deployment severely, if it happens at all, and certainly means the technology cannot be used broadly to solve an enterprise-wide NFS access problem. Fitting the entire data set into memory is critical; leveraging the memory array's dense architecture allows high-capacity flash memory to be delivered in a cost- and space-efficient manner.

George Crump, Senior Analyst

Violin Memory is a client of Storage Switzerland

Enterprise Grade NFS Caching

Gear6 offered a scale-out NFS caching solution that, via its cluster management software, provided excellent scalability. The challenge was that its per-node memory capacity was very limited. While cache capacity could be scaled across many nodes, the per-node cost and the rack space those nodes consumed made it less desirable to customers. Violin Memory addresses this shortcoming by leveraging the very dense capacity of its memory arrays, which we detailed in “What is a Memory Array?”, and combining them with the former Gear6 software to develop an enterprise-grade NFS caching architecture. The new system provides enough memory to load the entire active data set into the cache area in 90% less space and with 95% more power efficiency.

A second critical capability is that the vCache is not “in-line,” meaning that if the cluster fails, or if for some reason you want to move away from its use, no reconfiguration is required to continue accessing the data. This also means that implementation requires no change to application or user configurations. The Violin vCache consumes 0.7 rack units and 200W per TB, delivers 300K IOPS, and carries a list price of less than $60K per TB.
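With those per-TB figures, sizing a deployment is simple arithmetic; the 4TB active data set below is a hypothetical example, not a quoted configuration:

    # vCache per-TB figures cited above: 0.7 RU, 200W, <$60K list.
    RU_PER_TB, WATTS_PER_TB, PRICE_PER_TB = 0.7, 200, 60_000

    active_data_set_tb = 4  # hypothetical working set to hold in cache
    print(f"rack units: {active_data_set_tb * RU_PER_TB:.1f}")          # 2.8
    print(f"power:      {active_data_set_tb * WATTS_PER_TB} W")         # 800 W
    print(f"list price: under ${active_data_set_tb * PRICE_PER_TB:,}")  # $240,000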

This investment can be distributed across multiple NAS appliances; it does not require a one-to-one mapping. This is again where the density of the vCache, along with its external placement, gives it a distinct advantage. It has the capacity, and of course the horsepower, to provide caching services across multiple vendors’ NAS heads simultaneously.


The alternative for the compute- and data-intensive application server environment is to upgrade the current NAS product to a higher-performing system with many more disks, formatted at half capacity, to gain roughly 4X acceleration. This “upgrade” still leaves about 5 to 20ms of latency, which means further sharding of the application to take advantage of the faster storage. It also does not account for the cost and impact of moving the entire data set from the old NAS system to the new one. The vCache alternative allows the current system to be retained while gaining a 5X acceleration in performance. In this configuration latency drops to 0.2ms, which not only avoids further sharding of the application but can actually reduce the number of application servers required to deliver sustained performance to users.
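The difference those latencies make per application server is easy to quantify; a queue depth of one is assumed for simplicity:

    # Reads/sec a single synchronous request stream can sustain.
    for label, latency_ms in (("upgraded NAS", 5.0), ("vCache", 0.2)):
        print(f"{label}: {1000 / latency_ms:,.0f} reads/sec per stream")
    # upgraded NAS: 200/s; vCache: 5,000/s -- 25x more work per stream

Each server gets far more storage work done per thread, which is why the server count can shrink rather than grow.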


The configuration of the vCache and its price point may also move NFS caching appliances out of the compute-intensive application niche. General-purpose workloads like analytics and server and desktop virtualization can now benefit from this type of technology.



Storage Switzerland's Take


The NFS cache market has long seemed like a logical growth market for storage companies, yet these systems have struggled to gain acceptance in the enterprise data center. Even the solutions provided by the NAS vendors themselves have seen limited adoption. As we will discuss in an upcoming article, “Scalable NFS Acceleration With Enterprise Solid State Cache,” the core challenge with current cache technology has been providing enough cache scalability to store the entire active data set in memory and so avoid the cache-reloading issues discussed above. Gear6 solved this with its clusterable scaling capability, but its acceptance was limited by the cost and floor space demands of its architecture. The merging of that technology with the densely configured memory array from Violin Memory resolves the last hurdle; as a result, we believe that NFS caching is now ready for enterprise-grade deployments.


Adding capacity to a system should only be done to address one problem: running out of storage space. It should not be used to address performance issues. Unfortunately, until products like Violin’s vCache arrived, a NAS head upgrade or the addition of a small (high-priced) read cache were the only options. Now the two assessments can be made separately: if there is a performance issue, add cache; if there is a capacity issue, add capacity. Revolutionary in its simplicity.