The Need for Intelligent SSD Caching
Improving application performance is always a top priority for IT managers, and storage I/O is the focus of most of those projects. Solid State Disk (SSD) was thought to be the ultimate weapon in this performance battle, but despite SSD’s obvious speed advantages over mechanical hard disk drives, optimizing its use is proving difficult. Intelligence is required to make the most of the investment in this premium storage tier.
The Evolving State Of Performance Tuning
In the past, servers with application performance issues were typically isolated to a small segment of the enterprise, and their physical hardware environment was fairly static. Storage, for example, even if it was on a SAN, was still segregated to each particular server. As a result, performance troubleshooting was a fairly discrete process, not the ongoing event common today. And when a performance problem did occur, analysis and remediation were straightforward.
In the modern, virtualized ‘shared everything’ data center, troubleshooting performance problems is almost impossible to do manually. Servers now run multiple applications. Hosts share data storage with each other and with their guest operating systems. Even standalone servers running database applications can now compartmentalize so much of the database that the server is essentially a self-contained virtual server environment.
In the legacy data center, performance could be improved with little to no planning. The approach of haphazardly throwing hardware at the problem often worked, especially when it came to SSD since the performance advantage this technology had over mechanical drives was so great. Given the complexities of modern data centers, proper leveraging of SSD technology requires a conscious effort that combines automation and intelligence.
The legacy approach of adding another tier type has quickly given way to automation. Either through caching or automated tiering, an increasing number of storage systems have a way to populate the SSD tier so that active data is automatically promoted to it. This allows IT managers to leverage solid state storage in a broader use case, like a virtual server or desktop infrastructure, or even a multi-layer application. Tiering and non-intelligent caching are reactive in nature, meaning that some number of I/O misses have to occur before the most popular data is moved to the SSD tier. Conversely, some amount of aging must also occur before it’s demoted to the mechanical tier. Most vendors stop at the point of automating the storage tier and fail to provide the intelligence and customization options needed.
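The reactive behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not any vendor's implementation: the class name `ReactiveCache` and the `promote_after` threshold are hypothetical, and demotion by aging is left out for brevity.

```python
from collections import defaultdict

class ReactiveCache:
    """Toy reactive cache: a block is promoted to SSD only after it has
    been read `promote_after` times from the mechanical tier."""

    def __init__(self, promote_after=3):
        self.promote_after = promote_after  # hypothetical promotion threshold
        self.hdd_reads = defaultdict(int)   # reads served from disk, per block
        self.ssd = set()                    # blocks currently on the SSD tier
        self.hits = 0
        self.misses = 0

    def read(self, block):
        """Return the tier that served this read."""
        if block in self.ssd:
            self.hits += 1
            return "ssd"
        # Miss: serve from mechanical disk and count it toward promotion.
        self.misses += 1
        self.hdd_reads[block] += 1
        if self.hdd_reads[block] >= self.promote_after:
            self.ssd.add(block)  # promoted; later reads will hit
        return "hdd"

cache = ReactiveCache(promote_after=3)
results = [cache.read("vdi-boot-image") for _ in range(5)]
print(results, cache.misses, cache.hits)
```

In this model the first three reads of a popular block are always served from mechanical disk, which is the warm-up cost that reactive tiering and caching cannot avoid.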
Requirements For Intelligent Caching
The key deliverable of intelligent caching is to reduce, and ideally eliminate, the number of cache misses and the time it takes to “warm” the cache. A cache miss occurs when an application is directed to the SSD tier for data and the data cannot be found.
“Warming” the cache occurs when data on mechanical hard drives is being accessed often enough to qualify for promotion to SSD. The term refers to the time required for the tiering process to work through the tiered volumes and actually move these candidate data objects. The delay can recur even when data is accessed at the same time each day, because the data has to wait until enough mechanical accesses are made before it is moved to the SSD tier. Reducing or eliminating these cache misses and warm-up delays requires an understanding of the application and the environment in which it operates.
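To see why the miss rate matters so much, consider the standard weighted-average estimate of read latency. The service times below (0.1 ms for SSD, 8 ms for mechanical disk) are assumed, illustrative figures and do not come from any measured system.

```python
def average_read_latency_ms(hit_ratio, ssd_ms=0.1, hdd_ms=8.0):
    """Expected read latency when `hit_ratio` of reads are served from SSD.

    ssd_ms and hdd_ms are assumed, illustrative service times.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits pay the SSD service time; misses pay the mechanical service time.
    return hit_ratio * ssd_ms + (1.0 - hit_ratio) * hdd_ms

for ratio in (0.50, 0.90, 0.99):
    print(f"hit ratio {ratio:.0%}: {average_read_latency_ms(ratio):.2f} ms")
```

With these assumed numbers, raising the hit ratio from 90% to 99% cuts the average read latency from about 0.89 ms to about 0.18 ms, because the remaining misses dominate the average.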
The first requirement is that the cache have knowledge of each application’s critical data components and an understanding of when those components should be on each tier. Ideally, this intelligence should place that data on the correct tier before it is ever accessed. An example would be a cache that’s intelligent enough to promote the files related to a virtual desktop environment up to SSD prior to the morning login event. Another would be pre-promoting critical application files, like redo and undo logs in an Oracle database, before users access them.
Essentially, the intelligent cache picks up where other caches leave off. It installs like other caches and accelerates transactions by copying hot data to SSD, based on read intensity. The intelligent cache, however, keeps monitoring and learning at the application or environment level. Over time it gains an understanding of how the application's sub-file components interact and when they become active. It can then use this intelligence to pre-position data on the correct tier of storage, avoiding cache misses and warm-ups.
Environments are far from static, though, especially in today's virtualized data center. An intelligent cache like Cache IQ's RapidCache continues to monitor and learn about the environment. As certain files become less active or new files become more active, it can auto-adjust to make sure that all are correctly positioned to minimize cache misses and warm-up wait times.
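One way to picture this kind of learning is a component that remembers at what hour each file tends to become active and nominates it for promotion ahead of time. The sketch below is purely illustrative: `AccessLearner`, `min_days`, and the file paths are hypothetical names, and a real product would use far richer signals than the hour of day.

```python
from collections import defaultdict

class AccessLearner:
    """Toy learner: remembers on which days each file was read at each hour."""

    def __init__(self, min_days=3):
        self.min_days = min_days       # hypothetical: days needed to trust a pattern
        self.seen = defaultdict(set)   # (path, hour) -> set of days observed

    def record(self, path, day, hour):
        """Note that `path` was read during `hour` on `day`."""
        self.seen[(path, hour)].add(day)

    def prewarm_candidates(self, upcoming_hour):
        """Files read at this hour on at least `min_days` distinct days."""
        return sorted(
            path
            for (path, hour), days in self.seen.items()
            if hour == upcoming_hour and len(days) >= self.min_days
        )

learner = AccessLearner(min_days=3)
for day in range(1, 4):
    # The desktop image is read every morning at 8:00.
    learner.record("vdi/golden-image.vmdk", day, hour=8)
# A one-off read should not trigger pre-promotion.
learner.record("reports/q3.xlsx", day=2, hour=8)

to_promote = learner.prewarm_candidates(upcoming_hour=8)
print(to_promote)
```

Run shortly before 8:00, the learner nominates only the desktop image, so it could be copied to SSD before the morning login event rather than after the first few misses.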
A second requirement of an intelligent cache is an understanding of the physical environment around it. Accelerating storage I/O performance is not only about leveraging a faster storage medium like solid state storage. Since products like RapidCache are inline between the NAS storage and the connecting clients, they’re able to identify both network and client bottlenecks.
This means the intelligent cache understands the performance capabilities of the storage system and how its data is being accessed. It also understands the capabilities of the network that’s transferring that data, and the capabilities of the server to process more data at a faster rate. With this level of intelligence, it can automatically decide not to accelerate some active data because it’s not worthy of promotion. This could be the result of a network segment or client that’s too slow to take advantage of the faster storage medium.
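The decision not to accelerate can be reasoned about as a bottleneck calculation: delivered throughput is capped by the slowest element in the path. The function below is a simplified sketch, and the throughput figures and the `min_gain` threshold are assumptions chosen only to illustrate the idea.

```python
def worth_promoting(hdd_mbps, ssd_mbps, network_mbps, client_mbps, min_gain=1.2):
    """Return True if moving data to SSD would noticeably raise delivered throughput.

    Delivered throughput is limited by the slowest of storage, network, and
    client. `min_gain` is a hypothetical threshold: require at least a 20%
    improvement before spending SSD capacity on this data.
    """
    before = min(hdd_mbps, network_mbps, client_mbps)
    after = min(ssd_mbps, network_mbps, client_mbps)
    return after >= before * min_gain

# Client behind a slow network segment: the network is the bottleneck either way.
slow_link = worth_promoting(hdd_mbps=150, ssd_mbps=500, network_mbps=100, client_mbps=400)

# Same data, faster network: storage was the bottleneck, so SSD helps.
fast_link = worth_promoting(hdd_mbps=150, ssd_mbps=500, network_mbps=1000, client_mbps=400)

print(slow_link, fast_link)
```

In the first case the data is active but promotion would change nothing the client can observe, so the SSD capacity is better spent elsewhere.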
The final element is visualization of the monitoring and analysis of the data. While automation is important to aid the busy IT professional, there are times when human intervention is more appropriate. Intelligent caching solutions should provide detailed analytics on the data sets they are monitoring so that the storage manager can see graphically what’s in cache and what is not, as well as which data sets are trending toward cache placement and which are growing stale. This allows the storage manager to fine-tune the environment. For example, if a new data set is being loaded into the storage system, the storage manager can pre-pin that data to the cache without having to wait for the analytics to justify its promotion.
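Pre-pinning can be sketched as a cache whose eviction policy skips entries an administrator has marked. The example below assumes a simple least-recently-used policy; `PinnableLRU` and its methods are hypothetical names for illustration and do not describe any product's interface.

```python
from collections import OrderedDict

class PinnableLRU:
    """Toy LRU cache in which pinned entries are never evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> pinned flag, least recent first

    def pin(self, key):
        """Place `key` in cache immediately and protect it from eviction."""
        self.entries[key] = True
        self.entries.move_to_end(key)
        self._evict()

    def access(self, key):
        """Record a read; return True on a cache hit, False on a miss."""
        hit = key in self.entries
        self.entries[key] = self.entries.get(key, False)  # keep pin status
        self.entries.move_to_end(key)
        self._evict()
        return hit

    def _evict(self):
        while len(self.entries) > self.capacity:
            # Evict the least recently used entry that is not pinned.
            victim = next((k for k, pinned in self.entries.items() if not pinned), None)
            if victim is None:
                break  # everything is pinned; this toy model allows overflow
            del self.entries[victim]

cache = PinnableLRU(capacity=2)
cache.pin("new-dataset")                  # administrator pre-pins the new data set
cache.access("a")
cache.access("b")                         # cache is full, so "a" is evicted
first_read = cache.access("new-dataset")  # hit on the very first read
print(first_read, list(cache.entries))
```

The pinned data set survives the churn from other reads and is served from cache on its first access, without waiting for analytics to justify its promotion.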
Summary
Caching solutions are becoming increasingly popular for maximizing the investment in solid state storage technology. But without intelligence, caching does little more than throw hardware at the problem, leading to a large capital investment while realizing only a fraction of the performance gains possible. Intelligent caching solutions like Cache IQ's RapidCache maximize cache hits by intelligently placing data in SSD and DRAM storage based on sophisticated application analytics, truly optimizing the cache investment.
Cache IQ is a client of Storage Switzerland
Tuesday, October 25, 2011
George Crump, Senior Analyst