Resolving The $ per GB Problem of SSD in Virtual Environments
The problems caused by a virtual desktop or server within a storage infrastructure have been well documented. There are performance issues caused by the extremely random I/O (Input/Output) of dozens of virtual machines on each host. There is also a storage capacity concern as server and desktop storage that used to be direct attached to each physical server is now moved to a shared storage device. Striking the right cost/performance balance is critical so that the virtual project can continue to deliver and improve on its ROI (Return On Investment).
In large part, the solution to the performance challenge is the intelligent use of solid state drives (SSD) within storage systems. Most virtual environments can generate more than enough I/O demand on the storage system to justify the use of SSDs. However, unlike database systems, they can’t always generate the “profit per IOPS” (Input/Output Operations Per Second) needed to justify the cost of an SSD. This means that, especially in the mid-range data center, a balanced approach between SSD and HDD (Hard Disk Drive) is needed.
The All Flash Resolution
There tend to be two approaches to affordably resolving the demands that virtualization projects place on storage systems. The first is to use a flash-only storage system. This will certainly resolve most of the performance issues, since flash is well suited to the random workloads of the virtual environment.
To address the cost issue, these systems often use inline deduplication and compression to reduce the effective cost of flash storage by storing more data on that tier. While extra steps are required to identify redundant data, flash can process those steps quickly, and increasing the effective capacity of a premium platform by as much as 5X makes the trade-off worthwhile in all-flash systems.
The challenge is that an all-flash system, even with optimization, is still going to be too expensive to fit under budget. This leaves data centers looking for other options. In addition, these systems tend to be block only, which leaves the data center looking for an alternate storage system for its file services.
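The arithmetic behind that trade-off can be sketched in a few lines. This is only an illustration: the function name and the $10.00 per raw GB flash price are assumptions made for the example, not figures from any vendor, and the 5X ratio is the upper bound mentioned above rather than a guaranteed result.

```python
def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Cost per logical GB stored after deduplication and compression.

    reduction_ratio is logical data stored divided by raw capacity consumed,
    so 5.0 means "5X" data reduction.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


if __name__ == "__main__":
    raw_flash_price = 10.0  # hypothetical $ per raw GB of flash
    for ratio in (1.0, 2.0, 5.0):
        cost = effective_cost_per_gb(raw_flash_price, ratio)
        print(f"{ratio:.0f}X reduction: ${cost:.2f} per logical GB")
```

The actual ratio depends heavily on how redundant the data is, so a real sizing exercise would use a measured ratio rather than the best case.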
The Hybrid Resolution
Another option is to use a hybrid SSD model where a small amount of SSD is used in the storage system to cache the most active data but with little to no optimization to improve storage efficiency. These systems also attempt to reduce costs by providing NAS and SAN capabilities in a single package, reducing the number of storage systems that the storage manager needs to manage.
Hybrid systems gain their cost efficiency by using hard disk drives for less active data and SSD for active data. They also resolve the performance problem, as long as the needed data is in cache, but the risk of a cache miss, with data being delivered from the HDD, is obviously higher than with a flash-only system.
This means that the HDD storage area has to perform well enough to adequately serve the 15-20% of data that will initially be read from hard disk due to cache misses. To compensate, these systems tend to sacrifice deduplication and/or compression in order to make sure that hard disk performance is not impacted significantly. Eliminating the overhead of deduplication maintains the performance of the hard disk area but decreases its cost effectiveness. The SSD cache is often not optimized either, which means that it may not be used as efficiently as possible.
The result is that even though they can perform multiple functions (SAN and NAS), eventually more than one hybrid system may be needed because the capacity limits of the HDD or SSD storage are reached. As with any other environment, more systems mean added management and additional cost.
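To see why the hard disk tier still matters at a 15-20% miss rate, it helps to compute the average read latency as a weighted mix of the two tiers. The sketch below is illustrative only: the function name and the 0.2 ms SSD and 8 ms HDD latencies are assumed round numbers, not measurements from any particular array.

```python
def average_read_latency_ms(hit_ratio: float, ssd_ms: float, hdd_ms: float) -> float:
    """Weighted average read latency for a cache in front of hard disks.

    hit_ratio is the fraction of reads served from the SSD cache;
    the remaining reads are served from HDD.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * ssd_ms + (1.0 - hit_ratio) * hdd_ms


if __name__ == "__main__":
    ssd_ms, hdd_ms = 0.2, 8.0  # hypothetical per-read latencies
    for miss in (0.0, 0.15, 0.20):
        avg = average_read_latency_ms(1.0 - miss, ssd_ms, hdd_ms)
        print(f"{miss:.0%} misses: {avg:.2f} ms average read latency")
```

With numbers like these, the minority of reads that miss the cache accounts for most of the average latency, which is why anything that slows the hard disk tier further is treated so cautiously.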
The Hybrid Optimized Resolution
Hybrid optimized storage systems like those from Tegile Systems take a best-of-both-worlds approach. This type of system has similar functionality to a standard hybrid system, but it brings the deduplication and compression that flash systems leverage to a mixed array. The impact is a significant reduction in the cost per GB of the virtual storage infrastructure, while making the flash area more productive. Like standard hybrid arrays, these systems also provide SAN and NAS services, but now they have the capacity optimization to allow a single system to potentially meet the entire data center’s performance and capacity needs.
The key is to provide this functionality without impacting performance.
HDD Deduplication without Performance Impact
Deduplication requires a significant amount of meta-data management and lookups. If this work has to be done on an HDD, the latency of the hard drives as they rotate into position will likely create a noticeable performance impact. Flash systems avoid this problem because they don’t have hard disks to deal with and meta-data is stored on flash just like the actual data.
Meta-data information does not need to be stored on the same storage area as the actual user data though. There are many examples in the storage industry of companies storing other types of meta-data on faster mediums like SSD. Hybrid optimized storage systems leverage the ability to store meta-data separately from the actual data and store the deduplication information on flash.
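The split between meta-data and user data can be pictured with a toy inline deduplication store. This is a conceptual sketch only and does not represent Tegile's or any other vendor's implementation: the `DedupStore` class is hypothetical, a Python dictionary stands in for the flash-resident fingerprint index, and a list stands in for the hard disk data area.

```python
import hashlib


class DedupStore:
    """Toy inline deduplication: fingerprint index kept apart from block data."""

    def __init__(self) -> None:
        # Stand-in for meta-data held on flash: fingerprint -> block id.
        self.fingerprint_index: dict[bytes, int] = {}
        # Stand-in for the hard disk data area: block id -> block contents.
        self.blocks: list[bytes] = []

    def write(self, data: bytes) -> int:
        """Store a block, reusing an existing copy if the content is redundant."""
        fingerprint = hashlib.sha256(data).digest()
        block_id = self.fingerprint_index.get(fingerprint)  # lookup hits "flash" only
        if block_id is None:
            block_id = len(self.blocks)
            self.blocks.append(data)  # only unique data reaches the "HDD" area
            self.fingerprint_index[fingerprint] = block_id
        return block_id

    def read(self, block_id: int) -> bytes:
        return self.blocks[block_id]


if __name__ == "__main__":
    store = DedupStore()
    first = store.write(b"A" * 4096)
    second = store.write(b"A" * 4096)  # redundant block, no new data stored
    third = store.write(b"B" * 4096)
    print(first == second, len(store.blocks))
```

The point of the sketch is that the redundancy check touches only the index, so the slower data area is written to only when a block is genuinely new.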
The value in making this split is that deduplication can now be applied universally on both hard disk and solid state disk. Making a cost-premium storage area like SSD up to 5X more capacity efficient reduces the amount of flash storage needed and, as a result, the overall system cost. Even though hard drive pricing is already very affordable, there is still a hard cost associated with acquiring, powering and cooling the drives. The fewer hard drives needed, the better.
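A rough sizing sketch shows how applying data reduction to both tiers shrinks the raw capacity, and therefore the media cost, of the whole system. Every number here is a placeholder chosen for easy arithmetic (2,000 GB of active data, 20,000 GB of less active data, $10.00 per raw GB of SSD, $0.50 per raw GB of HDD), and the function names are hypothetical. Power, cooling and controller costs are left out.

```python
def raw_gb_needed(logical_gb: float, reduction_ratio: float) -> float:
    """Raw capacity required to hold logical_gb at a given data reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_gb / reduction_ratio


def system_media_cost(logical_ssd_gb: float, logical_hdd_gb: float,
                      ssd_price: float, hdd_price: float,
                      reduction_ratio: float) -> float:
    """Media cost of a two-tier system with the same reduction ratio on both tiers."""
    ssd_cost = raw_gb_needed(logical_ssd_gb, reduction_ratio) * ssd_price
    hdd_cost = raw_gb_needed(logical_hdd_gb, reduction_ratio) * hdd_price
    return ssd_cost + hdd_cost


if __name__ == "__main__":
    # Hypothetical workload and prices; substitute measured values when sizing.
    for ratio in (1.0, 5.0):
        cost = system_media_cost(2000, 20000, 10.0, 0.5, ratio)
        print(f"{ratio:.0f}X reduction: ${cost:,.0f} in storage media")
```

In practice the two tiers rarely see the same reduction ratio, so a closer model would take a separate ratio for each.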
Leveraging DRAM to Improve Performance
To process this meta-data inline, hybrid optimized storage systems like Tegile’s can also leverage DRAM (Dynamic Random Access Memory). Similar to flash, DRAM has more than one purpose in the system. First, DRAM is used to manage data ingestion so that incoming data can be quickly analyzed for redundancy. Here the comparison runs between data held in DRAM and meta-data held on flash, so results come back rapidly. The DRAM is also used to cache the most active read data. Writes are also cached, but to flash, for an extra layer of protection in case of power failure.
Leveraging Hybrid Storage To Optimize Host Memory
Another benefit of flash is that it makes an ideal location for storing virtual memory for RAM paging and swapping when DRAM resources are maxed out on the host. Flash responds almost as fast as DRAM, so the virtual machines will notice almost no performance impact as a result of the virtual to real memory swap. Many vendors will suggest that this be done on a separate flash card inside the server, which of course adds to the expense.
Hybrid solutions like Tegile’s that are connected via a high speed storage network and have the ability to pin certain data volumes in the storage system’s flash memory can provide this capability without the extra expense and management of a separate card in each host. This allows the density of virtual machines on the host to increase dramatically, since the limitation of DRAM is removed. It also allows that density to be achieved while keeping costs down, since high performance virtual memory comes along almost for free.
Summary
Hybrid optimized storage systems can help resolve both the $/GB problem that many virtual environments are facing and the overall shortage in IOPS. The ability to limit and control capacity growth through deduplication and compression should also lower the number of systems required. Many mid-sized data centers may only need one, which of course further reduces cost and complexity.
Tegile is a client of Storage Switzerland
Thursday, August 23, 2012
George Crump, Senior Analyst