The VMware Memory Balancing Act
Increasing memory allocation is one of the best ways to improve the performance of a virtual machine, but this resource comes at a premium. Not only can the physical host hold a limited amount of memory, but memory now also carries a licensing cost under VMware’s new licensing policies. Pinpointing a memory-related performance problem and knowing how to address it is now an even more important role for the VMware administrator.
Managing VMware memory allocations is a delicate balancing act between VMs with too much memory and servers with not enough. While it may be tempting to simply over-allocate memory to VMs, this practice causes two significant problems. First, it starves other VMs that could truly take advantage of that memory. Just like extra memory in a physical server, allocated memory that goes unused does not benefit the virtual machine at all. Second, there’s a software licensing cost associated with how much memory is allocated per server. Additional, unneeded RAM in a server not only costs the price of the RAM but now also raises the software costs.
A virtual machine with insufficient memory will not typically crash, because VMware does an excellent job of using disk capacity as additional RAM. The first part of the memory optimization challenge is understanding the terms used to describe the process. The ESX host owns the physical (real) memory. The guest OS running inside each VM sees virtual memory provided by the hypervisor. The guest OS pages memory to disk, while the host swaps VM memory to disk. The two are driven by different parameters: paging by the virtual memory size configured within the guest OS, and swapping by the reservation, limit and shares configured within vCenter. In either case, when memory has to be moved to disk, performance suffers and back-end I/O increases. However, if memory never goes to disk, then too much memory may have been allocated.
It is important to understand what actually does the paging on a host server. The ESX host, not the virtual machine itself, pages memory to disk on the VM’s behalf. The guest OS inside the virtual machine also pages on its own. This makes monitoring the guest OS’s use of virtual memory critical. Since guest paging does not occur at the VM level within this abstracted hierarchy, vCenter is unaware of its impact. As disk paging occurs, vCenter will not be able to report the exact cause. There will just be unexplained disk activity.
These conflicting interests and invisible occurrences are why properly balancing memory is so important. RAM is a precious resource that can greatly enhance performance. But because of its limited capacity and dual cost implications, the tendency is to use it as sparingly as possible. Used too sparingly, however, memory can cause a significant decrease in VM performance.
The Detection Challenge
Detecting which virtual machines need more or less memory is a challenge in the virtualized environment. The number of VMs and the speed at which memory utilization conditions can change make it almost impossible to track manually. For example, an average utilization statistic won't provide much value, since many VMs go relatively unused during off-hours. Their utilization could also be high during periods when performance is not a concern, such as nightly backups.
Detecting a memory shortfall at a moment in time is also a problem. If a user notices a performance degradation, for example, memory may be the cause. But by the time the user notifies the administrator and the administrator checks the current memory utilization, the condition may well have passed.
The problem is that most of the built-in memory reporting capabilities of VMware can only report memory utilization at a particular moment in time. There’s no easy way to compile a trend for this statistic. There is also a difference between what the guest reports as available memory (used vs. free) and what the VM reports (consumed, active or shared). Used memory is a better indication than consumed or active memory, and a tool is needed that will capture that information.
Detecting and optimizing memory utilization is also something that cannot be attempted manually. Memory utilization is too dynamic and impacts too many VMs. Even storage capacity, which is more static in nature, is difficult to track through manual spreadsheets and reports, and memory is more difficult still. It’s also unreasonable for an administrator to stare at a VM console all day long watching for a memory problem to occur.
The ideal way to analyze memory utilization is to measure it during important business hours, over a period of days or even months. This requires a software tool like NetApp's OnCommand Insight Balance that can capture this information and provide a historical representation of memory utilization per virtual machine. This allows the data to be ‘shaped’ into a meaningful representation of what’s important to the enterprise. For example, you may want to ignore RAM utilization during the off-hours so that it doesn’t skew the business-day RAM utilization data.
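To make the idea of ‘shaping’ concrete, here is a minimal Python sketch. It is not how OnCommand Insight Balance works internally; the sample format (a timestamp paired with a percent-used reading), the 08:00 to 18:00 weekday window and the function names are all assumptions chosen for illustration.

```python
from datetime import datetime, time

# Assumed business window; adjust to whatever hours matter to the enterprise.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)

def is_business_hour(ts):
    """True for weekday timestamps inside the assumed business window."""
    return ts.weekday() < 5 and BUSINESS_START <= ts.time() < BUSINESS_END

def shape(samples):
    """Return (average, peak) percent-used over business hours only.

    samples is a list of (datetime, percent_used) pairs, a hypothetical format.
    Returns None when no sample falls inside the window.
    """
    kept = [pct for ts, pct in samples if is_business_hour(ts)]
    if not kept:
        return None
    return sum(kept) / len(kept), max(kept)

samples = [
    (datetime(2011, 12, 5, 2, 0), 95.0),    # Monday 02:00, nightly backup
    (datetime(2011, 12, 5, 10, 0), 40.0),   # Monday 10:00
    (datetime(2011, 12, 5, 15, 0), 50.0),   # Monday 15:00
    (datetime(2011, 12, 10, 11, 0), 5.0),   # Saturday 11:00
]
print(shape(samples))
```

The backup spike and the idle weekend reading are both dropped, so neither inflates nor dilutes the business-day figures.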
Optimizing Memory Utilization Steps
Armed with a tool like NetApp's OnCommand Insight Balance, the first step is to allow it to collect the historical memory utilization statistics of the virtual environment. This data should then be presented in a sortable report so that two types of virtual machine RAM problem areas can be identified. The first is the set of VMs with performance problems. These are VMs that have a high number of page-outs, that is, times when real memory is exchanged with virtual memory on hard disk. These VMs should have their memory allocations increased to lower the page-outs. However, before installing additional memory, the next step in optimizing this resource should be taken: decreasing the memory allocated to VMs that do not need it.
Many VMs in the environment make little use of the memory given to them and could have that memory scaled back significantly with no impact on performance. Here the administrator can take that same VM report and sort it by average percentage used over time. Again, because of the data shaping capability, this is memory used only during important production hours and is not skewed by inactive time periods. The administrator could also look at the maximum amount of memory used during that timeframe and make an optimization decision based on that data.
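The two sorted views described above can be sketched in a few lines of Python. The record layout, field names and VM names below are hypothetical and stand in for whatever a reporting tool exports.

```python
from dataclasses import dataclass

@dataclass
class VmReport:
    """One row of a hypothetical per-VM memory report."""
    name: str
    page_outs: int        # page-outs counted during business hours
    avg_used_pct: float   # average memory used, business hours only
    max_used_pct: float   # peak memory used over the same period

def by_page_outs(rows):
    """Heaviest pagers first: candidates for a larger allocation."""
    return sorted(rows, key=lambda r: r.page_outs, reverse=True)

def by_avg_used(rows):
    """Lowest average use first: candidates for a smaller allocation."""
    return sorted(rows, key=lambda r: r.avg_used_pct)

rows = [
    VmReport("web01", 12, 35.0, 52.0),
    VmReport("db01", 4800, 91.0, 99.0),
    VmReport("test07", 0, 8.0, 15.0),
]
print([r.name for r in by_page_outs(rows)])
print([r.name for r in by_avg_used(rows)])
```

The same rows feed both views, which mirrors the point that one report, sorted two ways, surfaces both the starved and the over-provisioned VMs.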
For example, if the average memory utilization during business hours (thanks to being able to shape the data) is less than 50% and the maximum utilization has never exceeded 60% during all hours, then the administrator could reasonably assume that memory could be reduced by up to 40% with no performance impact to the VM. This memory could then be freed up and allocated to other VMs that could benefit from more memory.
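The arithmetic behind this example can be written out as a small Python sketch. The function name and the idea of reclaiming everything above the peak ceiling are assumptions made for illustration; a real decision would likely leave additional headroom.

```python
def reclaimable_mb(allocated_mb, avg_pct, peak_pct, avg_limit=50, peak_limit=60):
    """Estimate memory (MB) that could be taken back from a VM.

    Applies the rule of thumb from the example: average use below avg_limit
    and peak use no higher than peak_limit. Returns 0 when the VM does not
    qualify. Thresholds are illustrative, not vendor guidance.
    """
    if avg_pct >= avg_limit or peak_pct > peak_limit:
        return 0
    # Everything above the peak ceiling is treated as unused.
    return allocated_mb * (100 - peak_limit) // 100

print(reclaimable_mb(8192, 45.0, 58.0))   # qualifies: 40% of 8192 MB
print(reclaimable_mb(8192, 45.0, 75.0))   # peak too high, nothing reclaimed
```

Keeping the thresholds as parameters makes it easy to apply a more conservative policy to performance-sensitive VMs.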
An interesting third option that this type of reporting brings is the ability to use solid state disk (SSD) as the swap area. While DRAM is always faster than flash-based SSD, flash is less expensive and offers higher capacity than DRAM. Installing a local SSD can be ideal if there’s no room to install additional memory in the server, and it makes for a faster swap area than mechanical HDD. Especially with a server-based PCIe SSD card, page-outs that used to go to mechanical hard disk can now go to high-speed SSD.
3rd Party Solutions Can Help ROI
Memory is a resource that the VMware administrator must use to its fullest, since it provides such a significant boost to performance-sensitive VMs. However, memory must also be assigned judiciously, since there’s a hard limit to how much RAM a physical host can support and there’s now also a licensing cost component to increasing memory. The ideal situation is to assign memory only to those VMs that can best take advantage of it.
Uncovering the information needed to make this determination requires a sophisticated analysis tool like OnCommand Insight Balance, since that information is not readily available from within the VMware console. Of course, this is just one example of how a third-party product can optimize and drive better ROI out of the VMware infrastructure. But due to the high value of RAM and the high payoff of its proper utilization, balancing memory in a VMware environment is one of the best first steps to take.
For More Information on Managing VMware RAM Read:
Overcoming The VMware RAM Problem
NetApp is a client of Storage Switzerland
Monday, December 5, 2011
George Crump, Senior Analyst