Once server virtualization is implemented within a data center and the initial cost savings of consolidation are realized, one of the quickest ways to continue to drive down those costs is to increase virtual machine densities. The challenge with this strategy is that every VM brings with it a need for CPU, memory and I/O resources, at levels which are highly random. As the density of the VMs on the server host increases, the probability of a few VMs suddenly, and for a short time, hitting a peak load also increases - with the potential of starving the other VMs and possibly causing application downtime.
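The link between density and peak risk can be made concrete with a small calculation. The sketch below is illustrative only: it assumes each VM peaks independently with the same probability at any given moment, and both the function name `prob_any_vm_peaks` and the 2% figure are hypothetical, not measurements from a real environment.

```python
def prob_any_vm_peaks(n_vms: int, p_peak: float) -> float:
    """Chance that at least one of n_vms is at peak load at a given moment.

    Assumes VMs peak independently, each with probability p_peak
    (a simplification; real workloads are often correlated).
    """
    return 1.0 - (1.0 - p_peak) ** n_vms


# Hypothetical per-VM peak probability of 2%, at three host densities.
for density in (5, 20, 50):
    print(f"{density} VMs per host: {prob_any_vm_peaks(density, 0.02):.1%}")
```

Under these assumptions the chance of some VM peaking rises steadily with density, which is why a host sized for the average load becomes riskier as VMs are added.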


The processing power of today's systems is typically more than adequate for relatively high machine densities and these sudden peaks in loading. Densities of 20 VMs to a physical host are not out of the question, and enough memory can be added to servers to support these levels. Soon 30-50 VMs per physical host will be realistic. The challenge is allocating the I/O resource properly and cost effectively: I/O is less plentiful than CPU and memory, and it consumes more space in the server.


Bursting memory and CPU resources is made possible by allocating a shared pool of servers that are ready to receive additional load if the standard production servers reach a peak condition. When that occurs, VMs can be offloaded from these production systems and moved to a more idle system within the data center. The ultimate manifestation of this is "Cloud Bursting", where the VM is migrated to another data center or cloud compute provider. The concept of bursting mitigates the risk associated with designing data centers for normal workloads instead of peak workloads. Infrastructure bursting brings to the I/O profile the same risk mitigation that server migration brings to CPU and memory profiles.


Infrastructure bursting is enabled by using I/O Virtualization (IOV) in a virtual server environment to develop an infrastructure that can expand and contract quickly to meet temporary I/O profile changes. Essentially, a pool of bandwidth is created that is shared among all the host servers in the virtual environment.


Without IOV, server hosts must be configured to handle a maximum peak load by adding multiple I/O ports to those hosts. This is not only costly but also inefficient, as it's hard to predict which type of I/O load will be needed at any given point in time - storage I/O or network I/O, which today typically require different interface cards. The selection is further complicated by the fact that most server hardware today has an increasingly limited number of physical card slots. Buying a larger server just to get more I/O slots is costly and consumes precious data center rack space. Also, as the infrastructure evolves, those server hosts must be revisited to install new I/O cards as needed, often requiring that a server be brought down while boards are installed and new drivers loaded.


As a result, the virtual infrastructure designer must decide, and often compromise, between storage I/O and network I/O. Deciding how much of which I/O capability to add to a physical host as VMs multiply becomes a critical capacity planning task. Simply adding the highest possible quantity of the fastest cards available is not a viable option either. High speed cards are still comparatively expensive to buy in quantity, and the rest of the infrastructure must be high speed as well for them to deliver any benefit.


Unlike CPU or memory capacity, I/O capacity is dependent upon the capabilities of the rest of the infrastructure. Adding faster or more CPU or memory almost always improves VM performance immediately, because consumption of that resource depends only on the internal capabilities of the server. I/O is more dependent on the external capabilities of the infrastructure. Replacing a 1GbE card in a server with a 10GbE card but still connecting that card to a 1GbE switch port is not going to increase I/O. The I/O changes have to be matched, by either upgrading the switch or adding cards of the same speed to the server.
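The 1GbE/10GbE mismatch above comes down to a link running no faster than its slower end. A minimal sketch, assuming a single point-to-point link between the server card and the switch port; the function name `effective_link_gbps` is hypothetical and the model ignores protocol overhead:

```python
def effective_link_gbps(nic_gbps: float, switch_port_gbps: float) -> float:
    """Usable link speed is capped by the slower of the two endpoints."""
    return min(nic_gbps, switch_port_gbps)


# Upgrading only the server card leaves the link at the switch port's speed.
print(effective_link_gbps(nic_gbps=10, switch_port_gbps=1))
# Upgrading both ends is what raises the available bandwidth.
print(effective_link_gbps(nic_gbps=10, switch_port_gbps=10))
```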


The result is many cables of differing types emanating from the back of the host servers in the virtual infrastructure. Worse, the cables and cards have to be constantly revisited, upgraded and replaced as the environment changes. In the static world prior to virtualization this was not as painful; now it could be an almost endless task.


I/O virtualization changes all of this. A host server can be configured with a pair of redundant, high speed IOV-capable cards. These cards connect to an IOV gateway, typically rack based, that holds a shared pool of both storage and network I/O cards available to all the connected hosts. Within the gateway, several cards can also be set aside as spares or burst capacity, and they do not need to be allocated to any particular server up-front. During normal operating loads each server still has an assured bandwidth for both storage and network traffic.


When an I/O load peaks on a particular server, that server can ‘borrow’ bandwidth from another active card that is not as heavily utilized, or from one or more of the additional cards in the I/O gateway, for as long as it’s needed. It is important to realize that there is no need to predetermine how much additional bandwidth this peak load will represent, or which type of I/O (or both) the peak will require. The virtual infrastructure administrator simply allocates the cards as needed to the particular host via a software interface.
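The borrowing described above can be pictured as a small pool of spare cards that hosts draw from and return to. This is a toy model under stated assumptions, not any vendor's management interface: the `IOGateway` class, its `burst` and `release` methods, and the host names are all hypothetical, and it only models lending whole spare cards rather than sharing bandwidth on an active card.

```python
from dataclasses import dataclass, field


@dataclass
class IOGateway:
    """Toy IOV gateway: spare cards by I/O type, plus what each host has borrowed."""
    spares: dict[str, int]  # e.g. {"storage": 1, "network": 2}
    borrowed: dict[str, dict[str, int]] = field(default_factory=dict)

    def burst(self, host: str, io_type: str, cards: int = 1) -> bool:
        """Lend spare cards of io_type to host; return False if the pool is short."""
        if self.spares.get(io_type, 0) < cards:
            return False
        self.spares[io_type] -= cards
        held = self.borrowed.setdefault(host, {})
        held[io_type] = held.get(io_type, 0) + cards
        return True

    def release(self, host: str, io_type: str, cards: int = 1) -> None:
        """Return borrowed cards to the spare pool once the peak has passed."""
        held = self.borrowed.get(host, {}).get(io_type, 0)
        returned = min(cards, held)  # never return more than was borrowed
        if returned:
            self.borrowed[host][io_type] = held - returned
            self.spares[io_type] = self.spares.get(io_type, 0) + returned


gateway = IOGateway(spares={"storage": 1, "network": 2})
gateway.burst("host-a", "storage")    # host-a takes the only spare storage card
gateway.release("host-a", "storage")  # the card goes back when the peak ends
gateway.burst("host-b", "storage")    # and is now free for another host
```

The point of the design is that neither the amount nor the type of extra I/O is fixed per host in advance; the decision is made against the pool at the moment the peak occurs.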


Compare this to manually installing a new card in a server and loading drivers for that card. In reality, the peak I/O requirement occurs only for a short time and often requires only a slight increase in available bandwidth. Installing a new card is essentially a permanent fix and almost always provides more bandwidth than even the peak requires.


Although the gateway can accommodate them, the I/O cards in the I/O gateway do not need to be the latest high speed cards. Instead they can be cards that match the existing infrastructure. For example, dual port 4Gb fibre cards can be placed in the gateway and then allocated to the attached servers as needed. Typically, only one of each will be required for normal operations, with the potential for accessing the additional cards when peak conditions occur. Also, notice there is no need for redundancy other than the IOV connection itself. On a failure, another card is simply mapped in place of the failed card or port.

The cards that connect to the IOV gateway have traditionally been high speed InfiniBand or PCIe extension cards. While these cards delivered high speed, they often meant yet another infrastructure type to pay attention to in the data center. An emerging model is to enable IOV over a high speed Ethernet configuration (e.g. PCIe with SR-IOV over Ethernet) that leverages the current infrastructure and the networking strength of Ethernet.

Aprius is a client of Storage Switzerland

George Crump, Senior Analyst

 
