Building The SAN-Less Data Center
Scaling the virtual environment is critical to realizing the ROI potential of server and desktop virtualization projects. The moment more server hosts have to be added because virtual machine (VM) density has reached its limit, the ROI for the project stops. Two of the biggest limits on VM density are storage performance and the complexity of the SAN designs that the virtual environment requires.
Almost every day a webinar or white paper comes out claiming to have the solution for overcoming the storage roadblocks to further virtualization rollout. The irony of these claims is that they always solve the SAN issue with a bigger, faster SAN. The better solution may be to look at the architectures currently being implemented by some of the largest virtualized environments, the cloud compute providers, and see whether a similar design could be made available to the traditional data center.
The Compute and Storage Relationships
The rapid adoption of virtualization was driven by how reliably it was able to virtualize applications. Most applications run just as reliably as virtual machines as they did on standalone servers. This was especially true in the testing phase, where the storage configurations were relatively simple and in many cases were direct attached. But as these virtual servers rolled into production and the hypervisor was tapped to provide flexible machine migration, the storage infrastructure had to become more complex.
Shared storage was already complicated prior to virtualization, often requiring special skills to deploy and maintain. But it became even more difficult in the virtual environment. The key difference between a SAN that supports a virtual infrastructure and one that supports legacy servers running single applications is the way that the storage is partitioned. In the legacy single-server/single-application model, shared storage was sectioned or zoned off so that only one server could interact with each LUN.
In the virtual infrastructure the goal is the exact opposite. For functions like vMotion, Storage vMotion and Distributed Resource Scheduler (DRS) to work, all physical hosts need access to the exact same storage partitions. Virtualization also drives a wide variety of random workloads, and each host in the environment, because of the number of virtual machines it supports, is constantly demanding storage I/O performance.
This combination of truly shared storage and constant IOPS demand leads to complex storage network design, scaling limitations, controller saturation and overall poor performance. It also leads to complex work-arounds to overcome these issues. SAN vendors have tried bolting on solid state disk as a cache or tier to accelerate active data, providing extremely high-performance network bandwidth that goes mostly unused, or implementing scale-out storage systems that consume data center floor space and shift poor processor utilization from the compute infrastructure to the storage system.
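The contrast between the two partitioning models can be sketched in a few lines of Python. This is only an illustration: the zoning tables, host names, LUN names and the can_live_migrate helper are hypothetical and do not correspond to any vendor's API.

```python
# Hypothetical zoning tables: each LUN maps to the set of hosts allowed to see it.
legacy_zoning = {
    "lun0": {"host-a"},            # one server per LUN, as in the legacy model
    "lun1": {"host-b"},
}
shared_zoning = {
    "lun0": {"host-a", "host-b"},  # every host sees every LUN
    "lun1": {"host-a", "host-b"},
}


def can_live_migrate(zoning, lun, source_host, target_host):
    """A VM can move between hosts only if both hosts can reach its LUN."""
    allowed = zoning.get(lun, set())
    return source_host in allowed and target_host in allowed


print(can_live_migrate(legacy_zoning, "lun0", "host-a", "host-b"))
print(can_live_migrate(shared_zoning, "lun0", "host-a", "host-b"))
```

Under the legacy table the migration check fails because the target host cannot reach the LUN. Under the shared table it succeeds, and that same openness is why every host's I/O then lands on the same storage.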
The Cloud Generation Data Center Model
Large cloud services like Google, Amazon, Facebook and Yahoo may serve as blueprints that traditional data centers can use when starting the next phase of virtualization on the road to the fully virtualized data center. Each of these services shares a common compute architecture: typically a heavily virtualized design in which storage and compute are converged on the same components of the infrastructure.
In these distributed designs each server has a set of virtual machines running on it and storage internal to the unit. In most cases all the data that a virtual machine needs resides local to the physical server that hosts that virtual machine. As a result, performance management is greatly simplified and is customizable on a host-by-host basis, as it was in the direct attached model described above.
However, these systems still provide data redundancy and VM mobility by replicating data across other nodes in the infrastructure. While each service accomplishes this data distribution in its own way, the final goals of flexibility and availability are still achieved.
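As a rough illustration of local-first storage with replication, the sketch below keeps the primary copy on the node that hosts the VM and places the extra copies on neighboring nodes. The function names, the round-robin placement rule and the replication factor are assumptions made for this example, not a description of how any of these services actually distribute data.

```python
def place_replicas(nodes, home_node, replication_factor=2):
    """Return the nodes holding a VM's data: home node first, then neighbors."""
    if home_node not in nodes:
        raise ValueError("home node is not part of the cluster")
    if replication_factor > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    start = nodes.index(home_node)
    # Walk the node list from the home node, wrapping around at the end.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def restart_candidates(placement, failed_node):
    """Nodes that still hold a copy of the data after one node fails."""
    return [node for node in placement if node != failed_node]


cluster = ["node-1", "node-2", "node-3", "node-4"]
placement = place_replicas(cluster, "node-4", replication_factor=2)
print(placement)
print(restart_candidates(placement, "node-4"))
```

Reads are served from the first entry, the local copy, while the remaining entries are what allow a VM to restart or migrate elsewhere without a shared array.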
The SAN-Less Data Center - Bringing The Cloud Model To The Enterprise
The enterprise typically doesn’t have the expertise or the budget to develop and maintain its own shared storage and compute environment as the companies above have done. As a result, even the largest companies have largely missed out on the advantages that these systems can deliver. While companies like EMC and NetApp have tried to create offerings to match these designs using what they call “converged architectures”, for the most part they’re just bundling disparate components with some pre-installation work performed. All of these “bundles” will eventually run into the same performance tuning issues that limit the virtual machine density on each physical host.
Matching the model of the cloud companies mentioned above is going to require a fresh approach born out of those same cloud architectures, not the re-packaging of existing components. Companies like Nutanix are taking the proven concepts of the cloud service providers and designing solutions from the ground up that can be delivered to enterprise data centers.
These systems leverage custom designed server hardware to provide a densely packed but highly scalable infrastructure and a turnkey virtualization experience for the IT professional. Each node is complete with compute, storage and networking connectivity. Nutanix introduces a massively parallel data center architecture in which every VMware host has an embedded storage controller. The storage controllers coordinate among one another to form a single large distributed system.
Nutanix comfortably scales from 4 nodes to 4000 nodes, because there is no single point of failure or bottleneck. Architecturally, the metadata and cluster services have a lot in common with the web-scale architectures of Google, Facebook, Yahoo, and Netflix. SAN-less also means that data for each virtual machine resides as close to the compute as possible, i.e., on the local PCIe bus. The system constantly runs “Big Data” (MapReduce) jobs in the background to monitor access locality, and brings the hot data closer to a virtual machine when vMotion or DRS has migrated that VM to another host.
This local data placement also provides a key ingredient for an ideal use of flash memory. Since all the data is local to the node that’s hosting that virtual machine, active, performance-sensitive data can be moved onto a PCIe-based flash SSD in the node itself. This makes optimal use of the performance capabilities of SSD: all hot data is accessed directly off the PCIe bus.
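The locality idea can be sketched with a simple counter. The article does not describe Nutanix's actual background jobs, so this is a single-process stand-in, not the vendor's implementation. The plan_locality_moves function, the extent and node names, and the hot_threshold value are all hypothetical.

```python
from collections import Counter


def plan_locality_moves(access_log, extent_location, vm_host, hot_threshold=3):
    """Find frequently read extents that live on a different node than their VM.

    access_log      -- iterable of (vm, extent) read events
    extent_location -- dict mapping extent -> node currently storing it
    vm_host         -- dict mapping vm -> node currently running it
    Returns a dict mapping extent -> node it should be moved to.
    """
    read_counts = Counter(access_log)
    moves = {}
    for (vm, extent), reads in read_counts.items():
        target = vm_host[vm]
        # Only hot extents that are currently remote are worth moving.
        if reads >= hot_threshold and extent_location[extent] != target:
            moves[extent] = target
    return moves


# vm1 was migrated from node-a to node-b, but its data is still on node-a.
log = [("vm1", "extent-1")] * 3 + [("vm1", "extent-2")]
locations = {"extent-1": "node-a", "extent-2": "node-a"}
hosts = {"vm1": "node-b"}
print(plan_locality_moves(log, locations, hosts))
```

In this example only extent-1 is scheduled to move to node-b. The rarely read extent-2 stays where it is, so migration traffic goes only to data the VM is actively using.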
Summary
Virtual machine density is the key to full realization of the virtualization ROI. The challenge is that the greater the VM count per host, the more stressed the traditional architectures become. The answer is to model the infrastructure to match the virtualization layer by designing a scale-out architecture that is also massively parallel. Potentially, one of the best paths to implementing a scale-out architecture is to converge the storage capacity with the compute engine and eliminate SAN complexity by eliminating the SAN itself.
For a printable copy of this article please email info@storage-switzerland.com
Nutanix is a client of Storage Switzerland
Wednesday, April 18, 2012
George Crump, Senior Analyst