VMware Storage Options
VMware continues its rapid adoption in data centers of all sizes, but one of the key requirements for taking full advantage of its capabilities is shared storage. Key features like VMotion, Distributed Resource Scheduler (DRS) and Site Recovery Manager require shared storage as the foundation for their efficient operation. As a result, studies indicate that over 70% of VMware infrastructures are on a shared storage network of some kind.
Until the latest release, the only option for shared storage was a fibre channel SAN. With the latest release, VMware has given users the option of using iSCSI or NFS alongside fibre channel for shared storage. This has driven down the cost of implementing the shared storage that VMware needs to deliver much of its capability. Lower cost storage options expand the use of the product to enterprises of all sizes.
VMware introduces storage protocol choice
In the past, fibre channel SANs were the predominant way to create a shared storage environment to complement the virtual infrastructure. VMware 3.x datastores and their underlying virtual machine disk images use either the Virtual Machine File System (VMFS) or Raw Device Mapping (RDM). VMFS is a cluster-aware, shareable file system that enables multiple ESX servers to access the same LUNs, a critical enabler of capabilities like VMotion and VM-HA.
RDMs are often used for special cases in which a virtual machine (VM) needs direct access to a LUN, usually to guarantee performance. The RDM is mapped directly to the VM, partitioned by that VM and formatted by that VM’s operating system.
With the release of VMware 3.x the storage connectivity choices expanded from fibre channel SAN to include iSCSI and NFS. iSCSI, being block based, uses VMFS, while NFS utilizes the file system of the Network Attached Storage (NAS) being used to present the NFS storage. This is an important differentiating factor since the quality of the NAS file system is critical to the success of the implementation. All of these choices have created great confusion over the appropriate protocol to use for a given situation.
VMware Implementations Mature
The first phase of VMware deployment typically starts with a focus on the “low hanging fruit”: consolidating physical servers that do not have very high computing, storage or network I/O demands. In this phase the network and storage complexities are kept to a minimum. Almost any protocol can be selected and the probability of success is fairly high. However, as the complexity of the environment rises, storage management challenges like monitoring, identifying performance bottlenecks and growing VMware datastores begin to surface.
The next step adds to this complexity and becomes more challenging. For many, this will be the expansion of the virtual infrastructure to greatly increase the count of consolidated servers and to begin the migration of more business critical servers. As mentioned above, the increase in server count alone will create greater storage management challenges, not to mention the need to maintain performance service levels.
As VMware implementation projects reach their second phase, IT Managers begin to shift their focus to the challenges posed by the storage management component of the virtual infrastructure. To facilitate this, VMware now offers choices in storage protocols. IT Managers must now determine the right protocols and the right way to implement them.
Platform Selection
An important component of the protocol selection is the choice of storage hardware platforms. As mentioned earlier, the choices are broadly fibre channel, iSCSI and (new to many) NFS. In 2008 VMware ran a test of these protocols and found that while fibre channel was the performance leader, the application has to be fairly demanding from a storage I/O perspective to take advantage of that bandwidth. For many workloads that need to be virtualized, iSCSI and NFS offered sufficient performance. From a performance standpoint, iSCSI and NFS were a virtual tie in almost every testing category.
This and other tests indicate that protocol selection should be dependent on the workload needs and the experience of the IT Staff. For best results, the right protocol needs to be picked for the right environment. Some will need the capability to mix fibre and one of the IP protocols and for some others that would be overkill. As IT Managers look to deploy new storage solutions or upgrade their current ones it is best to have the option to deploy multiple protocols based on application needs as opposed to a one size fits all approach with a single protocol solution.
In essence, it is best to avoid storage systems that are single protocol oriented. Also avoid systems that are NAS based but simulate the hosting of fibre channel and do not provide native access to the protocol. That is, avoid using IP systems that create a block protocol disk object on their existing file system. This creates extra layers of translation and undermines the primary reason to implement fibre channel: performance.
Cost containment is another major factor in making storage decisions in this economy. It makes sense to select solutions that can leverage the existing investment both in infrastructure as well as storage, and then scale platform capacity to meet demand. The platform itself should be able to contain costs by extending current capabilities, improving storage management efficiencies and optimizing storage resources.
Most VMware environments are either direct attached or have a shared fibre channel SAN. As an IT Manager considers new solutions, especially one that offers a different protocol, it may require a new stand-alone storage system with its own storage network or infrastructure. While some manufacturers can add iSCSI to their current solutions, many cannot add NFS and iSCSI. The addition of either protocol is challenging and so the suppliers should be examined carefully.
An ideal solution would be a multi-protocol storage appliance that offers solid NAS services for NFS and peacefully coexists with the initial infrastructure. Examples of solutions that offer this type of capability are available from NetApp and ONStor. Both of these organizations offer NAS heads that can be attached to existing storage to lower costs while extending protocol support beyond just fibre channel, without requiring the replacement of the current protocol infrastructure.
Protocol Comparison
Fibre
Fibre channel is still the dominant form of deployment today in most VMware environments that are on a shared infrastructure. The option to seamlessly integrate it should be considered a critical component in any plans going forward. Its selection rate and dominance come from a variety of reasons. Foremost is that, up until the 3.x release of VMware, it was the only protocol supported, and even as 3.x became available it was the protocol that implementers were the most comfortable with. It is also, by default, the first to support VMware’s advanced features like Storage VMotion and Site Recovery Manager (SRM), and it is the only protocol to support Microsoft Cluster running in a virtual machine.
With fibre channel, a VMFS data store is assigned to an ESX Server. The ESX Server can boot directly from the datastore on that shared storage. The datastore is then subdivided into multiple VMs, which in turn have their own VMDK.
Fibre channel is also the performance leader, especially for high storage I/O workloads. In a test done by VMware entitled “Comparison of Storage Protocol Performance”, fibre channel was able to handle a higher I/O workload based primarily on its higher bandwidth (4Gb/s in the test and now 8Gb/s, compared to 1Gb/s for the IP protocols). It also had less impact on ESX host CPU because the storage overhead is offloaded to the Host Bus Adapter (HBA). As for the drawbacks of using the fibre channel protocol, there are three key issues.
The first key issue is that thin provisioning is not available in VMFS by default, and that resizing a datastore is done via extents, which is not recommended while the VM is in production. Depending on the storage system being used or being selected in phase two, a solution that offers thin provisioning and volume expansion may be able to overcome these challenges. Even with more advanced storage systems, resizing a block based VMFS datastore is always challenging and is typically done in a maintenance window to be safe.
The second challenge is that fibre channel SANs are perceived to be more complex than other offerings. Especially if there is no existing fibre channel investment, this could be a valid concern. While certainly not impossible, there is a whole new set of terms and configurations that need to be learned. If there is deep familiarity with IP configuration on staff, this may be the more logical choice.
The final challenge is cost. If there is no existing fibre channel investment, implementing a new fibre infrastructure can be very expensive, certainly more expensive than the IP alternatives, due to the high cost of the infrastructure as well as the learning curve associated with it. Fibre channel requires a specific switch fabric and a special type of HBA to connect to that fabric.
Fibre channel will likely be needed as most virtual infrastructures grow and evolve. The IP based protocols may initially be able to reduce costs of the fibre channel investment by supporting business critical and lower performance servers and reserving fibre channel for the more mission critical, performance demanding servers. Regardless of the selection made initially it is always a good idea to have a storage solution that can natively support fibre in the future.
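To put the bandwidth figures quoted above in perspective, the following Python sketch estimates how long it would take to move a disk image over each link at its nominal line rate. The 40 GB image size and the function name transfer_seconds are illustrative assumptions, and the calculation ignores encoding and protocol overhead, so it shows only the relative gap between the links rather than measured throughput.

```python
# Nominal link rates in gigabits per second, as quoted in the article.
LINK_RATES_GBPS = {
    "fibre channel (4Gb/s)": 4,
    "fibre channel (8Gb/s)": 8,
    "iSCSI or NFS over 1Gb/s Ethernet": 1,
}


def transfer_seconds(size_gigabytes: float, rate_gbps: float) -> float:
    """Idealised time to move size_gigabytes over a link of rate_gbps.

    Ignores encoding and protocol overhead, so real transfers take longer.
    """
    size_gigabits = size_gigabytes * 8  # 8 bits per byte
    return size_gigabits / rate_gbps


vmdk_size_gb = 40  # hypothetical disk image size, not from the article
for link_name, rate in LINK_RATES_GBPS.items():
    print(f"{link_name}: {transfer_seconds(vmdk_size_gb, rate):.0f} s")
```

Under these assumptions the 1Gb/s link needs 320 seconds, against 80 seconds at 4Gb/s and 40 seconds at 8Gb/s. The gap only matters for workloads that actually saturate the link, which is consistent with the point that many servers are served adequately by the IP protocols.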
iSCSI
With the exception of Microsoft Cluster support, iSCSI shares many of the same capabilities as fibre channel because the data access is block based; the difference is that it runs over an IP based infrastructure as opposed to a fibre based one. The key factors in selecting iSCSI over fibre are typically ease of use and cost.
iSCSI is perceived to be easier to use because it leverages the existing IP network. IP knowledge can easily be leveraged in iSCSI environments. iSCSI deployments will most often have a private IP network or VLAN just for storage I/O. iSCSI can become challenging when it needs to be optimized and scaled to increase performance. For example, to minimize CPU utilization, hardware based HBAs may be implemented to offload the additional IP overhead from the server. An iSCSI HBA can also help if you need the ESX server to boot directly from shared storage. VMs can boot from the shared storage by using an iSCSI initiator by itself and do not require an HBA.
As the number of VMs and the need for performance grow, the requirement to scale the infrastructure takes away many of iSCSI’s advantages, including the cost advantage. Dedicated iSCSI HBAs and more intelligent IP switches all cost more money. Additionally, configuring iSCSI VLANs and HBAs requires a level of effort and detail similar to that of a fibre channel environment. The difference is that with iSCSI you will reach the need for performance tuning sooner than you will with fibre.
iSCSI, being a block based protocol like fibre, also shares many of the storage management complexities, such as datastore resizing.
NFS
The other IP protocol option and possibly the newest to some is NFS. As is the case with software iSCSI, an ESX server cannot boot from NFS, but the virtual machines can. The virtual machine images (VMDKs) are then created on the NAS file system and are treated just like files. VMDKs are files, hence a platform optimized for files, like NAS, is ideal for managing VMDKs. Since NFS is a shared file system, leveraging VMDK files across multiple ESX servers makes functionality like VMotion significantly easier.
NFS is not a file system by itself but a protocol that connects a server to a file server or NAS. There should be careful consideration of the file system that is being used by the NAS storage provider. A simple server running Linux and NFS is not going to perform well. Instead, what is needed is an appliance that is well optimized for NAS Services.
An additional benefit of using a NAS is that all the advanced capabilities of its file system are inherited. For example, ONStor has the ability to seamlessly move files from one file system to another. That same capability can be applied to virtual machine images. If you started a virtual machine on a file system backed by high speed fibre channel storage and later determined that the virtual machine did not need that performance, it could be seamlessly downgraded to SATA based storage.
Another advantage is that NFS provides Thin Provisioning as a default to both the datastore and VMDK – each only consuming the actual disk capacity utilized despite what you tell the virtual machine. In addition the datastores can be resized, both expanded and contracted, as needed. In both cases, this can be done while the systems are in production, unique among the three protocol options. Most advanced NAS systems also have the ability to expand the NFS volume itself on the fly with no down time.
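The effect of thin provisioning is easy to see with a little arithmetic. The Python sketch below compares the capacity a datastore would consume if every disk were fully allocated up front with the capacity consumed when only written blocks count. The VM names and sizes are invented for illustration and do not come from any test.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VirtualDisk:
    name: str
    provisioned_gb: float  # size presented to the virtual machine
    used_gb: float         # capacity actually written by the guest


def thick_consumption(disks: List[VirtualDisk]) -> float:
    """Capacity consumed when every disk is fully allocated up front."""
    return sum(d.provisioned_gb for d in disks)


def thin_consumption(disks: List[VirtualDisk]) -> float:
    """Capacity consumed when only written blocks take up space."""
    return sum(d.used_gb for d in disks)


# Hypothetical virtual machines on one datastore.
disks = [
    VirtualDisk("web01", provisioned_gb=40, used_gb=12),
    VirtualDisk("db01", provisioned_gb=200, used_gb=65),
    VirtualDisk("file01", provisioned_gb=100, used_gb=30),
]

thick = thick_consumption(disks)
thin = thin_consumption(disks)
print(f"fully allocated: {thick} GB, thin provisioned: {thin} GB, saved: {thick - thin} GB")
```

With these made-up numbers the three disks present 340 GB to their virtual machines but consume only 107 GB on the NAS. The trade-off is that the administrator must watch real consumption, since the virtual machines can collectively write more than the volume physically holds.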
Creation of the datastores themselves is also straightforward. Mount the NFS volume to the ESX server and then, using the Virtual Center GUI, start creating datastores and sharing them between ESX servers.
While building a robust IP storage infrastructure faces some of the same challenges as fibre channel (advanced IP switches vs. FC switches and VLANs vs. LUNs), there is no need to deal with specific HBAs. The file system has built-in sharing capability, so there is no need to deal with LUN sharing or identical LUN IDs.
For NFS deployments, all of this adds up to operational efficiency. Virtualization of servers has become popular because of the flexibility that it brings to the data center. The flexibility of NFS mounted VMware datastores is an ideal complement to that effort. However, part of that flexibility is the ability to natively deploy fibre channel when and if the need arises.
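The comparison above can be condensed into a small lookup structure. The Python sketch below records only the characteristics stated in this article for each protocol and filters on whichever ones a workload requires. The field names and the candidates helper are illustrative assumptions, not part of any VMware tool, and the values reflect the VMware 3.x era described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Protocol:
    name: str
    block_based: bool           # uses VMFS on a LUN rather than NAS files
    esx_boot_from_shared: bool  # ESX host itself can boot from shared storage
    microsoft_cluster: bool     # supports Microsoft Cluster in a VM
    thin_by_default: bool       # thin provisioning without extra features
    online_resize: bool         # datastore resizable while in production


PROTOCOLS = [
    Protocol("Fibre Channel", True, True, True, False, False),
    # ESX boot over iSCSI needs a hardware iSCSI HBA, per the article.
    Protocol("iSCSI", True, True, False, False, False),
    Protocol("NFS", False, False, False, True, True),
]


def candidates(**required: bool) -> List[str]:
    """Return the protocols whose attributes match every requirement."""
    return [
        p.name
        for p in PROTOCOLS
        if all(getattr(p, key) == value for key, value in required.items())
    ]


print(candidates(microsoft_cluster=True))                   # ['Fibre Channel']
print(candidates(thin_by_default=True, online_resize=True)) # ['NFS']
print(candidates(block_based=True))                         # ['Fibre Channel', 'iSCSI']
```

A table like this cannot capture performance needs or staff experience, which the article treats as equally important, but it shows why no single protocol satisfies every requirement and why a multi-protocol platform keeps options open.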
Monday, January 26, 2009