In Open Storage The Storage Infrastructure Matters
Storage systems have rapidly evolved to become more open and modular. Storage data services and features that used to be available only on proprietary hardware platforms are now available through software, either as a file system or in applications themselves. These software solutions can provide all the features that IT expects, such as access, volume management, thin provisioning, and snapshots, but provide them agnostically across storage hardware.
Cloud providers and private data centers are embracing this change and taking advantage of the flexibility that software-based storage services provide. The reason for this increased interest in adoption is twofold. First, the underlying physical server powered by Intel processors can deliver enough CPU cycles and network I/O that proprietary hardware is no longer needed. Second, moving storage services closer to applications can make them easier to manage and more cost-effective.
This filesystem- or application-centric approach is becoming commonplace for a wide variety of data sets. As a result, the modern-day file system abstracts storage services from the underlying block layer. Products like ZFS, GPFS, GFS, MogileFS, Lustre, Gluster and Caringo present a file system that becomes middleware to the block storage underneath. The impact on storage administrators and application owners is that they can now pick the best-of-breed file system for their specific use case or workload without being constrained by the underlying storage infrastructure. To realize the full potential of open storage architectures, the underlying block storage needs to be simple, fast and cost-effective.
The Scalability of Open Storage
Most open storage software solutions run on traditional server-class hardware, often called “nodes”. Initially, the storage capacity is provided by internal storage in each node. These solutions are typically scale-out in design, meaning that a group of nodes running the scale-out software is clustered together and appears to the application as a single entity and a single storage pool. But as with most things in IT, when the environment scales, cost effectiveness is eroded to meet performance and scaling demands.
Scaling Open Storage Software Architectures
Performance of the cluster can be increased by expanding the number of nodes and capacity can be increased by adding more local storage to each node or additional nodes. Adding nodes to an open storage cluster is relatively easy but, as capacity and performance demands escalate, the environment soon has to deal with another issue.
“Node sprawl” occurs when nodes are added to the cluster prematurely to meet a storage performance or storage capacity demand. This is a problem because the physical hardware that contains and delivers the data is now more expensive than the devices that hold that data. In this situation there is still plenty of CPU processing power available per node but the I/O or capacity constraint that leads to a bottleneck still exists. Also, each additional device consumes available floor space, a situation that’s becoming a major issue in most data centers.
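The arithmetic behind node sprawl can be sketched in a few lines of Python. Every figure below (demand, per-node capacity, IOPS, core counts) is a hypothetical value chosen for illustration, and the function name nodes_needed is not taken from any product. The sketch assumes each resource scales linearly with node count, which real clusters only approximate.

```python
import math

def nodes_needed(demand, per_node):
    """Node count each resource would require on its own: ceil(demand / per-node supply)."""
    return {name: math.ceil(demand[name] / per_node[name]) for name in demand}

# Hypothetical workload and node specification (illustrative numbers only).
demand = {"capacity_tb": 480, "iops": 60_000, "cpu_cores": 64}
per_node = {"capacity_tb": 24, "iops": 10_000, "cpu_cores": 16}

needed = nodes_needed(demand, per_node)

# The cluster must be sized for the most demanding resource.
cluster_size = max(needed.values())
binding = max(needed, key=needed.get)

# Fraction of the purchased CPU cores the workload actually uses.
cpu_utilization = demand["cpu_cores"] / (cluster_size * per_node["cpu_cores"])

print(f"nodes required per resource: {needed}")
print(f"cluster size: {cluster_size} nodes, driven by {binding}")
print(f"CPU utilization across the cluster: {cpu_utilization:.0%}")
```

With these assumed numbers, capacity alone forces the purchase of far more nodes than the CPU demand would justify, which is the imbalance described above.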
Eventually the cost of node sprawl becomes too great and an alternative must be found to maximize the capability of each node. Providers and data center managers have begun to look at shared storage architectures that can increase per-node storage capacity and performance. The key is to find an architecture that will maintain the benefits of these open storage designs while overcoming the node sprawl challenge.
Comparing Storage Architecture Choices
Two networking technologies are widely used for shared storage: fibre channel and Ethernet. Though fibre channel offers top-notch performance, many providers and data center managers will quickly rule it out, because a fibre channel architecture would increase costs significantly by requiring that an additional (and different) connectivity architecture be placed alongside the IP architecture needed for inter-node communication.
Fibre channel may also increase training and tuning costs. There may not be fibre channel expertise on-staff to deal with any challenges that may arise. For example, scaling fibre channel performance involves installing multiple cards and manually trunking those cards in order to maximize throughput.
Fibre channel also has to be deployed within a relatively short distance of the attached servers. It is not uncommon, especially in the cloud provider market and again because of floor space issues, to have storage hardware in an entirely different location from the server hardware.
The other option is Ethernet-based storage protocols. There are typically two to consider: iSCSI and AoE (ATA over Ethernet). Both have the advantage of not requiring a new physical cabling architecture. They can leverage what is already in place, helping to keep hard acquisition costs down.
iSCSI is a block storage protocol based on SCSI and can be provisioned to each node in the cluster. The problem with iSCSI is that each SCSI command needs to be encapsulated in TCP/IP in order to be transmitted across the IP network. This adds processing overhead, which introduces latency and reduces performance. This overhead becomes especially evident in a large scale-out architecture. Scaling iSCSI performance can become difficult, as cards have to be manually provisioned and trunked in order to be used.
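As a rough illustration of that layering, the following Python sketch adds up the minimum header sizes that wrap SCSI data in the first Ethernet frame of an iSCSI PDU. It assumes IPv4 and TCP headers without options, a standard 1500-byte MTU, and no iSCSI digests; these are simplifying assumptions, not measurements. Byte counts are only a proxy: the processing cost described above comes mainly from the TCP/IP stack work on each layer, not from the bytes themselves.

```python
# Minimum header sizes in bytes for each layer (no options, no digests).
ISCSI_LAYERS = [
    ("Ethernet", 14),                         # destination MAC, source MAC, EtherType
    ("IPv4", 20),                             # minimum IPv4 header
    ("TCP", 20),                              # minimum TCP header
    ("iSCSI Basic Header Segment", 48),       # fixed-size iSCSI PDU header
]

MTU = 1500  # assumed standard Ethernet MTU; covers everything after the Ethernet header

def header_bytes(layers):
    """Total header bytes across all layers."""
    return sum(size for _, size in layers)

def payload_room(layers, mtu=MTU):
    """Bytes left for SCSI data in the first frame of a PDU."""
    inside_mtu = sum(size for name, size in layers if name != "Ethernet")
    return mtu - inside_mtu

total_headers = header_bytes(ISCSI_LAYERS)
first_frame_payload = payload_room(ISCSI_LAYERS)

for name, size in ISCSI_LAYERS:
    print(f"{name:28s} {size:3d} bytes")
print(f"total headers: {total_headers} bytes, room for data: {first_frame_payload} bytes")
```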
AoE is unique in that it leverages Ethernet cabling but not IP. It natively transmits the ATA protocol over Ethernet segments. Its architecture is essentially shared DAS, which is ideal for these scalable, open storage solutions. Without the overhead of IP, it can achieve performance similar to that of fibre channel. Scaling AoE performance is also easy, as the protocol automatically "port-floods", meaning that all available ports are used automatically to transmit data.
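To make the "Ethernet but not IP" point concrete, here is a minimal Python sketch that packs the headers of an AoE ATA read request. The field layout follows the published AoE specification as I understand it (EtherType 0x88A2, a 10-byte AoE header, a 12-byte ATA header); the function name, MAC addresses, shelf/slot numbers and tag are hypothetical values for illustration. The sketch only builds bytes and does not send anything on a network.

```python
import struct

AOE_ETHERTYPE = 0x88A2        # EtherType registered for ATA over Ethernet
AOE_VERSION_1 = 0x10          # version 1 in the high nibble, no flags set
AOE_CMD_ATA = 0               # AoE command 0: issue ATA command
AFLAG_EXTENDED = 0x40         # "E" bit: LBA48 extended ATA command
ATA_READ_SECTORS_EXT = 0x24   # ATA READ SECTORS EXT opcode

def build_aoe_read(dst_mac, src_mac, shelf, slot, lba, sectors, tag):
    """Pack Ethernet + AoE + ATA headers for a read request (no IP or TCP layer)."""
    # Ethernet header: 6-byte destination, 6-byte source, 2-byte EtherType.
    eth = dst_mac + src_mac + struct.pack("!H", AOE_ETHERTYPE)
    # AoE header: ver/flags, error, major (shelf), minor (slot), command, tag.
    aoe = struct.pack("!BBHBBI", AOE_VERSION_1, 0, shelf, slot, AOE_CMD_ATA, tag)
    # ATA header: aflags, err/feature, sector count, command, lba0..lba5, 2 reserved bytes.
    lba_bytes = lba.to_bytes(6, "little")  # lba0 carries the least significant byte
    ata = struct.pack("!BBBB", AFLAG_EXTENDED, 0, sectors, ATA_READ_SECTORS_EXT)
    return eth + aoe + ata + lba_bytes + b"\x00\x00"

# Hypothetical addresses and target (shelf 1, slot 2).
frame = build_aoe_read(
    dst_mac=bytes.fromhex("ffffffffffff"),
    src_mac=bytes.fromhex("020000000001"),
    shelf=1, slot=2, lba=2048, sectors=2, tag=0x1234,
)

print(f"header length: {len(frame)} bytes")
print(f"EtherType: 0x{frame[12:14].hex()}")
```

The ATA registers sit directly behind the Ethernet header, so the target is addressed by MAC address plus shelf and slot rather than by an IP address and TCP port.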
Because AoE presents volumes as local SCSI drives, it also brings an operational simplicity that’s lacking in other protocols; direct attached storage is certainly well understood and easy to interact with. AoE maintains that simplicity but adds sharing via an Ethernet environment.
As stated above, one of the advantages of open storage software is that a best-of-breed solution can be selected on a use-case-by-use-case basis. This brings the potential for multiple file systems to be present in the environment. The sharing aspect of AoE allows it to be a common backend that can support multiple open storage software deployments.
Finally, like the other Ethernet protocols, AoE does not constrain the geographic location of the storage device. The nodes of a cluster could be in a different building from the storage yet still have full access. Assuming a low-latency connection, the use of Ethernet allows this concept to be extended so that nodes can be in different cities, allowing organizations to create metro-clusters.
Conclusion
Open storage software brings significant flexibility to the data center and cloud provider. The ability to select storage software decoupled from hardware on a per-application basis promises to lower costs and increase IT responsiveness. However, as that environment grows, keeping node sprawl under control becomes a key challenge to maintaining the cost effectiveness of the open storage solution.
Node sprawl can be controlled as long as the right architecture is selected, but that architecture has to maintain the flexibility and cost effectiveness of the original project. AoE is an excellent example of an architecture that can leverage the existing infrastructure and IT knowledge base while increasing per-node performance.
Coraid is a client of Storage Switzerland
Wednesday, July 25, 2012
George Crump, Senior Analyst