Clustered NAS Architectures And The Cloud
The demands that a cloud service places on storage can typically be divided into two camps. On one side, performance is a critical requirement, especially for cloud compute architectures. On the other side, cost-effective storage is critical, typically required for cloud storage architectures. Both camps need the ability to scale to support thousands of users and petabytes of capacity. Clustered NAS architectures and the cloud have become synonymous because of clustering’s ability to provide that scale. There are two clustering methods available, tightly coupled clusters and loosely coupled clusters, and each has a role to play in cloud storage.
Thursday, January 14, 2010
Loosely coupled clusters typically attempt to fill the needs of the cloud storage focused camp. In these clusters each node is a stand-alone entity, with volumes assigned to it and not accessible to other nodes in the cluster. The smallest atomic unit is the file and it’s contained on that node. When a file is stored to the cluster its data is stored intact on one specific node in the cluster. While multiple copies of the file can be managed and redirected to other nodes for redundancy, multiple nodes cannot help in serving up a single instance of the file. This is acceptable for some cloud storage applications.
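The whole-file placement described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's algorithm: the function name `place_file`, the node names, and the use of a hash to pick the owning node are all assumptions made for the example. Real loosely coupled clusters often place files according to administrator-defined policies instead.

```python
import hashlib


def place_file(path: str, nodes: list[str], copies: int = 1) -> list[str]:
    """Return the nodes that hold whole copies of `path`.

    The first node in the result is the primary. Every copy is the
    complete file; no node ever holds only part of it.
    (Hash-based placement is an assumption made for this sketch.)
    """
    if not nodes:
        raise ValueError("cluster has no nodes")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    copies = min(copies, len(nodes))
    # Redundant copies go to the next nodes in the list, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c"]  # hypothetical node names
    print(place_file("/projects/report.doc", cluster, copies=2))
```

However many copies exist, a read of one copy is served by a single node, which is why adding nodes raises aggregate capacity and throughput but does not speed up access to an individual file.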
The smallest atomic unit in a tightly coupled cluster is a block of data. As files are stored to the cluster they’re broken down into blocks, which are accessible by any node in the cluster. When a file request is made each node participates in accessing different blocks of the file and serving it to the requesting application or user. The more nodes available to respond to requests, the better performance can be.
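A minimal Python sketch of the block-level approach follows. The block size, the round-robin assignment, and the function names are assumptions chosen for illustration. Production systems use much larger blocks and more sophisticated layouts, and the per-node fetches would run in parallel rather than in a loop.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Break a file's contents into fixed-size blocks (the last may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks: int, nodes: list[str]) -> dict[str, list[int]]:
    """Decide which node serves which block index, round-robin (an assumption)."""
    plan: dict[str, list[int]] = {node: [] for node in nodes}
    for index in range(num_blocks):
        plan[nodes[index % len(nodes)]].append(index)
    return plan


def read_file(blocks: list[bytes], plan: dict[str, list[int]]) -> bytes:
    """Simulate each node fetching its share of blocks, then reassemble in order."""
    fetched: dict[int, bytes] = {}
    for node, indices in plan.items():
        for index in indices:  # in a real cluster the nodes work concurrently
            fetched[index] = blocks[index]
    return b"".join(fetched[i] for i in range(len(blocks)))


if __name__ == "__main__":
    contents = b"abcdefghij"
    blocks = split_into_blocks(contents)
    plan = assign_blocks(len(blocks), ["node-a", "node-b"])  # hypothetical nodes
    assert read_file(blocks, plan) == contents
    print(plan)
```

Because every node handles only a share of the blocks, adding nodes shortens each node's list in `plan`, which is the source of the performance scaling described above.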
In both cluster methods the application or user is dealing with what appears to be a single entity. They don’t need to be aware of the multitude of nodes that lie beneath the mount point. For the storage manager, however, there is some complexity in a loosely coupled cluster, as they do need to manage policies that distribute the data and make sure access to the different nodes is relatively balanced. In a tightly coupled cluster there is one access point because all the nodes participate in file access equally.
The performance aspect of the cloud is one that’s often overlooked. After all, data is being accessed via a relatively slow internet connection. The processing of the data, especially in cloud storage, is done by the accessing user or application. While there are some performance demands from potentially thousands of users accessing data, the performance of individual nodes is typically acceptable. There are two cases where storage performance matters significantly in a cloud infrastructure.
The first is the cloud compute or application-as-a-service environment. In cloud computing thousands of users are typically using a web front-end to an application. All requests are received and processed internally within the cloud compute provider’s data center. This is roughly the equivalent of thousands of internal users accessing and making demands of the application and it requires a similar type of design.
The second use case is for data centers that want to leverage the scalability of cloud storage architectures for internal deployment. As in the case above, access is all internal, with potentially thousands of users accessing the storage cluster simultaneously. If high-performance NAS services are required, the performance capabilities of the storage become critical.
The challenge is that the compute provider cannot use traditional approaches to managing storage performance and scalability demands when providing a non-traditional form of that environment. In the traditional method everything is essentially pre-paid or bought upfront. The capital expense is then depreciated over the next five years.
Clouds typically provide a pay-as-you-go model. Most cloud compute environments don’t start out with thousands of users who’ve all prepaid for the use of the service. Each typically pays monthly or quarterly, or sometimes annually. As a result, cloud providers can’t afford to lease a large storage infrastructure upfront and hope that enough customers will sign up for them to pay their bills. They also have to be aware of the opposite problem, that their application or service is so successful that they quickly outstrip the capabilities of their initial purchase and then need to upgrade to a faster, more expensive unit.
Cloud providers need a model that matches their business model, one that can scale as their service scales. The cloud storage market has quickly adopted loosely coupled storage clusters, either through homegrown ingenuity or with off-the-shelf software, to provide them with this flexibility. In comparison, cloud compute and application service providers have moved to tightly coupled storage clusters to provide similar scaling of both performance and capacity, but they’ve done so with some compromises.
First, in most tightly coupled clusters, all the components of the storage system are provided by a single vendor, because the internode communications in this architecture require that the hardware in each node be very similar. Second, tightly coupled clusters typically struggle with managing multiple classes of storage within a single cluster. Many can offer only one class of storage (all fibre, all SATA, all SSD) per cluster.
Companies like Symantec with their FileStore product are beginning to combine the flexibility of loosely coupled clusters with the performance and management simplicity of a tightly coupled cluster. These solutions can create a storage cluster by using software that can be loaded on off-the-shelf Intel-based servers, of mixed capabilities, to provide the front-end storage processing muscle to a shared storage back-end. The storage also can be from multiple vendors, segregated by pools. Data can then be migrated between the pools of storage automatically based on file attributes, like last access.
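Attribute-based migration of this kind can be pictured as a simple rule that maps a file's last-access time to a storage pool. The Python sketch below is an illustration only and does not represent FileStore's policy engine or configuration syntax. The pool names (`fast`, `capacity`, `archive`) and the 30-day and 180-day thresholds are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    last_access: datetime


def choose_pool(info: FileInfo, now: datetime,
                hot_days: int = 30, warm_days: int = 180) -> str:
    """Pick a storage pool from the file's last-access age.

    Pool names and thresholds are hypothetical values for this sketch.
    """
    age = now - info.last_access
    if age <= timedelta(days=hot_days):
        return "fast"       # recently used data stays on the quickest pool
    if age <= timedelta(days=warm_days):
        return "capacity"   # cooler data moves to cheaper disk
    return "archive"        # long-idle data moves to the lowest-cost pool


if __name__ == "__main__":
    now = datetime(2010, 1, 14)
    files = [
        FileInfo("/data/active.db", datetime(2010, 1, 10)),
        FileInfo("/data/q3-report.doc", datetime(2009, 10, 1)),
        FileInfo("/data/old-scan.tif", datetime(2008, 5, 2)),
    ]
    for f in files:
        print(f.path, "->", choose_pool(f, now))
```

Running a rule like this on a schedule and moving any file whose chosen pool differs from its current one is one way the automatic migration between pools could be driven.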
When it comes to backup of the large storage repositories many loosely coupled clusters can be protected via NDMP. Some are advancing to add integration with data protection software. Since many of these clusters run a form of Linux it may require just installing a Linux backup agent on them, which is more of a certification than it is integration. Symantec has gone a step further by providing tight integration with their NetBackup enterprise backup solution in addition to standard NDMP support. This can greatly improve performance and further simplify data protection operations.
This gives the cloud compute environment the flexibility of using commodity Intel hardware while still delivering the tightly coupled cluster performance that traditionally came only from a single-vendor offering. It’s also ideal for data centers looking to leverage private clouds to provide internal file storage. They can now combine the performance characteristics of tightly coupled clusters with their existing storage resources to give users a high-performance NAS platform that is based on cloud economics.
George Crump, Senior Analyst
This Article Sponsored by Symantec