Who is Arming Cloud Storage?
Cloud computing and cloud storage have been defined in many ways, and trying to pin cloud services down to a single meaning has been an exercise in futility. What cloud services are depends on how they will be used and who will be using them. Who is arming cloud storage’s back-end storage infrastructure is equally dependent on those variables.
There are two basic cloud models to consider: cloud compute and cloud storage. In cloud compute, a specific application is shared by thousands of users. These applications range from financial services to online role-playing games. Although requests are made via the cloud ("show me all my prospects in New York," for example), all the processing of those requests is done within the provider's data center. In almost every case, the provider will require high-performance storage to support that local processing.
Tuesday, November 24, 2009
With cloud storage, performance is still a factor, but it is not as important as other capabilities. Cloud storage providers need to deliver a better storage service at a lower cost than the local IT team could build in-house. The value proposition for this type of user is paying only for the storage they need at the moment they need it. In other words, the cost of their storage should scale granularly, both up and down, as they consume or release storage space. Additionally, providers have to add their own value to that storage, such as search or archive services.
An additional consideration is how the storage is treated from a financial perspective. Internal development and deployment would be capitalized. In contrast, an external service would be an expense item and treated as such. CAPEX vs. OPEX may provide additional benefits for an organization and should be a consideration.
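The pay-for-what-you-use idea can be made concrete with a little arithmetic. The following is a minimal sketch only: the function names, the per-GB price and the usage figures are all hypothetical, and real providers bill in many different ways.

```python
def metered_monthly_cost(daily_usage_gb, price_per_gb_month, days_in_month=30):
    """Bill for the average capacity actually held during the month.

    daily_usage_gb is a list with one capacity reading (in GB) per day.
    """
    if not daily_usage_gb:
        return 0.0
    gb_days = sum(daily_usage_gb)
    return gb_days / days_in_month * price_per_gb_month


def provisioned_monthly_cost(provisioned_gb, price_per_gb_month):
    """Cost when capacity is bought up front to cover the peak."""
    return provisioned_gb * price_per_gb_month


# Hypothetical month: 100 GB for 10 days, a spike to 500 GB, then 200 GB.
usage = [100] * 10 + [500] * 10 + [200] * 10
price = 0.10  # assumed price per GB per month, for illustration only

metered = metered_monthly_cost(usage, price)
provisioned = provisioned_monthly_cost(max(usage), price)
print(f"metered: {metered:.2f}  provisioned for peak: {provisioned:.2f}")
```

In this made-up month the metered bill tracks the average capacity held, while the in-house equivalent has to be sized for the peak.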
In this use case, the providers of storage infrastructure become critical. Cloud storage providers have responded with two options. The first is to design their own storage infrastructure internally. Essentially, they become software developers designing their own storage solution, one that typically runs on white-box servers with inexpensive disk attached. While they may be able to claim some advantages because of vertical integration, there is concern about a single company’s ability to do it all. In addition to maintaining the software development around the storage solution, these companies must also provide services that attract and keep users. This includes maintaining a facility, marketing, and the software front-end to the services, as well as the not-to-be-underestimated task of managing storage for potentially thousands of customers. Insisting that a single organization provide all of these services may be too much to ask.
The other option is for the provider to leverage existing off-the-shelf storage solutions. This allows the provider to focus on the business and its unique value-add instead of also becoming an expert in storage software development. These providers need to look for a storage system that can scale to meet customer demands, keep them price competitive and keep management simple. Grid-based storage archive infrastructures like those offered by Permabit Technology seem ideal for this instance. They can also be coupled with solutions from companies like Mezeo to provide web services API-based solutions. The combination provides a complete multi-tenant solution that allows providers to come to market quickly with robust solutions and to focus on their key value-add.
Scale is potentially the most important aspect of a cloud storage provider’s storage infrastructure, not only in terms of how large the storage system can grow but also how easy it is to manage at that size. The provider’s profit has to come from delivering that scale more quickly and at lower management cost than the potential customer could achieve themselves. Ideally, the storage system that the provider selects will be available to purchase in the same pay-as-they-grow fashion that the provider offers to its customers.
An answer for this type of scaling can be found in grid-based storage archive architectures. Capacity is added to the storage environment via nodes in a cluster. Each time a node is added for capacity, additional storage processing power and storage I/O bandwidth come with it. The challenge with traditional systems is that as capacity is added, these other components do not grow with it. This leads to two problems. First, the provider is forced to overbuy storage bandwidth and storage processing initially to leave room for some scaling. This is a problem because, if the provider had the option to wait, that processing power and storage I/O bandwidth would become less expensive in the future. The second problem is that at some point, as the capacity of the traditional system scales, it outgrows the other capabilities of the unit. The customer is left with accepting ever-decaying performance or the need to upgrade or add a new system.
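The difference between the two scaling models can be sketched numerically. This is a simplified illustration, not a model of any vendor's product: the function names and the capacity and bandwidth figures are assumptions chosen only to show the shape of the curves.

```python
def grid_bandwidth_per_tb(nodes, tb_per_node, mbps_per_node):
    """Each node brings its own capacity and its own I/O bandwidth."""
    total_bandwidth = nodes * mbps_per_node
    total_capacity = nodes * tb_per_node
    return total_bandwidth / total_capacity


def traditional_bandwidth_per_tb(shelves, tb_per_shelf, controller_mbps):
    """Capacity grows with each shelf, but the controller bandwidth is fixed."""
    total_capacity = shelves * tb_per_shelf
    return controller_mbps / total_capacity


# Hypothetical figures: 10 TB and 200 MB/s per grid node,
# versus an 800 MB/s controller in front of 10 TB shelves.
for units in (1, 2, 4, 8):
    grid = grid_bandwidth_per_tb(units, 10, 200)
    trad = traditional_bandwidth_per_tb(units, 10, 800)
    print(f"{units * 10:>3} TB  grid: {grid:5.1f} MB/s per TB  "
          f"traditional: {trad:5.1f} MB/s per TB")
```

Under these assumptions the grid holds a constant bandwidth per terabyte as it grows, while the traditional system starts overprovisioned and then falls behind, which is the overbuy-then-outgrow pattern described above.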
Upgrading is problematic, although some systems can be upgraded without having to perform a migration. The storage controllers or NAS heads are simply replaced. This is of course expensive, especially if the environment experiences the rapid growth that most cloud storage providers expect. The second option, adding an additional system, has its own challenges as now the provider must manage multiple storage systems. Although global file systems claim to address this, they only help the data path, not the management path. Compared to independent NAS heads that are coupled together, a single cluster has a single point for all storage management operations like provisioning and data protection.
After being able to meet the scaling demands of the cloud storage provider, the second biggest issue is keeping the cost in check. The cloud storage provider will constantly be compared to the cost of keeping that storage in-house. The cloud storage service must be able to keep hard costs in check enough that the value-add they bring will outweigh any price differential.
The first step is keeping the hidden costs of scaling in check, as discussed above. Beyond that, the provider must keep the hard costs in check as well. The cloud storage system should be able to reliably leverage high-capacity drives and then add deduplication and compression technologies to optimize capacity utilization. In fact, depending on the service that the provider offers, compression may be the more important of the two, as there may be limited opportunity for deduplication. By definition, for deduplication to be effective, there must be duplicate data. Compression provides optimization across all data sets and is not dependent on duplication of data. The ability to apply both space optimization techniques provides the ultimate in space reduction.
Security is a top concern for most new cloud storage users. The cloud storage provider should make sure that some form of encryption or other data security is provided on the transfer from the client to the facility. The storage itself should be designed to allow for encryption without performance degradation. It should also be designed to handle the multi-tenant environment that cloud storage is built on. Finally, in compliance-driven environments, it should be able to enable write once read many (WORM) storage to meet regulatory needs. Again, this WORM capability has to be multi-tenant aware, as some of the provider’s clients may need it and others may not. Having to deploy separate systems for each requirement may break the cost model.
The final key consideration, which may actually be the most important, is reliability. Losing customer data is not an option, and there may be penalties if it does occur. If the storage system uses SATA-based drives to keep costs down, reliability becomes even more critical. Standard RAID may not suffice for the storage provider, but a complete mirror is too costly to deploy and may break the price model. The challenge with standard RAID technologies, even the newer RAID 6, is that as drive capacity continues to increase, the likelihood of an error increases, as does the time to rebuild that drive. Data protection technologies like Permabit’s RAIN-EC can scale massively and provide more complete data protection, leveraging the storage cluster to reduce the time required to rebuild after a drive failure and protecting against the read failures typical of RAID environments.
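A small experiment illustrates why compression still pays off when there is nothing to deduplicate. This is a sketch under stated assumptions: it uses simple fixed-size chunk hashing as a stand-in for deduplication and Python's zlib for compression; neither is claimed to be what any particular storage product does, and the sample data is synthetic.

```python
import hashlib
import zlib


def deduped_size(data: bytes, chunk_size: int = 4096) -> int:
    """Bytes stored if each distinct fixed-size chunk is kept only once."""
    unique_chunks = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique_chunks.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return sum(unique_chunks.values())


def compressed_size(data: bytes) -> int:
    """Bytes stored after compressing the whole buffer with zlib."""
    return len(zlib.compress(data))


# Unique but repetitive-looking text: no two chunks are identical.
unique_text = "".join(
    f"record {i}: status=ok\n" for i in range(20000)
).encode()

# One 4096-byte block of hash output (hard to compress), stored ten times.
block = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(128))
duplicated = block * 10

print("unique text:", len(unique_text), "raw,",
      deduped_size(unique_text), "deduped,",
      compressed_size(unique_text), "compressed")
print("duplicated :", len(duplicated), "raw,",
      deduped_size(duplicated), "deduped")
```

On the unique text, deduplication finds nothing to remove while compression shrinks the data; on the repeated block, deduplication collapses ten copies into one.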
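What "multi-tenant aware" WORM might look like can be sketched with a toy in-memory store. Everything here is hypothetical: the class and method names are invented for illustration, and real WORM storage also involves retention periods, auditing and durable media, none of which are modeled.

```python
class MultiTenantStore:
    """Toy object store where WORM is a per-tenant setting."""

    def __init__(self):
        self._worm = {}      # tenant name -> True if WORM is enforced
        self._objects = {}   # (tenant, key) -> stored bytes

    def add_tenant(self, tenant, worm=False):
        self._worm[tenant] = worm

    def put(self, tenant, key, data):
        if tenant not in self._worm:
            raise KeyError(f"unknown tenant: {tenant}")
        if self._worm[tenant] and (tenant, key) in self._objects:
            raise PermissionError("WORM tenant: object cannot be overwritten")
        self._objects[(tenant, key)] = data

    def get(self, tenant, key):
        return self._objects[(tenant, key)]

    def delete(self, tenant, key):
        if self._worm[tenant]:
            raise PermissionError("WORM tenant: object cannot be deleted")
        del self._objects[(tenant, key)]


store = MultiTenantStore()
store.add_tenant("regulated-bank", worm=True)   # needs write once read many
store.add_tenant("photo-startup", worm=False)   # ordinary read/write tenant

store.put("regulated-bank", "ledger-2009", b"v1")
store.put("photo-startup", "cat.jpg", b"v1")
store.put("photo-startup", "cat.jpg", b"v2")    # overwrite is allowed here
```

Both tenants share one system, and only the tenant that needs WORM pays the restriction, which is the point of avoiding a separate deployment per requirement.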
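The effect of drive capacity on rebuild time and read-error risk can be estimated with back-of-the-envelope arithmetic. This is a rough sketch only: the rebuild rate is an assumed figure, the error rate of one unrecoverable error per 10^14 bits is an assumed, commonly quoted specification for SATA-class drives, and treating bit errors as independent is a simplification.

```python
import math


def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Hours to re-create one failed drive at a sustained rebuild rate."""
    drive_bytes = drive_tb * 1e12
    seconds = drive_bytes / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


def p_unrecoverable_read(tb_read, bit_error_rate=1e-14):
    """Chance of at least one unrecoverable error while reading tb_read TB.

    Computes 1 - (1 - bit_error_rate) ** bits in a numerically stable way.
    """
    bits = tb_read * 1e12 * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


# Assumed sustained rebuild rate of 50 MB/s, for illustration only.
for capacity_tb in (0.5, 1, 2):
    print(f"{capacity_tb} TB drive: "
          f"{rebuild_hours(capacity_tb, 50):.1f} h to rebuild, "
          f"{p_unrecoverable_read(capacity_tb):.1%} chance of a read error "
          f"per {capacity_tb} TB read")
```

Both numbers grow with capacity, and a RAID rebuild has to read every surviving drive in the group, so the total data read, and with it the exposure, is a multiple of the single-drive figure.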
In addition, the storage system should be able to replicate data from multiple sites and when doing so, leverage the deduplication and compression mentioned above to keep bandwidth costs down. Cloud storage is abstracted from the user and they will be far less tolerant of a data center failure than an internal user would be. Also, because of this abstraction, the ability for them to switch services is a reality that the provider has to recognize. Data center outages are no more acceptable than simple data loss in the high expectation cloud storage environment.
The rapidly growing and changing nature of cloud storage means that providers need to focus on what they do best, and users need to be cognizant of how much of the stack their provider is responsible for. While the vertically integrated approach may sound good, it may not be realistic. A more scalable business model may be for the provider to focus on the user-facing software and the facilities, and then leverage someone else to provide the back-end storage infrastructure.
George Crump, Senior Analyst
Related Articles
Faster Primary Storage with Data Dedupe
Primary Storage Deduplication, Demand It
Dedupe Improves Primary Storage Efficiency
SMB NAS is Deduplication's Next Step
Primary Storage Dedupe Addresses Data Gap
How Should Primary Storage Be Delivered
Storage Industry Consolidation & Dedupe
Primary Storage: Dedupe vs. Compression
Making Primary Storage Dedupe Safe
High Performance Primary Storage Dedupe
Automated Tiering or Disk Archiving?
Global Healthcare Leader - Disk Archive
Optimization - New Normal in Storage
Can’t Deduplicate Admin Workload
Managing VM Sprawl - Disk Archive
The Foundation of Dedupe’s Era
Weaknesses of Dedupe - Retention
This Article Sponsored by Permabit