Arguments over the definition of cloud storage usually start with the claim that it must have this feature or that feature. What constitutes a ‘necessary’ feature is in the eye of the beholder; what is important to one organization may not be important to another. The core capabilities that must be part of cloud storage are a just-in-time growth model and an ‘instinctive’ storage management capability.


Users looking for cloud storage ideally want to treat storage as a just-in-time inventory item. Pay-as-you-grow is important for both the provider and the consumer. The provider wants to offer cost-effective incremental storage and to provision that storage only when it is required; the consumer wants to pay only for the storage they are using.
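As a rough illustration of the provider's side of just-in-time growth, the sketch below decides how many nodes to add only once used capacity pushes utilization past a target. The function name, the 80 percent target and the node size are assumptions made for this example; they do not come from any particular product.

```python
import math


def nodes_to_add(used_tb, provisioned_tb, node_tb, target_utilization=0.8):
    """Return how many nodes to add so that utilization stays at or below
    the target. All names and the 0.8 default are illustrative assumptions."""
    if node_tb <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("node_tb must be positive and target in (0, 1]")
    # Capacity needed to hold the used data at the target utilization.
    required_tb = used_tb / target_utilization
    shortfall_tb = required_tb - provisioned_tb
    if shortfall_tb <= 0:
        return 0  # enough capacity already provisioned; buy nothing yet
    return math.ceil(shortfall_tb / node_tb)


if __name__ == "__main__":
    print(nodes_to_add(used_tb=70, provisioned_tb=100, node_tb=10))
    print(nodes_to_add(used_tb=90, provisioned_tb=100, node_tb=10))
```

The point of the sketch is that capacity is purchased in small increments driven by actual use, rather than provisioned up front.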


Instinctive storage management capabilities are also critical. The instinctive nature of true cloud storage is more than just automation. It is self-healing, self-configuring and self-managing. As the scale and capacity of these systems grow, wizards (essentially GUI accelerators) don’t provide the administrative relief that users are looking for.


This instinctive storage management will likely require policy-driven settings that are embedded with the data at the point of creation via custom metadata policies. For example, companies like ParaScale allow the setting of policies that dictate what class of storage certain data types should be stored on; when, based on file attributes, that data should be moved to a different storage tier; and how that data should be protected and replicated, based on use or popularity. The critical piece of the puzzle, and what makes this management instinctive, is that these behaviors can be set well in advance of the data ever being created. The system then follows those policies as the data is created, modified and aged. Instinctive management requires policy management, but it may also require an architecture that separates performance and capacity. Solutions that require a gateway are not completely capable of instinctive management because adding capacity can create an access bottleneck on the gateway.
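The policy idea described above can be sketched in a few lines of code. This is a minimal, hypothetical model and not ParaScale's actual policy format or API: the class names, tier names and thresholds are all assumptions chosen for illustration. What it shows is that the rules exist before any data does, and each file's metadata is simply evaluated against them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata that policies are evaluated against."""
    name: str
    extension: str
    age_days: int
    reads_last_30_days: int


@dataclass(frozen=True)
class Placement:
    tier: str       # class of storage the data should live on
    replicas: int   # number of copies to keep


@dataclass(frozen=True)
class Policy:
    description: str
    matches: Callable[[FileMeta], bool]
    placement: Placement


# Policies are written in advance of the data; the first match wins.
POLICIES: List[Policy] = [
    Policy("popular data gets the fast tier and an extra copy",
           lambda f: f.reads_last_30_days >= 100, Placement("fast", 3)),
    Policy("data older than a year moves to the archive tier",
           lambda f: f.age_days > 365, Placement("archive", 2)),
    Policy("video files go to the capacity tier",
           lambda f: f.extension in {".mp4", ".mov"}, Placement("capacity", 2)),
]
DEFAULT_PLACEMENT = Placement("standard", 2)


def place(meta: FileMeta, policies: List[Policy] = POLICIES) -> Placement:
    """Return the placement dictated by the first policy that matches."""
    for policy in policies:
        if policy.matches(meta):
            return policy.placement
    return DEFAULT_PLACEMENT


if __name__ == "__main__":
    print(place(FileMeta("clip.mp4", ".mp4", age_days=10, reads_last_30_days=5)))
```

Re-running `place` as a file ages or its popularity changes is what would trigger movement between tiers in this model.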


Every other capability that cloud storage systems can offer is conditional on the problem that needs to be solved. Broadly, there are two use cases. The first is focused on infinitely scalable capacity and the second on infinitely scalable performance. It’s important to note that cloud storage performance is typically focused on aggregate I/O, in other words, access from many clients across the entire cloud. This differs from clustered file systems, where performance is focused on concurrent access to a single file. Both use cases are focused on delivering those capabilities very cost-effectively. This article will focus on what to look for in cloud storage from the perspective of the more commonly sought-after use case: scalable capacity.


Multi-tenancy is going to be a high priority if the goal of the organization is to provide storage as a service or if the organization is going to use storage as a service. There should be a native capability to ensure that each customer's data is secure and protected not only from other customers but possibly even from the provider. If the service is going to be used only internally, also known as a private cloud, then this may be less critical, but it is certainly still useful to segment data by department, user group, etc. and establish different policies for each.


For the capacity-focused provider or consumer, the ability to have the service scale out in capacity is another important consideration. How this scaling happens is the subject of much debate, but for those looking to use cloud storage to solve problems there are a few key considerations.


Scale out storage can occur via a tightly coupled cluster where all the nodes are dependent on each other. These systems may have a performance advantage, as access speed is the result of the combined nodes, but typically have limitations on the quantity of nodes supported. Usually, the capacity scale of these solutions is sufficient for internal, more performance-centric data center solutions but for the cloud, especially for providers where capacity requirements may be truly unlimited, this limitation may be an issue.


A loosely coupled cluster, on the other hand, typically does not have the same limitation on the quantity of nodes, or the limit is significantly ahead of today's requirements. ParaScale, for example, supports a virtually unlimited number of nodes in its cluster. This is because each node in the cluster acts as its own independent entity, all managed by a centralized, policy-based, global file system.


Loosely coupled clusters may also better enable the next consideration: a software-based cloud storage application that can use commodity hardware for both the cluster nodes and the storage itself. This capability allows the provider to acquire their own servers to be cluster nodes as well as their own disk storage to put in those nodes. How much weight this capability carries depends on how important it is to drive down the monthly cost of storage. The above-mentioned custom metadata can also be leveraged to offset reliability concerns with using commodity hardware.


Many organizations looking to solve a capacity problem will look at cloud storage's cost per GB per month as a key differentiator in provider selection. A provider that can use commodity hardware and still keep data availability ahead of their customers’ requirements could have an advantage over those that use a turnkey solution provided by major storage manufacturers.


Another important advantage that loosely coupled clusters more often enjoy is the ability to perform rolling upgrades to new platforms as they become available. This is more critical in cloud storage than possibly any other storage application because the intent may be to keep the data in the cloud indefinitely. Cloud storage platforms are already being used to retain information, like email archives and medical records. This type of data has, in many cases, legal requirements for retention that can last for years, if not decades.


Migration from one platform to another may simply be impossible with cloud storage. The impact on potentially thousands of customers of moving hundreds of petabytes could be overwhelming. The rolling architecture that a loosely coupled cloud storage platform can provide allows for nodes to be of different types, manufacturers and models. When it is time to replace an aging node, a new node based on the latest hardware and storage available can be added. Then, with a simple command, just the data on the old node can be moved to the new node and the old node can be removed and disposed of. Over time, the entire cluster can be upgraded, becoming faster with each incremental upgrade.
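The rolling replacement described above can be modeled very simply. The sketch below is a toy model only: the cluster is a plain dictionary mapping hypothetical node names to the objects each node holds, and `retire_node` is an invented name, not a command from any real product. It shows that only the old node's data moves, while every other node is left untouched.

```python
def retire_node(cluster, old_node, new_node):
    """Move only the old node's data onto the new node, then remove the old
    node. `cluster` maps node name -> set of object IDs (a toy model)."""
    if old_node not in cluster:
        raise KeyError(f"unknown node: {old_node}")
    if old_node == new_node:
        raise ValueError("old and new node must differ")
    # Add the new node if it has not joined the cluster yet.
    cluster.setdefault(new_node, set())
    # Copy the old node's objects across, then drop the old node.
    cluster[new_node] |= cluster[old_node]
    del cluster[old_node]
    return cluster


if __name__ == "__main__":
    cluster = {"node-a": {"obj1", "obj2"}, "node-b": {"obj3"}}
    retire_node(cluster, "node-a", "node-c")
    print(sorted(cluster))
```

Repeating this one node at a time is what lets the whole cluster turn over without a bulk migration.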


Another consideration, especially from a provider’s perspective, is to look for web services that allow for metering, monitoring and integration with billing systems. If, as a provider, you are going to bill by storage capacity used, it is ideal to have your cloud storage software help automate that process.
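To make the metering idea concrete, here is a minimal sketch of turning daily usage samples into per-tenant charges. It assumes one sample per tenant per day and bills on average GB stored over a 30-day month; the tenant names, sample format and price are made up for the example and do not reflect any real metering web service.

```python
from collections import defaultdict


def monthly_invoices(samples, price_per_gb_month, days_in_month=30):
    """Bill each tenant on average GB stored over the month.

    `samples` is an iterable of (tenant, day, used_gb) tuples, assumed to
    contain at most one sample per tenant per day. Days with no sample
    count as zero usage.
    """
    gb_days = defaultdict(float)
    for tenant, _day, used_gb in samples:
        gb_days[tenant] += used_gb
    return {
        tenant: round(total / days_in_month * price_per_gb_month, 2)
        for tenant, total in gb_days.items()
    }


if __name__ == "__main__":
    samples = [("acme", day, 100.0) for day in range(1, 31)]
    samples += [("globex", day, 60.0) for day in range(1, 16)]
    print(monthly_invoices(samples, price_per_gb_month=0.10))
```

In practice the samples would come from the storage platform's own metering interface and the result would be handed to the billing system.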


A final consideration may be to look for a solution that will allow native applications to be run on the nodes themselves. For example, ParaScale will allow a backup software agent to be placed directly on each node in the cluster, allowing for local backup of the member nodes and eliminating the need to back up all that data across a network.


While the debate rages on about what cloud storage is, storage managers and providers have a job to do. These capabilities are what to look for in cloud storage. Knowing what the goal is, provider or consumer, performance or capacity, is the critical first step in selecting the right cloud storage solution.

George Crump, Senior Analyst

This Article Sponsored by ParaScale