Corporate IT data sets are creating challenges similar to Big Data, thanks to large scale server and desktop virtualization, large home directories and file archives. There are now millions if not billions of file objects that need to be stored, managed and protected. Organizations are looking for ways to manage the relentless growth of these data types without having to incur unacceptable hardware, software or administrative costs.

George Crump, Senior Analyst

Isilon Systems is a client of Storage Switzerland

The Virtualization Challenge


Server and desktop virtualization have two specific requirements of their storage infrastructures. They have to handle today’s performance demands and scale to meet tomorrow’s. But they will have to scale that performance incrementally as needed, without requiring that all the future performance be purchased upfront.

Virtual environments will become increasingly dense. As that happens, more I/O will originate from each connecting host, and that I/O will be increasingly random in nature, requiring more IOPS. Legacy storage architectures struggle to meet this challenge, and the only real solution is to buy all the performance that will be needed in the future upfront. Buying performance upfront is similar to buying storage capacity upfront: anything bought at today’s prices carries a premium compared with waiting a year, until the performance is actually needed and has become more cost effective.
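The cost argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the three-year demand, the $100 unit price and the 20% annual price decline are all assumptions chosen for the example, not figures from any vendor.

```python
def upfront_cost(units_needed_per_year, price_today):
    """Buy every unit of performance on day one, at today's price."""
    return sum(units_needed_per_year) * price_today


def incremental_cost(units_needed_per_year, price_today, annual_price_decline):
    """Buy each year's units in the year they are needed, at that year's price."""
    total = 0.0
    for year, units in enumerate(units_needed_per_year):
        # Assumed model: the unit price falls by a fixed fraction every year.
        price = price_today * (1 - annual_price_decline) ** year
        total += units * price
    return total


# Hypothetical inputs: 10 units of performance needed in each of 3 years.
demand = [10, 10, 10]
print(upfront_cost(demand, 100.0))                        # 3000.0
print(round(incremental_cost(demand, 100.0, 0.20), 2))    # 2440.0
```

Under these assumed numbers, buying as needed costs 1000 + 800 + 640 = 2440 rather than 3000. The gap grows with the length of the planning horizon and the rate of price decline.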

The Large Home Directory Challenge


File based data is also on the rise, both in the number of objects and in the size of each individual object. Obviously this impacts the required storage capacity, but it impacts performance as well. Larger files requested by more users drive a significant increase in performance demand on the storage system compared with what was required in the past. How this data is transferred has changed as well. While the large majority is still transferred in bulk, an increasing amount, such as video and audio files, is now streamed over the network to the requesting client.


Also, much of the file based data being generated today is not user created. In many organizations, machines and devices are responsible for much of the need for more capacity. This data can come from report servers or from specific devices collecting information.



The Retention/Archive Challenge


The majority of this data, whether it be databases or files created by users, now needs to be retained, both for regulatory reasons and for potential future mining. One downside of understanding the value of a Big Data project is that more data is now likely to be deemed important and retained “just in case”. Now, though, “just in case” is a comfort not just to the users but also to the organization, since there may be value in that data in the future.



Big Data Infrastructure Answers Modern IT Demands


As a result of virtualization, the increasing importance of user home directories and the desire to simply retain more data, the modern IT data center has developed storage needs similar to those of Big Data environments. Both need storage systems that are simple, scalable, reliable and affordable, and both may be ideally served by scale out storage.



Making Big Storage Simple and Scalable


Many storage systems on the market are relatively easy to use when they’re first installed or when they are the only storage system. As new volumes are created and more capacity is added, storage management becomes increasingly difficult. Decisions have to be made about how much capacity to provision to each volume, and about how many drives, and which types, should be used for each volume.


Then, when the storage system reaches its maximum capacity, it needs to be upgraded. Since most organizations don’t want to go through the process of transitioning to a new storage system and migrating data to it, they end up buying additional storage systems to keep pace with capacity and performance demands. However, this creates multiple silos or islands of data.


Scale out storage solutions like those offered by Isilon Systems solve this problem by offering a single system that holds all the data in a single pool and can scale performance and capacity incrementally. Each time a node is added to the cluster, additional storage capacity, performance and I/O bandwidth come with it.


A scale out storage system allows the use of a single volume instead of multiple volumes. This eliminates the need for provisioning or drive allocation decisions. As more applications or hosts are added to the infrastructure, they simply store their data in the same volume. It is reasonable to expect that each application or host added to the infrastructure will need additional capacity. With scale out storage, as you add more capacity via nodes, more processing power and I/O bandwidth come with it. With a scale out storage architecture you do not need to perform a forklift upgrade or bring in a new storage system for a specific purpose. The scale out grid should be able to handle all the workloads.
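The way a scale out cluster grows can be sketched as a simple model. This is a minimal illustration, not a representation of Isilon’s software: the `Node` and `Cluster` classes and the per-node figures are hypothetical, and the perfectly linear scaling shown is an idealization.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Node:
    """One storage node; the figures used below are illustrative, not vendor specs."""
    capacity_tb: int
    iops: int
    bandwidth_gbps: int


@dataclass
class Cluster:
    """A single pool whose totals are the sum of what its nodes contribute."""
    nodes: List[Node] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes.append(node)

    @property
    def capacity_tb(self) -> int:
        return sum(n.capacity_tb for n in self.nodes)

    @property
    def iops(self) -> int:
        return sum(n.iops for n in self.nodes)

    @property
    def bandwidth_gbps(self) -> int:
        return sum(n.bandwidth_gbps for n in self.nodes)


cluster = Cluster()
for _ in range(3):
    # Each added node brings capacity, IOPS and bandwidth together.
    cluster.add_node(Node(capacity_tb=36, iops=20_000, bandwidth_gbps=2))

print(cluster.capacity_tb, cluster.iops, cluster.bandwidth_gbps)  # 108 60000 6
```

The point of the model is that no resource is added alone: every capacity purchase is also a performance purchase, which is what removes the need for a separate controller upgrade.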


A key capability for scale out architectures like Isilon’s is the ability to move data between storage pools. Pools are collections of storage, typically grouped together by tier. For example, there can be a high speed pool, a standard pool and an archive pool. These pools are managed in the background based on storage administration policies. From a day to day perspective the administrator still interacts with a single volume. The storage system moves the data based on policy and access profile.
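A policy of the kind described above might look like the following sketch. The pool names follow the example in the text, but the functions, the age thresholds and the file paths are hypothetical; real products define their own policy language and criteria.

```python
def choose_pool(days_since_last_access, hot_days=7, archive_days=180):
    """Pick a pool from how recently a file was accessed (assumed thresholds)."""
    if days_since_last_access <= hot_days:
        return "high-speed"
    if days_since_last_access >= archive_days:
        return "archive"
    return "standard"


def place_files(last_access_days_by_path):
    """Map each path to a pool. The path the user sees never changes."""
    return {path: choose_pool(days)
            for path, days in last_access_days_by_path.items()}


# Hypothetical files, with days since each was last accessed.
files = {
    "/data/vm01.img": 0,
    "/data/report.pdf": 30,
    "/data/2009-logs.tar": 400,
}
for path, pool in place_files(files).items():
    print(path, "->", pool)
```

Because placement is a function of policy rather than of the path, data can move between tiers while users and applications keep addressing the same single volume.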



Making Big Storage Reliable


Reliability of the storage system is important, especially in virtualized environments, since so many server instances on a single host are counting on the shared storage system to be operational. Where most storage systems try to remove single points of failure, scale out storage systems have multiple points of redundancy and avoid the dependency on traditional approaches like RAID.


This means that a scale out storage system can survive multiple drive and node failures without losing data or data access. The level of redundancy and availability is administrator selectable by workload type. Virtualized images, for example, can be set to survive the highest level of failures, whereas archives may be set to a basic level of protection.
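Per-workload protection can be pictured as a small policy table. The sketch below is an assumption-laden illustration: the workload names, the tolerated failure counts and the `data_available` helper are hypothetical, and it models only the outcome of a protection setting, not the mechanism a real system uses to achieve it.

```python
# Hypothetical policy: how many simultaneous drive or node failures
# each workload type is configured to tolerate.
PROTECTION_POLICY = {
    "virtual-machine-images": 4,   # highest protection level
    "home-directories": 2,
    "archive": 1,                  # basic protection level
}


def data_available(workload, simultaneous_failures, policy=PROTECTION_POLICY):
    """Return True if the workload's data stays accessible after the failures."""
    return simultaneous_failures <= policy[workload]


print(data_available("virtual-machine-images", 3))  # True
print(data_available("archive", 3))                 # False
```

Higher protection levels consume more raw capacity, which is why it is useful to reserve them for the workloads that need them.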



Making Big Storage Affordable


While some IT budgets are rising, most are flat. This means that the storage manager must squeeze maximum levels of efficiency out of every storage dollar spent, and the upfront cost is part of this equation. Scale out storage shines because it allows IT to buy only the performance and capacity that are needed, without the threat of a future forklift upgrade.


A significant percentage of storage dollars is wasted on low utilization levels. With the siloed, multi-volume approach, significant capacity is always lost just in overhead. This is where a capability like having a single volume for all storage needs can be especially valuable. In addition, capabilities like snapshots, thin provisioning and the ability to set quotas on utilization allow storage capacity to be allocated only as it is needed by the user or the application.
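One way to see the capacity lost to silos is to ask whether a new data set fits. In the sketch below, the function names and the figures (three silos with 10 TB free each, a 15 TB request) are assumptions made up for the example.

```python
def fits_in_any_silo(free_tb_per_silo, request_tb):
    """With separate volumes, the request must fit inside one of them."""
    return any(free >= request_tb for free in free_tb_per_silo)


def fits_in_single_pool(free_tb_per_silo, request_tb):
    """With one pooled volume, all the free space is usable together."""
    return sum(free_tb_per_silo) >= request_tb


free = [10, 10, 10]   # TB free in each of three hypothetical silos
request = 15          # TB needed by a new data set

print(fits_in_any_silo(free, request))     # False
print(fits_in_single_pool(free, request))  # True
```

Here 30 TB is free in total, yet the siloed layout cannot hold a 15 TB data set without another purchase or a migration. Pooling the same hardware makes that stranded space usable.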



Summary


Big Data storage problems and modern corporate IT storage challenges have a lot in common. It makes sense that the architecture that’s ideal for the Big Data environment may also be the most appropriate for corporate IT. Scale out storage provides the data management capabilities that organizations need while keeping management time and costs to a minimum. Finally, organizations always have a corporate IT storage need. When there is a Big Data need as well, all of these workloads can run on a single system, optimizing storage for the entire data center and dramatically reducing costs and complexity.