Solving Corporate IT Storage Challenges With Big Data Infrastructures
Big Data is often thought of as a problem for the research, engineering, entertainment and Web 2.0 fields, but mainstream corporations running traditional IT are also facing the demand to accommodate ever-increasing amounts of information. That demand falls in the lap of IT to address. If the organization has a history of generating data, then in almost every case there is value in efficiently storing, managing and protecting that data. At the same time, the storage infrastructure required to support Big Data can also solve traditional corporate IT storage challenges. In fact, there may be a bigger opportunity in leveraging a Big Data storage infrastructure for general corporate IT storage needs than in deploying it only for the Big Data workloads typically associated with scientists, engineers and animators.
Tuesday, August 23, 2011
Corporate IT data sets are creating challenges similar to those of Big Data, thanks to large-scale server and desktop virtualization, large home directories and file archives. There are now millions, if not billions, of file objects that need to be stored, managed and protected. Organizations are looking for ways to manage the relentless growth of these data types without incurring unacceptable hardware, software or administrative costs.
George Crump, Senior Analyst
Isilon Systems is a client of Storage Switzerland
The Virtualization Challenge
Server and desktop virtualization place two specific requirements on their storage infrastructures: the storage has to handle today’s performance demands, and it has to scale to meet tomorrow’s. That scaling needs to happen incrementally, as the performance is needed, without requiring that all future performance be purchased upfront.
Virtual environments will become increasingly dense. As they do, more I/O will originate from each connected host, and that I/O will be increasingly random in nature, requiring more IOPS. Legacy storage architectures struggle to meet this challenge, and with them the only real solution is to buy all of the future performance needed upfront. Buying performance upfront is similar to buying storage capacity upfront: anything bought at today’s prices carries a premium compared with waiting a year, until the performance is actually needed and has become more cost effective.
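To make the upfront-versus-incremental argument concrete, the short Python sketch below compares the two purchasing approaches. It is a simplified, hypothetical model: the yearly IOPS requirements, the starting price per IOPS and the constant annual price decline are assumed inputs rather than figures from any vendor, and it ignores costs such as power, floor space and maintenance.

```python
# Hypothetical cost model: buying all future performance upfront versus
# buying it incrementally as it is needed. The price decline is an assumption.

def upfront_cost(yearly_iops_needed, price_per_iops):
    """Buy the largest (final) requirement on day one at today's price."""
    return max(yearly_iops_needed) * price_per_iops


def incremental_cost(yearly_iops_needed, price_per_iops, annual_decline):
    """Each year, buy only the additional IOPS needed, at that year's price."""
    total = 0.0
    owned = 0
    price = price_per_iops
    for needed in yearly_iops_needed:
        if needed > owned:
            total += (needed - owned) * price
            owned = needed
        # Assumed constant year-over-year drop in the price of performance.
        price *= 1 - annual_decline
    return total


if __name__ == "__main__":
    needs = [10_000, 20_000, 40_000]  # hypothetical IOPS needed in years 1-3
    print("upfront:    ", upfront_cost(needs, 1.00))
    print("incremental:", round(incremental_cost(needs, 1.00, 0.30)))
```

With these assumed numbers, buying upfront costs 40,000 units against roughly 26,800 when buying as needed, and the gap grows with the assumed rate of price decline.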
The Large Home Directory Challenge
File-based data is also on the rise, both in the number of objects and in the size of each individual object. Obviously this affects the required storage capacity, but it affects performance as well. Larger files being requested by more users drive a significant increase in performance demand on the storage system compared with what was required in the past. How this data is transferred has changed as well. While the large majority is still transferred in bulk, an increasing amount, like video and audio files, is now streamed from the network to the requesting client.
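A rough way to see how file size, user count and delivery method combine is a back-of-the-envelope throughput estimate. The Python sketch below is illustrative only: the request rates, file sizes and stream bitrates are hypothetical, and it computes averages, ignoring peaks and protocol overhead.

```python
# Back-of-the-envelope estimates of the network load a file store must serve.
# All inputs are hypothetical; results are averages in megabits per second.

def bulk_transfer_mbps(requests_per_hour, avg_file_mb):
    """Average load from whole-file reads (megabytes per hour -> Mbit/s)."""
    return requests_per_hour * avg_file_mb * 8 / 3600


def streaming_mbps(concurrent_streams, bitrate_mbps):
    """Sustained load from clients streaming audio or video."""
    return concurrent_streams * bitrate_mbps


if __name__ == "__main__":
    before = bulk_transfer_mbps(3_600, 5)   # fewer users, smaller files
    after = bulk_transfer_mbps(7_200, 50)   # twice the requests, 10x the size
    print(before, after, after / before)    # 40.0 800.0 20.0
    print(streaming_mbps(200, 5))           # 1000
```

Doubling the request rate while files grow tenfold multiplies the average load twentyfold, and streaming adds load that must be sustained for the length of each stream rather than absorbed as a short burst.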
Also, much of the file-based data being generated today is not user created. In many organizations, machines and devices are responsible for much of the growing need for capacity; this data can come from report servers or from specific devices that collect information.
The Retention/Archive Challenge
The majority of this data, whether databases or files created by users, now needs to be retained both for regulatory reasons and for potential future mining. One downside to understanding the value of a Big Data project is that more data is now likely to be deemed important and retained “just in case”. Now, though, “just in case” is a comfort not only to the users but also to the organization, since that data may have value in the future.
Big Data Infrastructure Answers Modern IT Demands
As a result of virtualization, the increasing importance of user home directories and the desire to simply retain more data, the modern IT data center has developed storage needs similar to those of Big Data environments. Both need storage systems that are simple, scalable, reliable and affordable, and both may be ideally served by scale out storage.
Making Big Storage Simple and Scalable
Many storage systems on the market are relatively easy to use when they’re first installed, or when they are the only storage system. As new volumes are created and more capacity is added, storage management becomes increasingly difficult. Decisions have to be made about how much capacity to provision to each volume, and about how many drives, and of which types, should be used for each volume.
Then, when the storage system reaches its maximum capacity, it needs to be upgraded. Since most organizations don’t want to go through the process of transitioning to a new storage system and migrating data to it, they end up buying additional storage systems to keep pace with capacity and performance demands. However, this creates multiple silos, or islands, of data.
Scale out storage solutions like those offered by Isilon Systems solve this problem by offering a single system that holds all the data in a single pool and can scale performance and capacity incrementally. Each time a node is added to the cluster, additional storage capacity, performance and I/O bandwidth come with it.
A scale out storage system allows the use of a single volume instead of multiple volumes, which eliminates the need for provisioning or drive allocation decisions. As more applications or hosts are added to the infrastructure, they simply store their data in the same volume. It is reasonable to expect that each application or host added will need additional capacity, and because that capacity is added in the form of nodes, the processing power and I/O bandwidth to serve it grow along with it. With a scale out storage architecture there is no need to perform a forklift upgrade or to bring in a new storage system for a specific purpose; the scale out grid should be able to handle all the workloads.
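The difference between adding nodes to a scale out cluster and adding disk shelves behind a fixed pair of controllers can be sketched in a few lines of Python. The per-node and per-controller figures are hypothetical, and the linear aggregation is an idealization; real clusters lose some efficiency to inter-node communication.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical scale out node: each brings its own disk, CPU and network."""
    capacity_tb: int
    iops: int
    bandwidth_gbps: int


def scale_out_totals(nodes):
    """Idealized cluster: capacity, IOPS and bandwidth all grow per node."""
    return (
        sum(n.capacity_tb for n in nodes),
        sum(n.iops for n in nodes),
        sum(n.bandwidth_gbps for n in nodes),
    )


def scale_up_totals(shelves, shelf_capacity_tb, controller_iops, controller_gbps):
    """Legacy array: shelves add capacity, but the controllers cap performance."""
    return shelves * shelf_capacity_tb, controller_iops, controller_gbps


if __name__ == "__main__":
    for count in (3, 6, 12):
        cluster = [Node(capacity_tb=36, iops=5_000, bandwidth_gbps=2)] * count
        print(count, scale_out_totals(cluster),
              scale_up_totals(count, 36, 40_000, 8))
```

In this toy model capacity grows the same way in both designs, but only the cluster’s IOPS and bandwidth keep pace with it. The legacy array delivers its full controller performance from day one, which is exactly the performance bought upfront, and at 12 nodes the cluster has passed those fixed limits.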
A key capability for scale out architectures like Isilon’s is the ability to move data between storage pools. Pools are collections of storage, typically grouped together by tier; for example, there can be a high speed pool, a standard pool and an archive pool. Data placement across these pools is managed in the background based on policies set by the storage administrator. From a day-to-day perspective the administrator still interacts with a single volume, while the storage system moves the data based on policy and access profile.
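A policy of the kind described above can be illustrated with a simple age-based rule set. The Python sketch below is a generic example, not Isilon’s implementation: the pool names, the 7-day and 180-day thresholds, and the use of last-access time as the only criterion are all assumptions. A file’s path never changes, only its pool assignment does, which is why the administrator can keep working with a single volume.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str            # what users and applications see; never changes
    last_access: datetime
    pool: str = "standard"


# Hypothetical policy: the first rule whose age limit fits decides the pool.
POLICY = [
    (timedelta(days=7), "high_speed"),
    (timedelta(days=180), "standard"),
]
DEFAULT_POOL = "archive"


def target_pool(record, now):
    age = now - record.last_access
    for max_age, pool in POLICY:
        if age <= max_age:
            return pool
    return DEFAULT_POOL


def apply_policy(records, now):
    """Background pass: reassign pools and report the moves it made."""
    moves = []
    for record in records:
        destination = target_pool(record, now)
        if destination != record.pool:
            moves.append((record.path, record.pool, destination))
            record.pool = destination
    return moves


if __name__ == "__main__":
    now = datetime(2011, 8, 23)
    files = [
        FileRecord("/data/vm/web01.vmdk", now - timedelta(days=1)),
        FileRecord("/data/home/alice/report.doc", now - timedelta(days=30)),
        FileRecord("/data/projects/2009-results.tar", now - timedelta(days=400)),
    ]
    for move in apply_policy(files, now):
        print(move)
```

Running the pass a second time with the same inputs reports no moves, since every file is already in the pool its policy selects.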
Making Big Storage Reliable
Reliability of the storage system is important, especially in virtualized environments, since so many server instances on a single host count on the shared storage system being operational. Where most storage systems try to remove single points of failure, scale out storage systems build in multiple points of redundancy and avoid dependence on traditional approaches like RAID.
This means that a scale out storage system can survive multiple drive and node failures without losing data or data access. The level of redundancy and availability is selectable by the administrator for each workload type. Virtualized images, for example, can be set to survive the highest level of failures, whereas archives may be set to a basic level of protection.
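One common way scale out systems provide this kind of selectable protection is erasure coding: each piece of data is striped into N data units plus M protection units spread across different nodes. The Python sketch below shows only the general arithmetic of that trade-off. It assumes a code, such as Reed-Solomon, that can rebuild a stripe from any N of its units, and that every unit of a stripe lands on a separate node; the per-workload stripe widths are hypothetical and this is not a description of any particular product’s layout.

```python
def protection_profile(data_units, protection_units, node_count):
    """Trade-off for an N+M stripe with one unit per node (an assumption).

    Any M units of the stripe can be lost, so up to M node failures are
    survivable; the price is that only N/(N+M) of raw capacity holds data.
    """
    width = data_units + protection_units
    if width > node_count:
        raise ValueError("stripe is wider than the cluster")
    return {
        "node_failures_tolerated": protection_units,
        "usable_fraction": data_units / width,
    }


def survives(failed_nodes, protection_units):
    """Data is guaranteed readable while failures do not exceed M."""
    return failed_nodes <= protection_units


if __name__ == "__main__":
    nodes = 8
    # Hypothetical per-workload settings on the same cluster.
    print("vm images:", protection_profile(5, 3, nodes))  # highest protection
    print("archives: ", protection_profile(7, 1, nodes))  # basic protection
    print("mirror x3:", protection_profile(1, 2, nodes))  # three full copies
```

On an eight-node cluster, the 5+3 setting survives three simultaneous node failures with 62.5% of raw capacity usable, while 7+1 survives one failure with 87.5% usable; mirroring is the special case N = 1.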
Making Big Storage Affordable
While some IT budgets are rising, most are flat. This means that the storage manager must squeeze maximum efficiency out of every storage dollar spent, and upfront cost is part of that equation. Scale out storage shines here because it allows IT to buy only the performance and capacity that are needed, without the threat of a future forklift upgrade.
A significant percentage of storage dollars is wasted on low utilization. With the siloed, multi-volume approach, significant capacity is always lost to overhead, because free space stranded in one volume cannot be used by another. This is where having a single volume for all storage needs can be especially valuable. In addition, capabilities like snapshots, thin provisioning and quotas on utilization allow storage capacity to be allocated only as it is needed by the user or the application.
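The combination of a single shared pool, thin provisioning and quotas can be modeled simply. In the Python sketch below, each consumer (a home directory share, a virtual machine datastore, an archive) gets a hard quota, but physical capacity is consumed only when data is actually written. The class and its method names are hypothetical and illustrate the general technique, not any vendor’s API.

```python
class ThinPool:
    """Hypothetical single shared pool with thin provisioning and hard quotas."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.quota = {}   # consumer name -> hard quota in GB (logical promise)
        self.used = {}    # consumer name -> GB actually written (physical use)

    def create(self, name, quota_gb):
        # No physical space is reserved here; that is the "thin" part.
        self.quota[name] = quota_gb
        self.used[name] = 0

    def write(self, name, gb):
        if self.used[name] + gb > self.quota[name]:
            raise PermissionError(f"{name}: quota of {self.quota[name]} GB exceeded")
        if self.total_used() + gb > self.physical_gb:
            raise OSError("pool is out of physical capacity; add nodes")
        self.used[name] += gb

    def total_used(self):
        return sum(self.used.values())

    def overcommit_ratio(self):
        """Total quota promised divided by physical capacity."""
        return sum(self.quota.values()) / self.physical_gb


if __name__ == "__main__":
    pool = ThinPool(physical_gb=1_000)
    pool.create("home", quota_gb=800)
    pool.create("vm", quota_gb=800)
    pool.create("archive", quota_gb=400)
    pool.write("home", 300)
    pool.write("vm", 250)
    print(pool.total_used(), pool.overcommit_ratio())  # 550 2.0
```

Here 2,000 GB of quota is promised against 1,000 GB of disk, and only the 550 GB actually written is consumed; thick-provisioned silos would have needed all 2,000 GB allocated on day one. The trade-off is that an overcommitted pool must be monitored and grown, by adding nodes, before writes reach its physical limit.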
Summary
Big Data storage problems and modern corporate IT storage challenges have a lot in common, so it makes sense that the architecture that’s ideal for the Big Data environment may also be the most appropriate for corporate IT. Scale out storage provides the data management capabilities that organizations need while keeping management time and costs to a minimum. Finally, every organization has corporate IT data storage needs; when it has a Big Data need as well, all of these workloads can run on a single system, optimizing storage for the entire data center and dramatically reducing cost and complexity.