What Is Performance Sprawl?
Improving storage performance is becoming a top priority for IT managers in data centers of all sizes. Storage performance problems are often not isolated to a single server, yet most vendors treat them as point problems to be fixed by dedicated hardware installed inside the individual server or the storage system. Treating performance problems in isolation like this leads to a condition called performance sprawl.
Performance sprawl is becoming commonplace in data centers. Typically, each application with a performance issue is treated separately, and as a result, specific performance remedies are purchased to address each issue. For example, a PCIe Solid State Device could be added to an underperforming virtualization server, but that device provides no benefit to a separate underperforming database server.
Even a shared solid state appliance may have limited value. These appliances often have limited storage management features and rely on the application to provide data protection and high availability. While some database applications and operating systems can accommodate this need, many others, with server hypervisors being a prime example, cannot. They are dependent on the storage system to provide the high availability and data protection.
The alternative, adding solid state devices to an existing shared storage system with more robust storage management features, may also have limited value because it introduces its own form of performance sprawl. With this option, solid state drives (SSDs) are added to an existing storage system alongside mechanical drives. The vendor typically provides tiering software to move active data to the SSD tier. The limitation is that this performance fix is restricted to the storage system on which it is installed, meaning that each storage system needs its own SSDs: in essence, another form of sprawl.
The point product approach often solves the immediate problem but creates a bigger one. Adding multiple performance-enhancing devices can end up costing a lot more than a single, more holistic performance solution.
The Costs of Performance Sprawl
Capital Waste
The single biggest cost of performance sprawl is the waste of IT budget. Since most storage performance-enhancing solutions heavily leverage solid state storage to deliver their performance boost, they demand a premium price. Replicating these purchases across all application servers and/or storage systems can be prohibitively expensive. In addition, most storage system capacity is significantly under-utilized in order to maintain acceptable performance. Without a means to free up storage controller CPU cycles, this unused storage capacity represents a huge waste of capital.
Time Waste
IT staffs are already stretched to the breaking point. They need ways to reduce the time spent maintaining systems so they can focus on more strategic projects. In practice, however, data center professionals spend a high percentage of their time reacting to user complaints. Storage performance tuning becomes an interrupt-driven process for the IT administrator, and typically a number of time-consuming remedies are attempted before a move to SSD is made.
IT administrators also need to monitor multiple performance silos. While point remedies can solve the immediate performance problem, they need to be managed and monitored on an ongoing basis to make sure that they continue to be effective. For example, trending has to be performed on each performance silo to predict the need for the next upgrade.
While most storage performance-enhancing solutions provide some type of reporting on their effectiveness, each one must be monitored separately since they cannot monitor performance across multiple silos. Finally, point solutions cannot provide the required visibility into the storage performance of applications running on other servers. As a result, administrators are unable to take proactive measures to prevent these applications from becoming the next in line for performance problems.
Resource Waste
As discussed above, most storage performance-enhancing solutions leverage SSD technology. These solutions are typically dedicated to a single application. In reality, most applications cannot take full advantage of a dedicated solid state device’s performance on a continuous basis. As a result, the premium investment is not fully utilized at all times.
As previously mentioned, there may be applications or servers, especially in a virtualized environment, that are on the verge of having a performance problem. They could benefit from a performance boost (and so would their users), but that boost cannot be cost-justified for any one of them. In aggregate, these servers do justify a storage performance acceleration investment, but it would have to be a shared resource leveraged across them.
Holistic Performance Resolution with Network Caching
An alternative to the point solutions that lead to performance sprawl is a holistic approach used by network caching products like those offered by Cache IQ. These appliances are installed in the network, between the application servers and the storage systems, and provide a performance boost to all the servers and storage systems that are network connected to the devices.
Capital investment is preserved since the network caching appliance can be used throughout the data center and across multiple storage systems. Management is eased since now there is a single interface to monitor storage performance. Resources are better optimized since the network cache can provide performance boosts to all the connected servers and storage, not just the few that can cost justify a solution individually.
A shared storage performance resource needs specific capabilities, though, to manage a broad range of workloads. Above all, it must deliver a cross-workload view of storage performance.
Requirement # 1: Analysis
As we detailed in our article, “The Added ROI of Advanced Network Caches”, analysis of the environment becomes an important value add of a network cache. The ability to monitor, at a packet level, the data flowing through the cache lets storage administrators visualize performance. They can confirm that applications are getting the performance they need and predict which applications will soon push the current cache to its limits, creating a need for an upgrade. Cache upgrades can then be planned and budgeted rather than made reactively.
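The upgrade prediction described above can be sketched as a simple trend projection. This is a minimal illustration only, not how any particular product performs its analysis: the function name `weeks_until_limit`, the weekly sampling interval, and the sample figures are all assumptions made for the example, and a straight-line fit is just one of many possible forecasting methods.

```python
def weeks_until_limit(samples, limit):
    """Fit a least-squares line to weekly peak cache utilization (percent)
    and estimate how many weeks after the last sample the line reaches
    `limit`. Returns None when utilization is flat or falling.
    Hypothetical helper for illustration only."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth trend, so no projected upgrade date
    intercept = mean_y - slope * mean_x
    week_at_limit = (limit - intercept) / slope
    # Report the time remaining relative to the most recent sample.
    return max(0.0, week_at_limit - (n - 1))


# Made-up readings: peak utilization grows five points per week.
remaining = weeks_until_limit([50, 55, 60, 65], limit=90)
print(remaining)
```

With a single cache sitting in front of all workloads, one projection like this covers the whole environment instead of one per performance silo.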
Requirement # 2: Tuning
The caching system should be able to optimize or guarantee the performance of certain applications. For example, it should enable storage administrators to prioritize important applications and data over casual use. It may be important to “pin” certain applications’ data in the cache so it is always accelerated. Conversely, there may be some applications that are simply not cache suitable and should never be cached. Tuning policies allow the storage administrator to optimize cache space for the data that is most important to the organization.
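To make the pinning and exclusion policies above concrete, here is a toy sketch of a least-recently-used cache that honors both. It is an assumption-laden illustration, not Cache IQ's design: the class `PolicyCache`, its method names, and the application labels are hypothetical, and a real network cache would key on files or blocks observed on the wire rather than on strings.

```python
from collections import OrderedDict


class PolicyCache:
    """Toy LRU cache with 'pin' and 'never cache' policies (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> (app, data), oldest first
        self.pinned_apps = set()      # data from these apps is never evicted
        self.excluded_apps = set()    # data from these apps is never cached

    def pin(self, app):
        self.pinned_apps.add(app)

    def exclude(self, app):
        self.excluded_apps.add(app)

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key][1]

    def put(self, app, key, data):
        """Return True if the data was cached, False if policy refused it."""
        if app in self.excluded_apps:
            return False
        if key in self.entries:
            self.entries[key] = (app, data)
            self.entries.move_to_end(key)
            return True
        if len(self.entries) >= self.capacity and not self._evict_one():
            return False  # cache is full of pinned data
        self.entries[key] = (app, data)
        return True

    def _evict_one(self):
        # Evict the least recently used entry that is not pinned.
        victim = next((k for k, (app, _) in self.entries.items()
                       if app not in self.pinned_apps), None)
        if victim is None:
            return False
        del self.entries[victim]
        return True


cache = PolicyCache(capacity=2)
cache.pin("erp")         # always accelerate the ERP database
cache.exclude("backup")  # streaming backup data is not cache suitable
cache.put("erp", "erp:block1", b"a")
cache.put("web", "web:block1", b"b")
backup_cached = cache.put("backup", "bk:block1", b"c")
cache.put("web", "web:block2", b"d")  # cache is full: evicts web:block1
```

In this sketch the pinned ERP block survives even though it is the oldest entry, while the unpinned web block is evicted to make room.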
Requirement # 3: Scale and High Availability
Since the network cache could potentially serve the entire data center, it should scale in capacity and network I/O as data center performance needs grow. Once the environment becomes dependent on the acceleration that a network cache delivers, high availability (HA) also becomes critical. A scale-out design enables both. Additionally, the HA implementation must be active-active so that premium-priced solid state storage space does not sit idle, waiting for a failure to occur.
Cache IQ offers a network cache solution that can scale to 8 nodes, each one generating over 1,000,000 IOPS in performance. The system offers full redundancy, so if one node fails the other nodes assume its workload.
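As a rough sizing illustration of what "the other nodes assume its workload" implies, the arithmetic below uses the figures quoted above and treats 1,000,000 IOPS per node as a round lower bound. It assumes performance scales linearly with node count and that a failed node's load spreads evenly across the survivors; neither assumption is stated in the source, and the function names are hypothetical.

```python
def surviving_iops(nodes, iops_per_node, failed=1):
    """Aggregate IOPS left after `failed` nodes drop out, assuming
    linear scaling (an assumption, not a vendor specification)."""
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and nodes")
    return (nodes - failed) * iops_per_node


def max_safe_utilization(nodes, failed=1):
    """Largest fraction of full-cluster capacity the workload can use
    and still fit on the surviving nodes after a failure."""
    if not 0 <= failed < nodes:
        raise ValueError("at least one node must survive")
    return (nodes - failed) / nodes


full = surviving_iops(8, 1_000_000, failed=0)  # 8,000,000 IOPS nominal
degraded = surviving_iops(8, 1_000_000)        # 7,000,000 IOPS with one node down
headroom = max_safe_utilization(8)             # 0.875
print(full, degraded, headroom)
```

Under these assumptions, an eight-node active-active cluster can run at up to 87.5 percent of its nominal capacity and still absorb a single node failure, which is the advantage over an active-passive pair where half the solid state capacity sits idle.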
Summary
Performance is a big concern for the data center and, unlike in years past, there is a technology that can solve many storage performance problems: solid state storage. Unfortunately, using SSDs to solve storage performance problems one application at a time leads to performance sprawl, and this can diminish the ROI from those solutions.
Holistic solutions like the one from Cache IQ promise to solve the immediate storage performance problems and increase the performance of applications that typically would not justify an SSD investment. However, it is important that any solution not only accelerates performance, but also provides analysis and tuning of the data being accelerated, and meets the demands for scaling and high availability as the data center continues to grow.
Cache IQ is a client of Storage Switzerland
Thursday, June 21, 2012
George Crump, Senior Analyst