While performance is not a static thing that can be inventoried like a storage array or a disk drive, it can be quantified. The first step in performance tuning is to inventory the potential performance capabilities of the infrastructure and the hosts or applications that consume that performance. A storage management tool like SolarWinds Storage Manager, Powered by Profiler, can identify those top consumers, revealing the performance ‘pressure points’ in the storage infrastructure and helping guide where to look for a performance problem.
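
As a rough illustration of that kind of inventory, the short Python sketch below ranks hosts by observed throughput to surface the top consumers. The host names and figures are invented placeholders for the metrics a storage management tool would actually collect.

```python
# Hypothetical sketch: rank hosts by observed I/O to find the top consumers.
# The per-host throughput figures are made-up stand-ins for collected metrics.
from operator import itemgetter

host_io_mbps = {        # host -> average observed throughput in MB/s (invented)
    "db01": 310.0,
    "vmhost03": 275.0,
    "mail01": 42.5,
    "web02": 18.0,
}

def top_consumers(metrics, n=3):
    """Return the n busiest hosts, highest throughput first."""
    return sorted(metrics.items(), key=itemgetter(1), reverse=True)[:n]

for host, mbps in top_consumers(host_io_mbps):
    print(f"{host:10s} {mbps:7.1f} MB/s")
```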



Switch Performance

The first place to look is the storage switch. The speed of the storage system is irrelevant if the switch is already running at its maximum capacity and bottlenecking access to storage. The switch is also often an area where a little rebalancing of traffic can go a long way toward increasing performance. In fact, many extra switches and ports are purchased simply because storage administrators don’t have visibility into the traffic going through their existing switch ports, leaving the actual utilization of total switch bandwidth remarkably low.


A storage management tool will be able to show the overall load on the switch as well as provide a drill-down into busy ports or inter-switch links (ISLs).

The first thing to examine, especially in a high switch-count infrastructure, is the ISL traffic, the traffic flowing between switches. Make sure that the speed of each ISL is set correctly. It’s not uncommon to find switches where the ISL speed is slower than the speed of the switch itself, something that can happen as switches are gradually upgraded. For example, if a fabric consisted of just 2Gb FC switches and then one 4Gb FC switch was added, many administrators would hard set the ISL to 2Gb on each side. When the remaining 2Gb FC switches are finally upgraded, that setting is often never raised to its full potential. Next, look for ISLs that are being overused and consider whether an additional ISL, or even an additional switch, would improve traffic flow.
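
A minimal sketch of these checks, with invented switch and ISL figures, might look like the following: flag any ISL configured below the speed both ends could negotiate, and any ISL running near saturation.

```python
# Hypothetical sketch: check ISLs for speed mismatches and overuse.
# Switch capabilities, configured speeds and utilization are illustrative only.
isls = [
    # (name, end_a_max_gb, end_b_max_gb, configured_gb, avg_util_pct)
    ("sw1<->sw2", 4, 4, 2, 35),   # hard set to 2Gb during a staged upgrade
    ("sw2<->sw3", 8, 8, 8, 92),   # correctly set but nearly saturated
]

for name, a_max, b_max, configured, util in isls:
    negotiable = min(a_max, b_max)
    if configured < negotiable:
        print(f"{name}: set to {configured}Gb but both ends support {negotiable}Gb")
    if util > 80:
        print(f"{name}: {util}% utilized -- consider an additional ISL or switch")
```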


The next step is to drill down to the specific ports on the switch to see which are experiencing the most traffic. The value that a storage management tool brings is the ability to see this traffic across switches so that a comparison can be made. In many cases, two switches may be carrying the bulk of the traffic while two others are barely being used. Simply moving a server or two to another switch may result in better performance.
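
The comparison itself is simple once per-port data is visible across the fabric. The sketch below totals made-up throughput figures per switch and points out when one switch is carrying far more traffic than another.

```python
# Hypothetical sketch: compare aggregate port traffic across switches to spot
# imbalance. All throughput numbers are invented placeholders.
switch_port_mbps = {
    "switch-a": [390, 410, 355, 320],
    "switch-b": [400, 380, 360, 300],
    "switch-c": [15, 10, 5, 0],
}

totals = {sw: sum(ports) for sw, ports in switch_port_mbps.items()}
busiest = max(totals, key=totals.get)
quietest = min(totals, key=totals.get)

print("Aggregate throughput per switch (MB/s):", totals)
if totals[busiest] > 3 * max(totals[quietest], 1):
    print(f"Consider moving a busy server from {busiest} to {quietest}")
```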


Once the fabric has been better balanced, the amount of traffic going through specific ports can be examined to see how much of the available bandwidth is being consumed. If a port is consistently running at the maximum speed of the port or switch, then an investment in a faster switch may be warranted. A storage management tool can ensure that this step of purchasing more hardware is a last resort, taken only after a complete optimization of the existing investment has occurred.
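
As an illustration of that headroom check, the following sketch assumes an 8Gb FC port with roughly 800 MB/s of usable bandwidth and uses invented traffic samples to flag a port that is pinned at line rate most of the time.

```python
# Hypothetical sketch: measure how much of a port's line rate is consumed over
# a sampling window. The 800 MB/s figure is an assumed usable rate for 8Gb FC;
# the samples are invented.
PORT_LINE_RATE_MBPS = 800
samples_mbps = [760, 790, 805, 770, 795, 780]   # per-interval observations

utilization = [s / PORT_LINE_RATE_MBPS for s in samples_mbps]
saturated_share = sum(1 for u in utilization if u >= 0.95) / len(utilization)

print(f"Average utilization: {sum(utilization) / len(utilization):.0%}")
if saturated_share >= 0.5:
    print("Port is at line rate most of the time -- a faster switch may be "
          "justified once rebalancing options are exhausted.")
```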

Storage System Performance


Once it is confirmed that the storage fabric is no longer a bottleneck, the next place to examine is the storage system itself. In shared storage environments, multiple physical servers typically share the same physical disks. Understanding how many servers share those disks, and how performance-demanding those servers are, is critical to maximizing performance.


Similar to the previous scenario with a busy switch, if several active servers are sharing the same physical drives, then performance is going to suffer. Creating separation by moving some physical systems to different drives can greatly improve performance. As was the case with switch performance, this can be done without buying additional storage hardware. Simply rebalancing workloads so that the performance-demanding servers are spread across the available disks will reduce head seek contention.
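
One simple way to think about that rebalancing is a greedy placement that always assigns the busiest remaining server to the least-loaded set of spindles. The sketch below uses invented server IOPS figures and RAID group names purely for illustration.

```python
# Hypothetical sketch: spread performance-demanding servers across RAID groups
# so no single set of spindles carries most of the load. Figures are invented.
servers = {"erp01": 1800, "sql02": 1500, "file01": 300, "web01": 150, "dev01": 90}
raid_groups = ["rg-1", "rg-2", "rg-3"]

load = {rg: 0 for rg in raid_groups}      # running IOPS assigned to each group
placement = {}

# Place the busiest servers first, always onto the least-loaded group.
for server, iops in sorted(servers.items(), key=lambda kv: kv[1], reverse=True):
    target = min(load, key=load.get)
    placement[server] = target
    load[target] += iops

print(placement)   # which server lands on which RAID group
print(load)        # resulting IOPS per group
```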

Finally, once the storage system has been better balanced, it’s time to examine the performance of the RAID groups and LUNs to see if the number of IOPS being generated is near the limit that the system can produce. Again, the storage management tool will report this type of data, sorted for easy discovery, along with queue depths and response time/latency. With this data in hand, the decision can be made to investigate adding more or faster drives to bring queue depth down, or to add solid state disk technology to reduce queue depth and latency at the same time.
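
The decision logic can be sketched with back-of-the-envelope numbers. In the example below, the drive count, per-drive IOPS estimate, queue depth and latency are all invented; a real decision would use the figures reported by the storage management tool.

```python
# Hypothetical sketch: compare observed RAID group IOPS against a rough ceiling,
# then use queue depth and latency to choose between more spindles and SSD.
DRIVES_IN_GROUP = 8
IOPS_PER_DRIVE = 180           # assumed rough figure for a 15K SAS drive
observed_iops = 1350
avg_queue_depth = 12
avg_latency_ms = 28

ceiling = DRIVES_IN_GROUP * IOPS_PER_DRIVE
print(f"Observed {observed_iops} of an estimated {ceiling} IOPS ceiling "
      f"({observed_iops / ceiling:.0%})")

if observed_iops > 0.85 * ceiling and avg_queue_depth > 4:
    if avg_latency_ms > 20:
        print("High queue depth and high latency: solid state would address both.")
    else:
        print("High queue depth: more or faster drives would bring it down.")
```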


This capability is critical, especially with virtualized storage systems like those offered by HP 3PAR, Dell Compellent and others, since these systems, by default, spread data across all available drives. The assumption that adding more drives will increase performance may be inaccurate, since the existing drives are already being used in parallel. If the application is not experiencing long wait times due to its own queue depth, it’s doubtful that adding drives will increase performance. Operating system tools will have difficulty providing this analysis.



Virtualized Infrastructure Performance


One of the biggest consumers of storage I/O is the virtual infrastructure. In a virtualized server environment, many relatively low-demand servers are consolidated onto a few hosts, turning those hosts into very demanding and very random I/O consumers. This is where products like SolarWinds Storage Manager bring significant value.


The first challenge in performance tuning a virtual infrastructure is being able to see “inside” the host to understand which virtual machines are the large storage I/O consumers. Unlike in the non-virtualized case, each physical host can generate dozens of different I/O profiles. To further complicate the performance diagnosis, each disk area assigned to a physical host is most often shared with other physical hosts to enable functions like virtual machine migration. A poorly performing virtual machine may be suffering because another VM is stealing resources and starving it, not because of the quality of the resources available.


Products like SolarWinds Storage Manager can report on which virtual machines within each host are the most demanding. It may make sense to create multiple data stores on different physical disks so that busy VMs are not creating as much hard disk thrashing per physical spindle. Accomplishing this requires software that can build an end-to-end map of the virtual infrastructure and overlay it on the map of the storage infrastructure.
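
As a simple illustration of that per-VM visibility, the sketch below attributes invented IOPS figures to individual VMs on each data store and flags data stores where a single VM dominates the spindles. The VM and data store names are hypothetical.

```python
# Hypothetical sketch: attribute I/O to individual VMs per data store and flag
# data stores where one VM dominates. Names and IOPS figures are invented.
datastore_vm_iops = {
    "ds-prod-1": {"sql-vm": 2200, "exchange-vm": 1900, "print-vm": 40},
    "ds-prod-2": {"web-vm1": 120, "web-vm2": 95, "test-vm": 30},
}

for ds, vms in datastore_vm_iops.items():
    total = sum(vms.values())
    busiest_vm, busiest_iops = max(vms.items(), key=lambda kv: kv[1])
    share = busiest_iops / total
    print(f"{ds}: {total} IOPS total, {busiest_vm} accounts for {share:.0%}")
    if share > 0.5 and total > 1000:
        print(f"  consider moving {busiest_vm} to a data store on separate spindles")
```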

Finally, it is important to know how storage performance is impacted as virtual machines are migrated to other hosts. This requires historical tracking of virtual machine movement between hosts, something that SolarWinds calls “Time Travel”. This feature of SolarWinds Virtualization Manager allows the administrator to develop best practices for where and when to move a virtual machine.
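
The underlying idea can be illustrated without the product itself: record when a VM moved between hosts and compare data store latency before and after the move. The sketch below is not the Time Travel feature, just a hand-rolled version of that before-and-after comparison with invented timestamps and values.

```python
# Hypothetical sketch: compare data store latency before and after a VM move.
# The migration record and latency samples are invented for illustration.
from datetime import datetime

migration = {"vm": "sql-vm", "from": "esx01", "to": "esx02",
             "at": datetime(2012, 3, 14, 22, 0)}

latency_samples = [            # (timestamp, data store latency in ms), invented
    (datetime(2012, 3, 14, 21, 0), 24),
    (datetime(2012, 3, 14, 21, 30), 26),
    (datetime(2012, 3, 14, 22, 30), 11),
    (datetime(2012, 3, 14, 23, 0), 9),
]

before = [v for t, v in latency_samples if t < migration["at"]]
after = [v for t, v in latency_samples if t >= migration["at"]]
print(f"Latency before the move: {sum(before) / len(before):.0f} ms, "
      f"after: {sum(after) / len(after):.0f} ms")
```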



Trending


Just as capacity utilization needs to be tracked over time, so does performance. A performance problem will not always happen when the administrator is staring at the console, nor do all spikes occur at the same time. Performance data needs to be captured so that it can be replayed to pinpoint when a spike occurred. This allows the environment to be balanced so that performance spikes are distributed both across the infrastructure and throughout the day.
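
As a simple illustration of that kind of replay, the sketch below buckets invented performance samples by hour of day so that recurring spikes, such as a nightly backup window, stand out.

```python
# Hypothetical sketch: bucket captured samples by hour of day to see when
# spikes cluster. Timestamps and IOPS values are invented placeholders.
from collections import defaultdict
from datetime import datetime

samples = [                    # (timestamp, total IOPS), invented
    (datetime(2012, 3, 14, 2, 30), 4100),   # nightly backup window
    (datetime(2012, 3, 14, 9, 0), 2600),
    (datetime(2012, 3, 14, 14, 0), 1200),
    (datetime(2012, 3, 15, 2, 30), 3900),
    (datetime(2012, 3, 15, 9, 0), 2500),
]

by_hour = defaultdict(list)
for ts, iops in samples:
    by_hour[ts.hour].append(iops)

for hour in sorted(by_hour):
    peak = max(by_hour[hour])
    marker = "  <-- recurring spike" if peak > 3000 else ""
    print(f"{hour:02d}:00  peak {peak} IOPS{marker}")
```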



Smart Upgrades


The above performance examination is designed to expose opportunities for improvement in the current infrastructure without buying additional hardware. At some point, however, additional capacity must be purchased; the environment can only be optimized so much. A storage management tool adds value during the upgrade process as well. Armed with a historical profile, the IT manager can recommend the capacity and performance upgrade that will deliver the greatest increase for the least amount of money. More importantly, the storage management tool delivers the confidence of knowing that the upgrade will actually improve performance.



Summary


Storage performance tuning is often looked at as a ‘black art’, something to be performed by storage ‘wizards’ armed with decades of experience. A storage management application can level the playing field and give the busy IT generalist the information they need to make significant improvements in their storage design. However, even the storage specialist will benefit from these tools, which can help them make quicker, more effective and less reactive decisions.


In today’s virtualized environment where servers, networks and storage are all shared, manual tools simply can’t keep pace and deliver the required performance diagnosis expertise. Tools are needed to extract maximum performance from the storage investment and to keep up with the dynamic data center.