Physical access monitoring is often overlooked, or even purposely avoided, when planning SAN deployments or upgrades; it’s deemed either too costly or too disruptive. The challenge is that non-physical monitoring products traditionally can’t provide the detail that physical-layer tools do. Without this direct, tangible form of access, backed by software analysis of real-time infrastructure data, storage managers are left aiming at a target in the dark, or covering the wall with darts to make sure it’s hit.

There are costs associated with this lack of information. A problem never seems to arise when there is plenty of time to investigate, compare, and discuss possible solutions. Usually, storage problems surface when systems are being stressed to their maximum and critical work needs to be performed. In an environment laden with variables, this haste can force an attempt to ‘fix’ many of those variables at once, instead of the step-by-step approach that better facilitates root-cause discovery. Occasionally, this sweeping set of changes does seem to fix the problem. But since there is rarely time for real analysis, there is no way to determine that for sure, nor any way to know which expenditures or changes actually fixed the problem. Most importantly, there is no understanding of when the problem may return, or why, so the cycle essentially starts all over again.

In this environment, when the initial ‘fix’ doesn’t work, the storage manager or their storage/SAN supplier is forced to make repeated waves of “fixes” until the problem is resolved. Each wave requires potentially greater investments of money and time as they work through this trial-and-error process. Eventually, if nothing works, systems must be hastily shut down to provide the physical access required for true problem diagnosis to begin.

Physical network access is not a new concept; IP networks have leveraged physical access monitoring and analysis for decades. Storage networks, however, were typically considered too small to make the investment feasible, and storage managers confronted with the idea of physical access often resisted it because it could mean downtime for implementation. Fibre Channel SANs, though, have now reached, and in most cases surpassed, the densities needed to justify these kinds of physical-layer devices, called TAPs.

A Traffic Access Point, or TAP, is essentially an optical ‘signal splitter’ that diverts a portion of the incoming light so the traffic can be monitored and analyzed by software like Virtual Instruments’ NetWisdom or VirtualWisdom. TAPs are zero-latency, passive devices that draw off only a fraction of the optical power that storage network components need, causing no performance loss while providing valuable information that non-physical products can’t, such as CRC errors, queue depths, and latency analyses. Because they are passive, these devices require no power or cooling, so they place no additional burden on the infrastructure.
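The “fraction of the optical light” claim can be checked with simple link-budget arithmetic. The sketch below, with illustrative numbers that are assumptions rather than figures from any spec sheet or Virtual Instruments documentation, shows how a common 70/30 passive split still leaves a comfortable optical margin on the live path:

```python
import math

def split_loss_db(fraction: float) -> float:
    """Insertion loss in dB for the share of light kept on a path.
    E.g. keeping 70% of the light costs about 1.55 dB."""
    return -10 * math.log10(fraction)

def link_margin_db(tx_dbm, rx_sensitivity_dbm, losses_db):
    """Optical margin left after subtracting all path losses."""
    return tx_dbm - rx_sensitivity_dbm - sum(losses_db)

# Illustrative short-wave FC numbers (assumed, not vendor-specified):
tx = -4.0                         # transmitter launch power, dBm
rx = -12.0                        # receiver sensitivity, dBm
tap_live = split_loss_db(0.70)    # live link keeps 70% of the light
connectors = [0.5, 0.5]           # two mated connector pairs, dB each
fiber = 0.3                       # attenuation over a short fiber run, dB

margin = link_margin_db(tx, rx, [tap_live, fiber] + connectors)
print(f"TAP insertion loss: {tap_live:.2f} dB, remaining margin: {margin:.2f} dB")
```

Under these assumed numbers the link retains roughly 5 dB of margin, which is why a passive TAP can sample traffic without degrading the production path; actual budgets should of course be taken from the optics vendor’s specifications.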

The challenge with TAPs in a Fibre Channel infrastructure has been getting them integrated into the SAN. TAPs traditionally need to be inserted near the switch core for the most effective analysis, and implementing one requires some planned downtime as it is integrated into the cable infrastructure. Unfortunately, their value is often only recognized when a critical problem strikes and emergency maintenance windows must be opened up. However, many larger data centers with fully redundant SAN architectures should not have to experience any downtime for implementation; they can leverage the redundancy designed into the SAN, letting its automated failovers occur. Even if TAP integration does cause slight downtime, in nearly every case it is more than made up for by significantly faster problem resolution and better utilization of current infrastructure assets. The larger issue with waiting until a problem occurs and ‘TAPing as needed’ is that it only helps resolve the current issue; when a new problem arises in the infrastructure, the process has to start all over again. This is similar to hauling in water to fight a fire instead of installing fire hydrants or sprinklers before a fire breaks out.

Ideally, TAPs should be integrated before detailed troubleshooting is needed, and products like Virtual Instruments’ SANInsight fibre TAP patch panel system allow that integration to happen cost-effectively at the patch-panel level. Almost every SAN infrastructure now leverages a patch panel to manage the cable plant. By unifying TAP and patch functions into a single layer of physical infrastructure, products like SANInsight can significantly reduce the cost, complexity, and infrastructure impact of TAP deployment. The key to success is that storage managers and infrastructure design specialists be forward-thinking in their implementation strategies, and insist that TAP deployment become a required best practice of the modern data center.

Unless a data center is a completely new build, the first step in deploying this best practice is to make sure that new racks are connected via a tapped patch panel infrastructure as they are implemented. As a second step, similar connections should be made as maintenance windows allow old cable plants to be retrofitted with TAPs. When this procedure is followed, an almost immediate benefit should be seen: downtime risks, and even performance bottlenecks, can be identified early, preventing problems before they occur. Another ‘side effect’ of this real-time information is improved SAN utilization, to the point that infrastructure costs are actually driven down while application performance is driven up. The resulting win-win from the detailed analysis that TAPs provide can lead to accelerated implementation of TAPs elsewhere within the storage infrastructure, and more wins.

TAPs, especially given the ease with which they can be integrated into the corporate patch panel complex, should now be considered a best practice when implementing or upgrading a storage infrastructure. While their primary use may be thought of as a vehicle for solving problems, they should also be considered a time-saving investment. Storage infrastructure-related issues can now be resolved with much less time and effort, or better still, prevented before they occur, thanks to the real-time data delivered by these physically connected verification and monitoring tools.

George Crump, Senior Analyst

Virtual Instruments is a client of Storage Switzerland