George Crump, Senior Analyst

The Limitations of Spanning Tree Protocol

As we discuss in our article “What is Trill’s Role in FCoE Storage Networks?”, the traditional multi-tier network design was created to get around the limitations of spanning tree protocol (STP). STP is an aging network standard created when networks were connected via simple hubs instead of the switches we have today. Its intent was to make sure that there were no bridge loops in the network, because a loop sends traffic around the network endlessly and can cause it to hang. To do this, spanning tree makes sure that there is only a single active path to each network device by shutting down any alternative paths. However, a loop-free environment built this way can waste 50% or more of the available network bandwidth. With the emergence of 10GbE, the amount of bandwidth wasted per deactivated link becomes even more significant.
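The effect of blocking alternative paths can be sketched in a few lines of Python. This is a simplified illustration, not the actual STP algorithm: it builds a loop-free tree with a breadth-first search from an assumed root switch, whereas real STP elects the root and chooses paths by bridge ID and link cost. The topology and switch names are hypothetical.

```python
from collections import deque

def spanning_tree_links(links, root):
    """Return the links kept active by a breadth-first tree rooted at `root`."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    active = set()
    visited = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in visited:
                visited.add(peer)
                active.add(frozenset((node, peer)))
                queue.append(peer)
    return active

# Hypothetical topology: four access switches, each dual-homed to two cores.
ACCESS = ["access1", "access2", "access3", "access4"]
LINKS = [("core1", "core2")]
for switch in ACCESS:
    LINKS.append(("core1", switch))
    LINKS.append(("core2", switch))

active = spanning_tree_links(LINKS, root="core1")
blocked = [link for link in LINKS if frozenset(link) not in active]
blocked_fraction = len(blocked) / len(LINKS)  # 4 of 9 links carry no traffic
```

In this sketch every redundant uplink to the second core ends up blocked. As more dual-homed access switches are added, the blocked share of links approaches one half.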

The other challenge with STP is that the single path it creates between devices may not be the most efficient for communication between those devices. This means that a virtual machine (VM) migration, for example, may have to go through multiple network switches to move to a host even if there is a more direct path available. This extra time on the network can increase latency and the potential for VM disruption.
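A small sketch makes the detour concrete. It assumes a hypothetical ring of five switches in which spanning tree, with s1 as root, has blocked the link between s3 and s4. The hop counts compare the path over all physical links with the path over only the links left active.

```python
from collections import deque

def hop_count(links, src, dst):
    """Breadth-first hop count between two switches using only `links`."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for peer in neighbors.get(node, []):
            if peer not in distance:
                distance[peer] = distance[node] + 1
                queue.append(peer)
    return None  # no path over the given links

# Hypothetical ring of five switches.
RING = [("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s4", "s5"), ("s5", "s1")]
# Assumption: with s1 as root, spanning tree blocks the s3-s4 link.
ACTIVE = [link for link in RING if link != ("s3", "s4")]

direct_hops = hop_count(RING, "s3", "s4")   # 1 hop over the physical link
tree_hops = hop_count(ACTIVE, "s3", "s4")   # 4 hops around the ring via s1
```

Two hosts attached to neighboring switches end up four hops apart because the one link that joins them directly is the one that was shut down.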

Any well-designed network, of course, has redundant paths in case of failure. As stated earlier, under STP those paths are typically blocked or disabled. Most networks also have many spanning trees, often one per VLAN. As a result the network becomes very segmented, which is not friendly to virtualization capabilities like rapid application deployment and mobility.

When a failure does occur on an STP network, the network has to re-converge onto a path that had previously been blocked. This process can take anywhere from a few seconds to several minutes depending on the network or spanning tree size. While that process is occurring, the network essentially stops. In the days of messaging-only networks this was acceptable behavior; the applications would simply retry until the network was ready again. In the modern network that carries storage, virtual infrastructure data and streaming data like video conferences, this stoppage becomes a problem and can cause application failure or data loss.

The workaround for STP limitations has been to keep Layer 2 networks relatively small and join them together via Layer 3 segments. Prior to server virtualization and storage consolidation on Ethernet, this was a very acceptable practice and kept the STP problem in check. But using Layer 3 limits server virtualization’s capabilities. To be non-disruptive, VM migrations need the source host and the target host, as well as their storage, to be on the same Layer 2 network. If a migration is made through a Layer 3 segment to another Layer 2 network, then network administrators need to re-configure the network. The VM’s IP address must also change, so the migration cannot be live; the VM must be shut down. Essentially there is a hard requirement that live migration can only happen within a single subnet.
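The single-subnet requirement can be expressed as a simple check, sketched here with Python's standard `ipaddress` module. The function name and the addresses are hypothetical, and a real migration check would also need to confirm Layer 2 adjacency and storage reachability, which this sketch does not attempt.

```python
import ipaddress

def can_keep_address(vm_address, target_subnet):
    """True if the VM's current IP address is valid on the target host's subnet."""
    return ipaddress.ip_address(vm_address) in ipaddress.ip_network(target_subnet)

# Hypothetical addresses: the VM currently lives in 10.1.20.0/24.
same_segment = can_keep_address("10.1.20.11", "10.1.20.0/24")   # True
other_segment = can_keep_address("10.1.20.11", "10.1.30.0/24")  # False
```

When the check fails, the VM would need a new address on the target network, which is what rules out a live migration across a Layer 3 boundary.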

Port Consistency Issues

There are other limitations to creating a fully virtualized data center beyond the challenges presented by STP. One of the biggest is dealing with port consistency issues. When servers, virtual or physical, are set up, one of the important characteristics is how each server connects to the network. Settings like VLAN membership, access control lists, quality of service and security profiles are most often controlled at the port or physical access layer of the network. In a virtual environment this can be handled by emulating these characteristics at the virtual switch. However, the soft switch included with hypervisors provides very limited port control, which is why much of the configuration work is still done in the physical network. The problem for the cloud is that moving a VM between two hosts, even within the same Layer 2 network, requires that these settings be preset on the destination. If they are not, the pre-migration verification step may fail or, worse, the application may become unavailable after the migration occurs.
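A pre-migration verification step of this kind might look like the following sketch. The `PortProfile` fields and the `migration_blockers` function are hypothetical names chosen for illustration; they cover only the settings named above and do not represent any particular hypervisor's or switch vendor's check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PortProfile:
    """Hypothetical subset of per-port settings a VM depends on."""
    vlans: frozenset
    acl: str
    qos_class: str

def migration_blockers(vm_needs, target_port):
    """List the reasons the target port is not ready to receive the VM."""
    blockers = []
    missing_vlans = vm_needs.vlans - target_port.vlans
    if missing_vlans:
        names = ", ".join(str(v) for v in sorted(missing_vlans))
        blockers.append("missing VLANs: " + names)
    if vm_needs.acl != target_port.acl:
        blockers.append(f"ACL mismatch: need {vm_needs.acl}, port has {target_port.acl}")
    if vm_needs.qos_class != target_port.qos_class:
        blockers.append(
            f"QoS mismatch: need {vm_needs.qos_class}, port has {target_port.qos_class}"
        )
    return blockers

vm_needs = PortProfile(frozenset({20, 30}), "web-tier", "gold")
target_port = PortProfile(frozenset({20}), "web-tier", "best-effort")
blockers = migration_blockers(vm_needs, target_port)
```

An empty list would mean the destination port already matches. Here the target port lacks VLAN 30 and the required QoS class, so the migration should not proceed until the port is reconfigured.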

This requires coordination between server and network administrators to ensure that the destination host and its corresponding switch ports are configured correctly prior to migration, a process that requires planning, coordination and the least available resource: time. It is not uncommon for the server administrator to have to issue a ticket to the network operations team to reconfigure port settings before a migration can take place. The result is that live migration of VMs, especially in an automated fashion for optimal resource management, is more of a dream than a reality.

The workaround is to map all the settings to all network ports. This, of course, would go against almost every networking and security best practice. It would also make it difficult to fine-tune VM performance to ensure that certain VMs are able to meet required service levels.

Intelligent Layer 2 Networking

The solution to these challenges is to build a scalable, intelligent, Layer 2 network that can overcome the limitations of STP, plus have port mapping capabilities to automatically move network configurations and port settings with the VM as it’s migrated. This would lead to a flat but scalable Layer 2 network designed for the cloud data center.

The blueprint for such a design already exists today in the form of the Storage Area Network (SAN). The Fibre Channel SAN is designed to be flat in its architecture, to allow multiple paths, and to provide portable port configuration information. TRILL, as mentioned earlier, brings much of this capability to Ethernet. It is the emerging replacement for STP in data center networks. It allows network mapping information to be contained in every switch, eliminating the need for a single path and opening up all network connections. It will effectively at least double the available bandwidth from day one, as well as simplify network management, ease the re-configuration process, increase resiliency and provide faster failover.

Technologies like the Brocade Virtual Cluster Switch take TRILL a step further and provide a layer of network intelligence that allows for the advanced port mapping required to enable application mobility. In this environment, port configuration information and security settings automatically follow the VM as it is migrated through the environment. As a result, the extra planning and coordination between server and network administrators is eliminated.
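The idea of settings that follow the VM can be illustrated with a toy model. This is only a conceptual sketch and not Brocade's implementation: it assumes a hypothetical profile store keyed by the VM's MAC address, and a hypothetical `on_vm_seen` hook that fires when the network detects the VM on a port.

```python
# Hypothetical profile store shared by all switches, keyed by VM MAC address.
PROFILES = {"00:50:56:aa:bb:01": {"vlan": 20, "qos": "gold"}}

# Settings currently applied to each switch port.
port_config = {}

def on_vm_seen(mac, port, previous_port=None):
    """Apply the VM's profile to the port where it appeared; clear the old port."""
    profile = PROFILES.get(mac)
    if profile is None:
        return False  # unknown VM: leave the port untouched
    if previous_port is not None:
        port_config.pop(previous_port, None)
    port_config[port] = dict(profile)
    return True

# The VM starts on one rack, then migrates to another.
on_vm_seen("00:50:56:aa:bb:01", "rack1/port3")
on_vm_seen("00:50:56:aa:bb:01", "rack2/port7", previous_port="rack1/port3")
```

After the second call only the new port carries the VM's settings, and no administrator had to pre-stage them on the destination.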

In the future, as standards mature, these virtual switch technologies will also remove the responsibility from the hypervisor to create and manage a virtual switch inside each physical host. This frees up the hypervisor to provide other resources to the host and, in general, improve VM efficiency. With the virtual switch removed from the hypervisor and the Layer 3 switches no longer needed to manage STP limitations, the management of the network becomes significantly easier. Infrastructure providers like Brocade can extend the virtualization concept to switches as well. For example, if two switches are placed at the top of every rack, they can all be seen as a single large virtual switch by the network administrator, further simplifying management.

The Future is Now

The intelligent Layer 2 switch, or virtual cluster switching, is more than something to consider over the next few years. These technologies can bring significant advantages today, like simplified management, ease of VM migration and better port consistency. In our next article we will cover three benefits that intelligent Layer 2 networking can bring to the data center right now.

Brocade is a client of Storage Switzerland