Earlier this week at the Software Defined Data Center symposium, Ivan (@ioshints) and I finally met in person, and something he said about the articles he has written in response to some of the Plexxi articles prompted me to write this. Over the past weeks I have written about our belief that ECMP- and SPF-based networks create a large gap between what matters (the applications) and how conversations between them are transported across a network. Based on some of the comments left on our posts and Ivan's, and some really pointed questions asked during yesterday's Network Field Day 6, I will try to explain some of the fundamentals of how traffic flows on a Plexxi network and how Plexxi Control uses the path diversity created by the Plexxi switches. This first installment explains the physical connectivity and how a Plexxi network creates physical paths. Links in this article point to podcasts that explain some of these points in more detail.

Plexxi switches are 10GbE switches. They are based on merchant silicon, with all the good and bad that comes with that. The ports on a Plexxi Switch are divided into two groups: Access Ports and LightRail™ Ports. Access Ports are what you would expect: 10GbE ports to connect servers, storage, appliances, other routers, switches, you name it. Depending on the exact switch version there can be up to 72 of them. LightRail Ports are also 10GbE ports, but they are used to create the Plexxi fabric, connecting a Plexxi Switch to other Plexxi Switches. Again depending on the exact switch version, there are at least 24 of these ports, and up to 48 in the newer switches. These LightRail ports do not appear as normal SFP+ or QSFP+ ports on the switch; they are carried over Wavelength Division Multiplexing (WDM) through our LightRail connectors. WDM allows us to carry multiple 10GbE connections on a single fiber by using a different wavelength for each connection.

Plexxi Switches are cabled to each other in a ring configuration. That means that each switch has one LightRail cable going to the switch to its logical East, and one to the switch to its logical West. Each LightRail cable carries 12 of the LightRail 10GbE ports. Even though the LightRail cables are connected as a ring, the actual 10GbE waves (or lambdas, or channels; we use the terms somewhat interchangeably) do not all terminate at neighboring switches. Because of the use of WDM, many of the waves are passed through the neighboring switch, using passive optical technology, to switches that are 2, 3, 4 or even 5 switches logically to the East or West. Because the passthrough is passive, these waves are passed on even if an intermediate switch is turned off, rebooted, or otherwise taken out of service.

[Figure: Ring Waves]

Each switch passes some of the 10GbE WDM waves on passively and terminates others by attaching them to its Ethernet switching ASIC. By doing so, the switches create a point to point 10GbE Ethernet connection between the switch that originated a wave and the switch that terminates it. When connecting Plexxi switches together, no configuration is needed to make this happen: the switches automatically create a default topology, or more accurately topography (definition 2b in the Merriam-Webster dictionary). The default configuration creates 4x10GbE point to point connections between any 2 neighboring switches and 2x10GbE connections between any 2 switches that are 2, 3, 4 or 5 switches removed from each other.
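
As a quick sanity check, the default topography can be tallied against the port counts mentioned earlier. The snippet below is only an illustrative sketch: the table and function names are made up for this example and are not part of any Plexxi software. It simply adds up the links described above per direction and per switch.

```python
# Default link plan as described in the text:
# ring distance (in switches) -> number of 10GbE point to point links.
DEFAULT_LINKS_BY_DISTANCE = {1: 4, 2: 2, 3: 2, 4: 2, 5: 2}


def ports_per_direction(plan=DEFAULT_LINKS_BY_DISTANCE):
    """10GbE waves one switch originates toward one side (East or West)."""
    return sum(plan.values())


def ports_per_switch(plan=DEFAULT_LINKS_BY_DISTANCE):
    """LightRail ports one switch uses in total: East plus West."""
    return 2 * ports_per_direction(plan)


print(ports_per_direction())  # 12
print(ports_per_switch())     # 24
```

The 12 per direction lines up with the 12 LightRail ports carried per cable, and the 24 per switch with the minimum number of LightRail ports on a switch.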

The combination of all of these point to point connections (and note that this is not a ring a la token ring, FDDI, or anything else from the good ole days) creates a partial or full mesh of 10GbE connections between all switches connected together in a ring. Again without any configuration required, the switches use a discovery mechanism to determine which switch is on the other side of each of these point to point connections. Even though this is their default topography, the switches make no assumptions about which switch is on the other side; they actively probe to find out. As a result of this discovery, each switch will have 24 or 48 (or in reality any number you want or need) links that connect to a set of other switches in the network. The combination of all these connections from all switches is the topography for this network. And that is a large number of links. With 20 switches in a Plexxi network, each with at least 24 LightRail ports and every link using a port on both ends, there are at least (20*24)/2 = 240 10GbE links connecting switches together in a mesh. Using switches with more waves on the LightRail connectors, this can quickly get to 1000s of 10GbE links connecting switches. Not in a tree-like formation, but densely meshed.
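
The link arithmetic can be reproduced by building the default topography for a ring of n switches. This is a minimal sketch under stated assumptions: switches are numbered 1..n around the ring, the link plan is the default one described earlier (4 links to direct neighbors, 2 links to switches 2 to 5 positions away), and n is at least 11 so that no pair of switches is reachable within 5 positions in both directions. All names are hypothetical.

```python
from itertools import combinations

# Assumed default plan: ring distance -> number of 10GbE links.
LINKS_BY_DISTANCE = {1: 4, 2: 2, 3: 2, 4: 2, 5: 2}


def ring_distance(a, b, n):
    """Shortest number of positions between switches a and b on a ring of n."""
    d = abs(a - b) % n
    return min(d, n - d)


def build_links(n):
    """Return {(a, b): link_count} for every connected pair with a < b."""
    links = {}
    for a, b in combinations(range(1, n + 1), 2):
        count = LINKS_BY_DISTANCE.get(ring_distance(a, b, n), 0)
        if count:
            links[(a, b)] = count
    return links


links = build_links(20)
total_links = sum(links.values())
print(total_links)  # 240
print(len(links))   # 100 connected switch pairs out of 190 possible
```

With 20 switches the result is a partial mesh (100 of the 190 possible switch pairs are directly connected). With 11 switches every pair is within 5 positions of each other, so `build_links(11)` yields a full mesh of 55 connected pairs.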

[Figure: Ring2]

The reason for creating this type of connectivity between switches is fundamental to Plexxi's belief that hierarchical networks cannot scale in the future, and that far more complex meshed networks, which provide enormous amounts of switch to switch (and ultimately application to application) connectivity and paths, are the only way to scale data center networks. In a Plexxi network, the number of diverse ways to get from, say, switch 1 to switch 5 in an 11 switch network is enormous. Switch 1 and 5 have 2 direct 10GbE point to point connections. Switch 1 has 4x10GbE connections to each of switch 2 and switch 11, each of which has 2x10GbE connections to switch 5. That is 4 x 2 = 8 paths through each of them, creating another 16 unique 10GbE paths between switch 1 and 5. Depending on the size of the network, there could be 100s of paths between 2 switches. Some direct, some with intermediate switch hops. And I used the words "default topography" on purpose. We have the ability to programmatically modify this topography using some of our optical components, but that is for some later date.
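
The path counts in this example can be checked with a few lines of Python. Again, this is just a sketch under assumptions: switches are numbered 1..n around the ring, the default link plan applies (4 links between neighbors, 2 links between switches 2 to 5 positions apart), and only direct paths and paths through a single intermediate switch are counted. The function names are invented for this illustration.

```python
# Assumed default plan: ring distance -> number of 10GbE links.
LINKS_BY_DISTANCE = {1: 4, 2: 2, 3: 2, 4: 2, 5: 2}


def link_count(a, b, n):
    """Number of direct 10GbE links between switches a and b on a ring of n."""
    d = abs(a - b) % n
    d = min(d, n - d)
    return LINKS_BY_DISTANCE.get(d, 0)


def direct_paths(src, dst, n):
    """Paths that use a single point to point link."""
    return link_count(src, dst, n)


def two_hop_paths(src, dst, n, via=None):
    """Paths through exactly one intermediate switch.

    Each choice of first link combined with each choice of second link is a
    distinct path, so the counts multiply per intermediate switch.
    """
    if via is None:
        via = [k for k in range(1, n + 1) if k not in (src, dst)]
    return sum(link_count(src, k, n) * link_count(k, dst, n) for k in via)


N = 11
print(direct_paths(1, 5, N))                # 2
print(two_hop_paths(1, 5, N, via=[2, 11]))  # 16
print(two_hop_paths(1, 5, N))               # 52
```

So under these assumptions, switches 2 and 11 alone contribute the 16 paths mentioned above, and counting every possible intermediate switch gives 52 two-hop paths on top of the 2 direct ones, before even considering longer paths.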

When you have this many paths between any two points in the network, you can be extremely selective about which traffic follows which path. But you cannot do that with traditional ECMP and SPF methods, which do not understand the differences between these paths or randomly aggregate traffic onto them. You need a very smart mechanism to calculate how to get traffic from one place to another in a way that satisfies the policy needs determined by you. Plexxi Control uses Affinities to calculate what traffic needs to go where. How it does that, and how the switches autonomously use the information provided by Plexxi Control, is for next week. It really is pretty cool. Trust me.

Showing 3 comments
  • Leonardo

    So how are the wavelengths terminated by a switch (as opposed to those passed on) determined? Is it by using tunable optical receivers?

    • Marten Terpstra

      Leonardo,

      We do not use tunable lasers in our system; they are too expensive for the datacenter market. We take specific wavelengths on specific fibers through a set of filters and muxes and terminate them. We can also stitch 2 waves together without terminating them in the Ethernet ASIC with some of our programmable optical components, i.e. in on one wave, out on another, without doing Ethernet switching.
