Greg Ferro (@etherealmind) wrote an article a few weeks ago outlining why he thinks integrated physical and overlay networks will not be successful. I would like to expand a bit on some of the points Greg raises, and give (a piece of) our view on this topic.

First I want to touch on terminology. Speaking like an old-school network engineer (zero offense intended, Greg), he calls the two layers the "physical network" and the "overlay network". Later he introduces the term "underlay network" for the physical network, which indicates an interesting change of (industry?) perspective. The overlay has become the network; those boxes and wires have become the supporting cast, the underlay. It ultimately changes nothing, but what we call them provides a nice clue on point of view. We have had many layers in networking for years (and I am not talking about the traditional OSI layers), but this is the first time the physical network itself has changed names. Maybe I am just being sentimental.

Greg points out that the tunnels that make up an overlay have no inherent state in the network. The tunnels are stateless entities using standard IP/UDP to transmit their VM-originated packet payload. Controllers manage the hundreds or thousands of tunnel endpoints, figuring out who is where and who has members of what tenant, and (hopefully soon) will aid in the resolution of unknown destinations without the need for multicast. The controllers will have little knowledge of the physical network; they assume L3 connectivity between all tunnel endpoints. Traffic from all VMs behind a hypervisor will be merged together into a tunnel and rely on basic L3 packet forwarding to reach the right tunnel endpoint.
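To make the mechanics concrete, here is a minimal sketch of what an endpoint does with a VM frame. It assumes a VXLAN-style encapsulation (the paragraph above does not name a specific protocol), and the `LOCATIONS` table, the tenant IDs, the MAC addresses and the endpoint IPs are all made up for illustration. The controller's job is reduced to a lookup table, and the outer IP/UDP headers are left to the host stack.

```python
import struct
from typing import Dict, Tuple

# Hypothetical controller table: (tenant id, VM MAC) -> IP of the tunnel
# endpoint currently hosting that VM. Real controllers learn and distribute
# this; here it is hard-coded for illustration.
LOCATIONS: Dict[Tuple[int, str], str] = {
    (5001, "00:00:00:aa:00:01"): "10.0.0.11",
    (5001, "00:00:00:aa:00:02"): "10.0.0.12",
    (5002, "00:00:00:bb:00:01"): "10.0.0.12",
}


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: a flags byte with the I bit set,
    24 reserved bits, the 24-bit network identifier, 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


def encapsulate(tenant: int, dst_mac: str, inner_frame: bytes) -> Tuple[str, bytes]:
    """Return (remote endpoint IP, UDP payload) for one VM frame.
    The tenant ID is used directly as the VNI, which is an assumption."""
    remote = LOCATIONS[(tenant, dst_mac)]
    return remote, vxlan_header(tenant) + inner_frame


remote, payload = encapsulate(5001, "00:00:00:aa:00:02", b"inner-ethernet-frame")
print(remote, payload[:8].hex())
```

Note that nothing in the lookup or the header says anything about the path between the two endpoints. That is the "assume L3 connectivity" part.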

A discussion with @davehusak reminded me of the free use of "stateless" in many discussions. Tunnels are not without state: both endpoints know exactly what traffic they should and should not exchange. They may simply be ignorant of each other's ability to receive packets, because the tunnel endpoints may not have any handshake that provides a consistent view of "up" or "down". But there is plenty of state associated with a tunnel.

Similarly, the network is not blissfully unaware of a tunnel. It knows the IP addresses of the endpoints, their MAC addresses, which port the endpoints are attached to, even whether the endpoint is physically connected and communicating. What the network does not know is that there may be traffic from many applications, running on many VMs, belonging to many tenants, all hiding behind these two endpoints. Greg correctly points out that IP networks are designed to deliver packets and that applications are designed to retransmit packets when they need to, but is that good enough?
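A small sketch of that blind spot, with made-up tenants, applications and endpoint addresses: if the physical network classifies traffic only by the outer source and destination, every inner flow between the same pair of endpoints collapses into a single entry.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class InnerFlow(NamedTuple):
    """One VM-to-VM flow as the overlay sees it (all values hypothetical)."""
    tenant: str
    app: str
    src_vtep: str  # outer source IP: the sending tunnel endpoint
    dst_vtep: str  # outer destination IP: the receiving tunnel endpoint


def underlay_view(flows: Iterable[InnerFlow]) -> Counter:
    """Group flows the way a switch looking only at outer addresses would.
    Tenant and application are simply not part of the key."""
    return Counter((f.src_vtep, f.dst_vtep) for f in flows)


flows = [
    InnerFlow("tenant-a", "database", "10.0.0.11", "10.0.0.12"),
    InnerFlow("tenant-a", "backup", "10.0.0.11", "10.0.0.12"),
    InnerFlow("tenant-b", "voice", "10.0.0.11", "10.0.0.12"),
]

for (src, dst), count in underlay_view(flows).items():
    print(f"{src} -> {dst}: {count} inner flows, indistinguishable from outside")
```

This is deliberately simplified. Real encapsulations can expose some hints in the outer header, but the tenant and application identity stays inside the tunnel unless something explicitly shares it.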

And this is where I have my disagreement with Greg. Adding bandwidth is not always the right answer. I think we will shortly find it may not always be the available answer, or the economically viable answer. Smarter use of the available bandwidth is a much better answer. Not all traffic is the same. At Plexxi we have made the case that there are applications we care about more than others, and that different applications need very different service from the network. Hiding that traffic in a tunnel does not change that. Rather, the additional mobility overlays provide will increase the complexity of managing the right network performance for the right application workload. Or tenant.

I started this blog with terminology, for a reason. As long as we treat, name and see the underlay, overlay, physical network, vSwitch, pSwitch, gateway and all other components of the network as separate entities that share no state or knowledge, but are engineered, orchestrated and managed separately from each other, we will never provide the applications with what they really need. Applications are king, all of the network components are just a medium for them to do their job. We need to dream bigger to get the network truly "out of the way" of applications. I don't believe that is an impossible dream.

Showing 5 comments
  • Allen Baylis

    Awesome write up !

    • mike.bushong

      Nice to see Marten landing some points in his first post. Well played, Marten.

    • Marten Terpstra

      Thanks Allen!

  • Brad Hedlund

    Hi Marten,
    A nice read and well written post. I just wanted to point out a couple things…

    1) The edge vswitch (tunnel endpoint) is capable of implementing a per-App/per-Tenant QoS policy, and reflecting that in the QoS marking applied to the tunnel IP packet. The fabric can see this and provide local QoS at each switch hop. In any good solution using overlays, the fabric does have visibility into differentiating application traffic.

    2) The tunnels in fact do have an “up” or “down” state, bidirectional, each endpoint aware of the other's ability to receive traffic. This is done through heartbeat messages, similar to an IP routing protocol or BFD. The tunnel endpoints (hypervisors and gateways) are constantly testing the health of the fabric and report on that as a data point. Useful troubleshooting data like this didn’t exist before.

    I think Greg’s use of the word “stateless” was to say that the tunnels do not have any configuration state — a valid use of the word “stateless” and a very important point. The tunnels are ephemeral, and require no configuration state management by the network operators.

    Full disclosure: I’m a member of the technical staff at the VMware NSBU working on NSX.


    • Marten Terpstra

      Brad, thanks for reading. Some rebuttal:

      1) Understand that inner packet QoS markings can be pushed into the outer header (or specifically added by the vSwitch based on policy), but it's still just that, an indicator of queuing/drop behavior. Traffic priority is only one type of service an application may want or need. What if I need/want to isolate traffic from groups of tenants? And I mean physical separation, for security, or perhaps even plain bandwidth reasons? There are services the network can (and should) provide that cannot be solved with QoS as we know it today.

      2) Fair enough for VXLAN (even though it's hard to find any keepalive references for VXLAN) or NVGRE, but STT has a harder time making that point considering the S in STT 🙂

      As for your last comment around state, you are actually making the point of my article. The tunnel configuration state absolutely requires management by the network operator. The vSwitch and VTEP are part of the network management domain. As long as we believe the physical and overlay network are seen and managed separately, we have not done our job.
