For as long as I can remember, networking has struggled with the balance between aggregated and individual traffic flows. Following the abilities of the technology components we use, we have been forced to aggregate, only to be allowed to de-aggregate or skip aggregation when technology caught up with or surpassed the needs of the day.
The vast majority of networking equipment is driven by specialized hardware. For datacenter switches, speed and port density drive the requirements, and physics and our technology capabilities create trade-offs that ultimately lead to some form of aggregation. Higher speed and more ports are traded off against memory, table space and functionality. These trade-offs will always exist, no matter what we are trying to build. Networking based in servers will have oodles of memory and table space to do very specific things for many, many flows, making it extremely flexible, but those same servers cannot touch the packet processing speeds of the specialized packet processing hardware from Broadcom, Intel or Marvell, or the custom ASICs from Cisco, Juniper, or most anyone else.
So like it or not, we will want to do more than our hardware is capable of, and as a result we create aggregation points in the network where we lump a bunch of flows together into an aggregate flow and start making decisions on that aggregate. This is nothing new. Even good ole IP forwarding operates on an aggregate set of flows: it makes a single decision for all flows destined to the same IP address or prefix.
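To make that concrete, here is a minimal sketch of longest-prefix-match forwarding in Python. The table contents and next-hop names (`core-1`, `agg-3`, `upstream`) are made up for illustration, and real forwarding hardware does this lookup in dedicated memory rather than in a loop.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next hop (illustrative names only).
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): "core-1",
    ipaddress.ip_network("10.1.0.0/16"): "agg-3",
    ipaddress.ip_network("0.0.0.0/0"): "upstream",
}


def next_hop(dst: str) -> str:
    """Return the next hop of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FIB[best]


# Two unrelated flows: different sources, ports and applications,
# but both destined to hosts inside 10.1.0.0/16.
flows = [
    ("192.0.2.10", 51000, "10.1.4.7", 443),
    ("198.51.100.5", 40222, "10.1.200.9", 22),
]

# Only the destination address takes part in the decision.
hops = {next_hop(dst) for _src, _sport, dst, _dport in flows}
print(hops)
```

Both flows end up with the same next hop because nothing but the destination is consulted. The forwarding table needs one entry for the prefix, not one per flow, and that is exactly the trade.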
Network tunnels are the most obvious examples of aggregation. Their purpose is to hide information from intermediate networking equipment. In some cases we hide it to keep our table sizes under control; in others we hide it because we do not want the intermediate equipment to be able to see what we are transporting (IPsec, SSL, etc). And while sometimes the intermediate systems can see everything that is there, managing the complexity of that visibility simply becomes too expensive. This is why networks that are entirely managed and controlled per flow do not really exist at any reasonable scale, and probably never will.
For the exact same reason we aggregate, we lose the ability to act on specifics. When our tables are not large enough to track each and every flow, we can only make decisions based on what we have decided to keep in common. When talking about tunnels, the tunnel endpoints put new headers onto the original packets and intermediate systems can only act (with minor exceptions) on the information provided in these new headers. The original detail is still there and often visible to the intermediate system, but the intermediate system does not have the capacity to act on the sheer volume of that detail.
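A small sketch shows how much state disappears once a tunnel endpoint adds its own header. Everything here is assumed for illustration: the `Packet` and `Encapsulated` types, the addresses, and the `ENDPOINT_OF` mapping of hosts to tunnel endpoints are hypothetical and do not model any particular encapsulation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int


@dataclass(frozen=True)
class Encapsulated:
    outer_src: str  # tunnel endpoint that added the new header
    outer_dst: str  # tunnel endpoint that will remove it
    inner: Packet   # still carried along, but not used for lookups in transit


# Hypothetical placement: ten hosts behind each of two tunnel endpoints.
ENDPOINT_OF = {f"10.0.1.{i}": "192.0.2.1" for i in range(1, 11)}
ENDPOINT_OF.update({f"10.0.2.{i}": "192.0.2.2" for i in range(1, 11)})


def encapsulate(pkt: Packet) -> Encapsulated:
    """Wrap a packet in an outer header naming only the tunnel endpoints."""
    return Encapsulated(ENDPOINT_OF[pkt.src], ENDPOINT_OF[pkt.dst], pkt)


sources = [f"10.0.1.{i}" for i in range(1, 11)]
dests = [f"10.0.2.{i}" for i in range(1, 11)]
packets = [
    Packet(s, d, 40000 + n, 80)
    for n, (s, d) in enumerate(product(sources, dests))
]
tunneled = [encapsulate(p) for p in packets]

# State needed to track every original flow vs. every tunnel.
inner_keys = {(p.src, p.dst, p.sport, p.dport) for p in packets}
outer_keys = {(t.outer_src, t.outer_dst) for t in tunneled}
print(len(inner_keys), "inner flows,", len(outer_keys), "outer flow")
```

An intermediate system keyed on the outer header needs a single entry here, at the cost of treating all of those inner flows identically.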
And there is the struggle. If I have more information, I can make better decisions. But when I aggregate because I cannot handle that extra information (due to sheer size or management complexity), my decisions by definition become more coarse and as a result, less accurate. But we want it all. We want the power to make decisions based on the most specific information we can, but want to aggregate for operational simplicity or because our hardware dictates. And this is where we get creative and start to turn what used to be black and white into gray.
There is nothing wrong with attempting to act on specifics for aggregate flows, but in so many cases it's done as an afterthought and becomes hard to manage, control or specify. Some of the techniques we use are fairly clean, like taking the DSCP value from a packet and replicating it in the outer header of that same packet in a tunnel. Others are far more obscure, like calculating some hash function on a packet header and using it as the UDP source port for the VXLAN encapsulated version of that packet. In still others, the original internals may be completely invisible to intermediate systems. STT, for instance, re-uses the format of TCP packets for its own purpose, but a side effect of using it as a streaming-like protocol is that the original packet headers may not actually be present in a given IP packet on the wire. The STT header provides a 64-bit Context-ID that can be used to carry some bits of information from the original packet, but that STT header only appears in the first of what could be many individual packets that are re-assembled in the receiving NIC. Over the Christmas break I spent some time looking at each of the overlay formats and the tools modern day packet processors give you to act on these headers. I will share some of this in this forum next week.
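The first two techniques can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular switch or vSwitch does it: the function names are hypothetical, CRC32 stands in for whatever hash the hardware actually uses, and ECN handling is deliberately ignored. The port range follows the VXLAN recommendation to pick the source port from the dynamic range 49152-65535.

```python
import zlib

VXLAN_UDP_PORT = 4789            # IANA-assigned VXLAN destination port
EPHEMERAL_BASE = 49152           # start of the dynamic/private port range
EPHEMERAL_SIZE = 65536 - 49152   # 16384 usable source ports


def entropy_source_port(src_ip: str, dst_ip: str, proto: int,
                        sport: int, dport: int) -> int:
    """Fold the inner flow identity into the outer UDP source port.

    CRC32 is an assumption made for this sketch; real devices use
    their own hash functions.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return EPHEMERAL_BASE + (zlib.crc32(key) % EPHEMERAL_SIZE)


def outer_tos(inner_tos: int) -> int:
    """Copy the six DSCP bits to the outer header, leaving ECN bits zero."""
    return inner_tos & 0xFC


# Two inner flows between the same pair of hosts.
port_a = entropy_source_port("10.0.1.5", "10.0.2.9", 6, 40001, 443)
port_b = entropy_source_port("10.0.1.5", "10.0.2.9", 6, 40002, 443)
print(port_a, port_b, VXLAN_UDP_PORT)

# Inner packet marked Expedited Forwarding (DSCP 46) with an ECN bit set.
print(hex(outer_tos((46 << 2) | 0x01)))
```

The hash gives intermediate systems something to load balance on without them ever parsing the inner packet, but it is one-way. They can tell that two packets likely belong to different inner flows, not what those flows are.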
Ultimately, overlay networks are creating a renewed emphasis on the choices between aggregation and individuality. Designed specifically to allow for more complex and scaled networks that hide a lot of the details from the dedicated network hardware, they come at the price of less granular decisions by that hardware, which can certainly lead to less than optimal use of the available network.
[Today’s fun fact: In the Netherlands, there is a 40% higher chance of homeowner insurance claims on the homeowner’s birthday. Those are some good parties.]