Skip to content

Month: October 2017

Can P4 save Software-Defined Networking?

Now, P4 is gaining momentum due to engagement of big players such as Google and AT&T. P4 has potential to cause a significant change in the industry and deliver on the SDN value-proposition. I’d like to discuss that.

In summary, P4 aims to provide 3 main goals:

  • Reconfigurability
  • Protocol independence
  • Target independence

OpenFlow had its shortcomings: somehow diversity of implementation strategies evolved into incompatibility. P4 target independence proposes to solve this issue using a compiler to translate P4 code into switch code taking into account its capabilities.

Screen Shot 2017-10-20 at 1.35.00 PM

In order to understand how disruptive this is, let’s look at the current state of affairs: commodity silicon vendors such as Broadcom and Mellanox already have an API to control their switches, the existence of that API itself already disrupted the industry enabling Cumulus, SnapRoute and even Arista. Now would you prefer that your silicon vendors established a common interface, or would you rather rewrite software everytime you want to test a new switch Vendor? The answer is obvious: the first option benefits users and new vendors, the second benefits established vendors. New industry players or the adventurous operators could write software on top of P4 and achieve multi-vendor integration at the cost of writing compilers for each vendor they use.

So, that’s the big pay-off opportunity, enabling competition, thus innovation. The challenge here is to provide vendors the incentives to write the P4 compiler.

New industry players or the adventurous operators on the other side, could be able to write software on top of P4 and achieve multi-vendor integration at the cost of writing compilers for each vendor they use. That can be game-changing, the big questions are “How eager are developers to write P4 software?”,  “how much does it cost to hire somebody to do it?”, additionally, “Who will write Cisco/Broadcom specific p4 compiler code?

There are endless opportunities: in a parallel universe, AT&T forces Cisco to enable a P4 compiler to their devices, Cisco writes a bad compiler, claims it’s bad technology and sells you ACI instead. In a different universe, Barefoot writes a Broadcom compiler ensuring it works, but then it “wastes” some resources promoting a competitor. A little bit more realistically, SnapRoute or Cumulus could write a P4 compiler to Broadcom Tomahawk, and thus would be able to enable their software in a plethora of existing devices. Even more realistically, Barefoot writes their own compiler to Tofino and keeps selling P4 to a limited niche market.

Now, if Barefoot takes on the responsibility to write a P4 compiler for Broadcom and Mellanox that would be translated into huge value to NOS vendors and Operators; since they would be able to seamlessly switch vendors. It would marginally increase adoption of Tofino, so the question remains, who would pay for this?

Now how much does it cost to adopt P4?

Before I answer this question I’m going to callback to a point previously when I wrote about network disaggregation. I ended it asking: “Does OpenFlow effectively lock you in?”. Now the same question may apply to P4.

The question is misleading by itself. I’ve heard vendors saying “OpenFlow locks you in, you might as well just buy our SDN”. There’s just so much wrong with this. OpenFlow isn’t perfect, but it does allow you to adopt software processes to deliver features much faster than your vendor will.

Any choice is a potential barrier and locks you in a little bit, but what everybody refers to when talking about lock-in is hardware lock-in. When you buy a generic x86 computer you are free to install Ubuntu, Debian, Windows or whatever you’d like, when you buy a PlayStation, you can’t just install Xbox on it, that’s vendor lock-in, the costs of doing that are prohibitive, you would be better off just buying another appliance.

You could at barely no cost try an OpenFlow Lab or Field trial on Broadcom-based network devices and fallback to Cumulus if it doesn’t fulfill your needs. Unsurprisingly, The vendors will claim lab trials aren’t needed because of their product quality, but the experience will tell there will always be a missing feature.

Now P4, from the adventurous perspective, P4 is great, you just have to write more software to get it done. For everybody else it has a significant cost: you have to hire premium developers or Barefoot itself to do it. That cost won’t be insignificant when using Broadcom + Big Switch might already give you the tools to improve your current process.

OpenFlow vs P4

OpenFlow is going to be 10 years old next year, a significant amount of resources has been put into testing it. It has been (properly) commercially supported by Big Switch for 3+ years if I’m not mistaken. I’d say with certainty that you could get an OpenFlow solution production-ready in a year. Realistically, could you get P4 ready to be deployed in production in a year?


  • Will P4 replace OpenFlow? Maybe. P4 offers a different value proposition. OpenFlow agents may be written on top of P4. Great P4 implementations may force OF into being obsolete.
  • Will P4 replace Broadcom SDK? Same answer, P4 may write a much better API on top of theirs.
  • Will P4 replace OpenNSL?  Why not?
  • Will P4 replace NetFlow/Sflow? No. Sflow is a protocol to export data from the switches, it does not say (much) on how you should implement it in the dataplane.
  • Will P4 replace Riverbed? No way.
  • Will P4 replace OpenConfig? Nope, they are actually quite complementary.

Thanks for reading the long post. I welcome any thoughts or questions.


TCP BBR Congestion Control on Mininet

In this post, I demonstrate some benefits of using BBR congestion control and illustrate how easy it is to adopt it by using Mininet as an example. I’m excited to share this post with you guys because it’s been a while since I’ve made a tutorial and I love breakthrough innovations like this.

This post is divided into three sections: Background on BBR, Tutorial and Technical challenges.

Background on BBR

TCP BBR has significantly increased throughput and reduced latency on Google’s internal backbone networks. From this  a great resource:

TCP BBR is rate-based rather than window-based; that is, at any one time, TCP BBR sends at a given calculated rate, instead of sending new data in response to each received ACK. In particular, TCP BBR does not directly link the sending of new data to the receipt of ACKs, and so, strictly speaking, is not actually a sliding-windows implementation. Therefore, we cannot properly talk about winsize or cwnd. Instead, we talk about the number of packets in flight, which is the rate times RTTactual, with the understanding that this number may vary with conditions.

Basically, BBR estimates bandwidth by keeping track of goodput: if an increase in the sender rate does not increase the observed goodput, it assumes that’s the available bandwidth. It is reasonably effective in doing so and that way it provides minimal queueing in the network.

TCP’s throughput is inversely proportional to RTT and most TCP implementations cause additional delays, in consequence, TCP by itself can never reach 100% utilization. BBR changes that, that’s why it’s such an impressive accomplishment.

Quick start

Open Source is great because it allows innovation to be deployed much faster, BBR is already implemented in the Linux kernel and using Mininet you can test it right away.

I’m a long time fan of the website: reproducing network research from Stanford. I leveraged most of the Mininet code for this experiment from there.

Now let’s get to it!! This tutorial assumes you have vagrant and git. If you don’t, don’t panic, follow this link. To start you will need to set up the VM. I took care of all the dependencies for you. If you want to inspect what I’m doing take a look at the mininet role in the ansible folder.

git clone
git checkout vagrant
vagrant up

This should take 10 min to complete. After it’s done proceed

vagrant ssh
cd mininet
sudo ./ all

After around 30 seconds the experiment should be done and you can exit the VM:

open figure5_mininet/figure5_mininet.png

This should open the following figure:figure5_mininet

The figure compares the latency on TCP BBR and TCP CUBIC (less is better). And as you can see BBR reduces the latency from ~150ms to ~50ms(66%) on the average case and from 400ms to 50ms (87%) on the worst case. This is crazy!

Technical challenges

The first technical challenge is finding a linux kernel that implements BBR, and it turns out it’s implemented on 4.9 so look out for that. The second challenge was to implement the BBR pacing mechanism, it was mentioned on the CS244 website but I did not understand it at first.

BBR requires a mechanism to control the sender rate and it leverages tc ( traffic control ) module from linux. I knew about tc but I didn’t know it was such a powerful tool. After some research on linux queueing mechanisms, I found that BBR requires the fq (Fair queueing) queueing discipline because it uses that to rate control the sender. It turns out Mininet did not support fq for some reason, and I had to change a couple lines of code to add support for it.


TCP has been around for decades and for decades people have been trying to improve it. At first, TCP congestion control mechanism literally saved the internet, now I’m gonna be bold and say that BBR by providing a “queueless” congestion control is saving latency-sensitive applications. It really is a big deal. I highly encourage you to try it out, the least you should do is check the following article: Increase your linux server Internet speed with TCP BBR congestion control.

For future reference:

Leave a Comment