
Author: castroflaviojr

Is vendor lock-in really a big deal?

I recently came across a Datanauts podcast episode, “Choosing Your Next Infrastructure” (if you like podcasts, I HIGHLY recommend Packet Pushers; I’m a fan because of their diverse and unbiased content). The episode raises many great considerations for choosing new infrastructure and does an excellent job of describing the pros and cons of different strategies, but a few points regarding vendor lock-in got me scratching my head. The article “Vendor lock-in: the good, the bad and the ugly” does a great job of explaining the overall concept of vendor lock-in.

Additionally, I see it in the following way: some vendors provide hardware and software as integrated solutions, potentially spanning storage, networking, or compute. Traditional vendors have been doing this for decades, and that’s one part of vendor lock-in: you rely on your vendor to deliver new features, and if they don’t, the migration costs are, most of the time, prohibitive, and a good enough reason to just pay the same vendor a premium.


During the podcast, the following question was asked: “If you commit to a hyper-converged platform, you are committing to a vendor and thus, in fact, locked in. Is that a big deal?”

The response was: “What’s important is understanding that lock-in is going to happen… and it’s important to choose a vendor that is going to be a good partner for your business… So if you have a very good relationship with a vendor who provides an all-at-once solution, that may be strategic for you, and if you would rather keep the hardware open and have a vendor you trust to give a good software solution, that’s your best path.”

Learning curves and migration costs will always exist. Successful organizations, managers, and architects will minimize those costs while meeting critical requirements. That answer caught my attention because this is not the first time I’ve heard comparisons between hardware lock-in and software lock-in that minimize the cost of hardware lock-in. I’ve heard stronger opinions from hardware vendors before (of course): “hardware locks you in, software locks you in, therefore you might as well lock yourself to the hardware.” That statement is easy to make when you are selling hardware; it’s much harder to justify when you are buying it.

I’m not completely opposed to lock-in as a way to meet critical requirements, but that decision must be made very carefully and rationally: more often than not, the future cost of the decision is much higher than the initial cost of the whole project. Requirements are uncertain, and they become more dynamic every day.

For example, say that at design time you thought your critical requirement was performance, so you acquired the best in the industry. A year later, your solution becomes popular in your organization (because it is so good!), and now multi-tenancy is much more important. You are locked in, your manager demands multi-tenancy, and your sales engineer gladly offers you an add-on contract for whatever price they wish. The requirement is fulfilled, all parties involved go to dinner at a fancy steakhouse, and everybody is happy!

If your organization is mature enough to have a project start and end with the exact same requirements, then by all means pick vendor lock-in. But if your organization operates in a dynamic environment, external or internal, then you should always maximize choice and minimize barriers to change in order to meet ever-changing requirements.


I’m a firm believer that competition and choice ultimately drive innovation, so in order to consistently deliver innovative solutions one must be open to competition. I’d argue that computers are only what they are today because of choice, and personal computers are a nice example: one can choose between AMD and Intel processors, and between OS X, Windows, and Linux. At the end of the day, lots of people will buy a solid computer integrated by Microsoft or Apple, but in the long run the most innovative, and often most cost-effective, solutions are the build-your-own type.

More than that, at the end of the day, a well-put-together gaming setup is much more exciting than a boring MacBook, just as Facebook’s or Google’s chassis switches are more exciting than an expensive Juniper router.

 


Network Disaggregation – The holy grail?

TL;DR: Yes.

The networking industry has seen more innovation in the last decade than in the 30 years before it. The popularization of the SDN concept and the release of OpenFlow 1.0 pretty much ignited a flame that was already present in every operator’s mind: the fear of vendor lock-in.

It was common for operators to rely solely on a single vendor every time a new feature was needed. Let’s say Joe decides your network now needs to be monitored using a specific monitoring protocol, xFlow for illustration. Because you only use vendor A’s gear, you would have to request that your vendor add that feature to your software stack. Your sales engineer would then have to convince his developers that this is a critical feature, and the feature would have to go through the full QA hardening pipeline to make sure it doesn’t break any of the 400 protocols present in your network’s OS. That process easily took years. It still takes a few years for the unfortunate souls who choose to be locked into a specific vendor.

OpenFlow became popular as a promise to bring innovation to the industry and to solve the multi-vendor integration problem by providing a standard interface for programming the network. As I mentioned in my last post, while it has brought innovation to the industry, for lack of a strong standardization process it failed to achieve vendor integration, and the demand for an escape route from vendor lock-in remained.

In 2011, a few smart minds in the industry (Facebook, Arista, Rackspace) started the Open Compute Project as an initiative to open up hardware design, with the insight that so much innovation was already happening in the software layer. The idea quickly expanded to networking gear, and a trend of disaggregation between the NOS (network operating system) and the hardware began. Hardware vendors such as Broadcom and Mellanox started working on their own hardware programming interfaces, and that abstraction layer enabled a lot of good innovation; that’s where the Open Networking concept started.

Having established a common interface to the hardware, several NOS vendors have emerged and, in fact, disaggregated the network. This naturally allows for faster development, since it decouples software development cycles from hardware development cycles; the NOS vendors focus on software instead of hardware specifics. It also allows for a diversity of vendors, increasing the speed of innovation.

Let me give you a couple of examples. Say you convinced your manager to buy Open Networking gear based on Broadcom chips, and you went with a “traditional” vendor, say, Dell. Three years later, Broadcom comes up with a next-generation chip. You could (1) keep using Dell and upgrade the gear with no need to change any management systems. Alternatively, (2) if Dell’s features didn’t keep up with your expectations, you could replace its software with Arista’s, or even Cumulus Linux, in order to experiment with completely new paradigms and finally deploy xFlow. In another scenario, if Mellanox’s next-generation hardware performs much better, you could again choose to keep using Dell’s OS and smoothly upgrade your hardware at an optimal cost.

Traditionally, vendor lock-in makes you pay for decades for a non-optimal decision; network disaggregation makes your decisions lighter, allowing you to quickly rethink your strategy and cheaply pivot if necessary.

Choice is extremely powerful. In college, I remember being amazed by the power of MIMO communications: embracing path diversity and the ability to “choose” the best path increases the capacity of a channel almost linearly. Network disaggregation gives you the same power, the power of choice.
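For reference, the classic MIMO result (a back-of-the-envelope approximation, not a derivation): with Nt transmit and Nr receive antennas, channel capacity at high SNR scales roughly as

C ≈ min(Nt, Nr) × log2(1 + SNR)   [bits/s/Hz]

so adding antenna (path) diversity grows capacity almost linearly in the number of antennas, which is the analogy I have in mind for disaggregation.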

Now, let me address a few misconceptions I’ve seen around:

  • Is network disaggregation SDN? No.
  • Can SDN be achieved through network disaggregation? Yes; ultimately, network disaggregation accelerates innovation.
  • Does OpenFlow effectively lock you into a vendor?

That’s a good one, and I’m going to answer it in a future post.

Don’t hesitate to reach out to me with any questions.

 


Has OpenFlow failed? – Challenges and implementations

In truth, very few vendors have successfully implemented the full capabilities of OpenFlow. OpenFlow provides a great deal of flexibility to programmers, and it’s hard to make hardware cope with that much power. A few vendors, such as NoviFlow, Corsa, and Barefoot, are able to deliver programmable ASICs of that kind.

The reason comes from the nature of match tables, which are implemented in memory. In a match table, we match on a field, say a MAC address, and take an action, say forwarding the packet to port 1. The complexity comes when we want to match on multiple fields. Say we have a MAC table with N addresses and an IP table with M addresses. The total size of the flow tables (memory) is M + N. But if we want to execute the match in a single table, the required size rises to M × N, the cross-product of the two tables. Now imagine matching on several more fields at the same time.
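A toy sketch of that blow-up, with made-up entry counts purely for illustration:

# N entries in the L2 (MAC) table, M entries in the L3 (IP) table.
mac_entries = 4000
ip_entries = 16000

multi_table = mac_entries + ip_entries   # separate tables: N + M
single_table = mac_entries * ip_entries  # one combined table: N * M

print(f"two tables:   {multi_table:,} entries")   # 20,000
print(f"single table: {single_table:,} entries")  # 64,000,000

Twenty thousand entries versus sixty-four million, just to combine two match stages.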

The multi-table aspect of OpenFlow, introduced in version 1.1 and widely adopted with 1.3, addresses the scalability problem of flow tables. But now the challenge is: how do we provide a standard API via OpenFlow when different vendors have different table layouts?

The answer is: we don’t. Rather, we adapt our OpenFlow usage to each vendor in order to achieve our forwarding objective. Say we want to do L3 forwarding, which means matching on IP, then rewriting the L2 addresses and forwarding to port N. One vendor might have put the rewrite action in the IP table, while another vendor might have grouped all the actions in a group action later in the pipeline.
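Here’s a hypothetical sketch of that divergence; the table names and layouts below are invented for illustration, not any real vendor’s pipeline:

# The same L3 forwarding intent, expressed against two imaginary pipelines.
intent = {"ip_dst": "10.0.0.0/24", "out_port": 1}

def vendor_a_rules(i):
    # Vendor A: the MAC rewrite actions live directly in the IP table.
    return [{"table": "ip", "match": i["ip_dst"],
             "actions": ["set_eth_src", "set_eth_dst", f"output:{i['out_port']}"]}]

def vendor_b_rules(i):
    # Vendor B: the IP table only points at a group entry; rewrites happen there.
    return [{"table": "ip", "match": i["ip_dst"], "actions": ["group:1"]},
            {"group": 1,
             "actions": ["set_eth_src", "set_eth_dst", f"output:{i['out_port']}"]}]

print(vendor_a_rules(intent))
print(vendor_b_rules(intent))

Same intent, two different rule shapes; a controller that wants to support both has to know which shape each device expects.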

OpenFlow became popular on the promise of bringing innovation to the industry, analogous to how the x86 instruction set brought innovation to computers. In truth, interoperability between vendors via OpenFlow has been rare, exactly because vendors have different implementations of OpenFlow. We’ve seen vertical stacks of software deliver SDN capabilities, but we haven’t seen interoperable solutions yet.

Last time I checked, ONOS, a great SDN controller, provided an abstraction over OpenFlow via the FlowObjective primitive: an objective is defined, and the OpenFlow drivers map that objective onto the hardware implementation. What that gives you is the ability to have one controller controlling multiple vendors. Vendors still need to write driver code, but developers only have to write the application software once. Again the power of abstraction shows itself. There may be others out there, but I’m aware of a couple of OpenFlow fabric solutions, such as BigSwitch’s and Trellis (used in the CORD project), that have successfully deployed stable systems.

OpenFlow is not the answer to all your networking problems; the perfect abstraction for networking would be, but it does not exist. OpenFlow definitely succeeded in bringing innovation to the networking industry. A few vendors like BigSwitch have built incredible solutions, and the Open Networking Foundation has merged with ON.Lab, which may bring more energy toward standardization of the protocol. Support from vendors has slowed as they started generalizing the SDN definition; I will write more about that.


Visualizing sFlow data with ntopng and nProbe on Ubuntu 16.04

Open Source tools can be useful if you need to put something together easily.

I was able to use nProbe to visualize real-time traffic observed via sFlow. Here is how you install it on Ubuntu 16.04.

wget http://apt.ntop.org/16.04/all/apt-ntop.deb
dpkg -i apt-ntop.deb

apt-get clean all
apt-get update
apt-get install pfring nprobe ntopng ntopng-data n2disk cento

nProbe works as an sFlow collector and consumes the data generated by the switches. nProbe then exports the data to ntopng.

To start nProbe, run:

sudo nprobe --collector-port 6343 --zmq "tcp://127.0.0.1:5556" -i none -n none

To start ntopng, make sure you have properly configured the following options in its configuration file:

--interface=tcp://127.0.0.1:5556
--http-port=4000

Then restart the service:

sudo service ntopng restart

Then access http://127.0.0.1:4000, log in with admin/admin, and you should see something like this:

(Screenshot: the ntopng web UI showing real-time traffic.)

 


Network Automation vs Software Defined Networking – Ansible vs OpenFlow

At Verizon, we are moving towards automating network configuration and provisioning. To me, the goals for this move can be summarized as:

  • Maintenance cost reduction
  • More agile deployment processes

Coming from an OpenFlow SDN background, where changes to the network can be immediate, and looking at the real world, where changes require human approval and human intervention and take one to two weeks to deploy, I find it really hard to tolerate this acceptance of delay in legacy systems.

I’m very interested in identifying where automation of legacy systems offers a real benefit over OpenFlow networks, and vice versa. My experience tells me the biggest paradigm shift comes from the users. If the network operator is used to the OpenFlow paradigm and has software development skills, pretty much anything can be done. On the other hand, when the network operator comes from a classical Cisco network engineer background, even the incremental changes advocated by network automation gurus can be challenging.

So far, my only experience with network automation is Ansible. A great positive for Ansible is its learning curve: it is very easy to try. Right now, I’m wrestling with how to test Ansible code. Refactoring variables consists of project-wide find-and-replace, and it’s not yet intuitive to me how Ansible code can be continuously tested and deployed. Quoting Uncle Ben, “with great power comes great responsibility”: Ansible does give you the opportunity to mess things up really well.
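That said, Ansible does ship with some basic safety nets, and a sketch of how I’d start looks like this (site.yml is a hypothetical playbook name):

ansible-playbook site.yml --syntax-check
ansible-playbook site.yml --check --diff

The first catches syntax errors without touching any devices; the second runs in “dry run” mode and prints the changes it would make, though check-mode support varies by module.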

That’s where my bias towards OpenFlow comes in: successful OF projects, like ONOS, have been tested for a couple of years now and are quite mature for open source projects. As mentioned in my last article, to me it all comes down to the skill set companies want to cultivate. It’s easy to leverage network engineering expertise plus some Python scripting to work on network automation, but I bet you won’t get great code quality out of that.

Another option is to leverage great software development skills to make sure you do get the code quality. But then what I would advocate is taking this great software developer and putting them to work on a real SDN system, with real software challenges in place, where the opportunity for gain is incredible.

OpenFlow has an inherent disadvantage: the requirement for extra hardware support. Successful OF deployments have been performed with new gear, or have used hybrid deployment strategies, which can be complex. So if you want to improve current deployments, OpenFlow won’t be your pick.

I’m still skeptical regarding the value of network automation beyond incremental adoption of new technology; in other words, it’s easy to sell.


What’s going on?

In January, I started working for Verizon as a DevOps Engineer with a focus on network engineering. I’ve been working with SDN for about two years, and my last experience was at the Open Networking Lab, a pioneering SDN research lab, in collaboration with AT&T.

In this article, instead of describing a technology as I usually do in this blog, I’ll try to summarize my thoughts on where this industry is going.

Every day it’s clearer to me that innovation in service providers is driven by two factors: pressure to reduce acquisition and operational costs, and increasing pressure to deliver new services fast, which, by the way, happens in order to generate new sources of revenue.

Most service providers are trying to leverage open hardware from OCP and open source technology in order to achieve those goals. The “open” alternatives are quite cost-effective compared to current legacy solutions; at the same time, they offer the opportunity to be at the edge of technology development, which is to say, open technologies speed up innovation cycles significantly. The disaggregation of network devices has played a tremendous role in enabling innovation as well.

There are challenges in achieving those goals. Acquisition costs are definitely the most compelling point of open technologies; the delivery of open source solutions, on the other hand, is where the risk lies. If you are used to open source, you know that bugs are just part of your life. There’s a nine-in-ten chance that at least one of your critical features won’t be supported natively by available open source solutions.

To cope with that, I believe service providers should invest in acquiring diverse talent, or invest in training their own staff.

The truth is that change is inevitable: you either hop on the boat and deliver reduced costs or new services, or you will be left behind. We’ve started to see evidence of this with the big vendors, and I believe the pattern will repeat with providers.

In the next posts, I’ll comment on what is going on with vendors, or make a follow-up post with my thoughts on the costs, risks, and benefits of this search for innovation.


Troubleshooting Shortest Path and Topology Discovery on RYU

This post is a follow-up to Shortest Path Forwarding with OpenFlow on RYU.

I originally wrote this code to show how to use SDN to achieve one of the most basic things you can do in a network: shortest path forwarding. In this post, I’m answering common questions about getting the code to work.

Quickstart:

Assuming you have all the dependencies, you should be able to run a Mininet topology using:

sudo mn --topo=tree,4 --controller remote

After starting Mininet, start RYU using the following command:

bin/ryu-manager --observe-links ryu/app/sp.py

On my computer, this is sufficient to discover the topology.

Now, let’s move on to the questions:

1 – Why do I see an empty or incomplete list of links?

Honestly, I’m not super familiar with the RYU topology app, so I don’t know. What works is restarting Ryu/Mininet in different orders: stop both applications and try starting RYU first; if that doesn’t work, do the opposite. Repeat until it works.

2 – Does it still work with a loop in the topology?

As far as my tests go, it does work with a loop in the topology.

3 – Does it still work with a Spanning Tree?

To test it, I start Mininet, set up spanning tree using ovs-vsctl, then start RYU. After RYU learns the topology, it successfully lets the pings go through.

I had to restart RYU a couple of times until it learned the topology.
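For reference, enabling STP on an OVS bridge looks like this (s1 being the name Mininet gives the first switch by default):

sudo ovs-vsctl set bridge s1 stp_enable=true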

4 – Why do I see so many packet-ins?

I did not bother to handle broadcast storms when I coded this, so if your topology has a loop and spanning tree isn’t set up, ARP and other flooded packets may be broadcast forever in your network.

5 – Can I use another algorithm or set custom weights?

Yes. To set custom weights, you just have to figure out how to add that information to the network graph. I’ll try to give an example of this soon.
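As a rough sketch of the idea, assuming the topology lives in a networkx graph, and using a weight attribute name I chose for illustration:

import networkx as nx

G = nx.DiGraph()
G.add_edge(1, 2, port=1, weight=10)   # dpid 1 -> dpid 2, cost 10
G.add_edge(2, 3, port=1, weight=10)
G.add_edge(1, 3, port=2, weight=100)  # direct but expensive link

print(nx.shortest_path(G, 1, 3, weight="weight"))  # [1, 2, 3]

With the weight argument set, networkx runs Dijkstra and picks the cheaper two-hop path over the direct link.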


VMware ESXi Home Lab

I recently bought a 6th-generation Intel NUC in order to build my own VMware ESXi lab. This is my first home lab and the first PC I’ve ever built, so I’m excited.

I’m building this for two reasons. First, my laptop has a small SSD, preventing me from keeping a bunch of VMs. Second, I’m attempting to get a CCNP certification, and I’d like to set up a virtual lab for that.

Bill of materials

I decided to go for the i5 simply because the i7 design wouldn’t allow me to add a hard drive, while the one I got has space and a connector for a SATA disk.

I also had to buy a keyboard to complete the ESXi installation. I bought the NUC with the 256 GB SSD included on eBay for $390. The total price was $616, which makes me pretty glad for an i5 machine with plenty of storage and a fast SSD if needed.

Assembly

Assembly was straightforward, and I used this video as a reference.

Installation

Installation is simple and consists of 4 steps:

  • Downloading the ESXi ISO
  • Creating a bootable ESXi USB drive from the image using Rufus
  • Installing and configuring ESXi
  • Installing GNS3 from the OVA

I will come back here and put up a link to download the ESXi ISO; basically, VMware can provide you with this.

Rufus is also very straightforward and can be downloaded here.

Configuring ESXi can be tricky, but don’t get lost in the details: simply enable SSH, set a static IP address, and you should be fine. Next, you can download the vSphere client from the ESXi machine; you can also use the web browser.

I couldn’t create a VM from the OVA using the web client, so I recommend using the vSphere client.

If you need a step-by-step guide, I recommend checking this YouTube guide on how to install ESXi 6.0.

In my next blog post, I’ll share my experiences with GNS3.

PS: I wish I had installed ESXi on an SD card, just because I think it’s cool. I also wish you could deploy a VM from an OVA already in the ESXi storage, because that would make it much faster.

I’m also a little pissed that VIRL requires a $200 license. I haven’t tested it yet, but I have the feeling that for learning purposes INE would be much more cost-effective, and I doubt VIRL will provide a seamless experience.

Thanks for reading. 🙂


Back to blogging

For context: I just concluded my internship at ON.Lab, one of the pioneering research labs in SDN. Now I’m looking to get a little closer to the industry, and I’m pursuing a CCNP certification.

My study plan is simple. I will build a home lab using GNS3 and VIRL to practice the contents of the exams, and go through the certification guide trying the configurations in the virtual lab. I aim to acquire the CCNP certification in a month, since I believe I already have the necessary skills. That gives me one exam every 10 days…

My first step was to build an ESXi lab on an Intel NUC computer. I’ll post the details in a separate blog post.


On the path to deployment of SDN technologies

At ON.Lab, we are moving fast toward real deployment of SDN technologies.

ONOS aims to be a reliable platform for programming networks. In order to unleash the full potential of SDN, developers should be able to write network programs regardless of the hardware used. This means the operating system should provide an abstraction that is just right: one that lets developers take full advantage of the existing hardware while still being flexible enough to write software once and have it run on anything.

That’s not an easy task; to achieve such a goal, several subsystems and layers of abstraction are constantly being developed in ONOS. Today, I will look at the FlowObjective service.

The FlowObjective service provides an interface between OpenFlow devices and ONOS. The need for it arose with OpenFlow 1.3, as vendors were allowed to diversify the implementation of multi-table forwarding pipelines in order to be more efficient. The diversification of pipelines is great for performance, but it is not so great for developers, who have to either choose one specific vendor to write software for or rewrite the software for each hardware device.

The FlowObjective service abstracts that complexity by means of OpenFlow drivers. Using Flow Objectives, you only have to write the application code once, and someone only has to implement each driver once as well. Still, someone has to be the first to write the drivers.
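To make that concrete, here is a Python-flavored sketch of the concept; the real ONOS API is Java, and all the names below are illustrative rather than the actual classes:

class ForwardingObjective:
    def __init__(self, selector, treatment):
        self.selector = selector    # what to match, e.g. an IP prefix
        self.treatment = treatment  # what to do, e.g. rewrite MACs and output

class FlatPipelineDriver:
    # This pipeline accepts the match and all actions in a single table.
    def apply(self, obj):
        return [{"table": 0, "match": obj.selector, "actions": obj.treatment}]

class GroupPipelineDriver:
    # This pipeline defers the rewrites to a group entry.
    def apply(self, obj):
        return [{"table": "ip", "match": obj.selector, "actions": ["group:1"]},
                {"group": 1, "actions": obj.treatment}]

# The application expresses its intent once; each driver maps it onto its own pipeline.
obj = ForwardingObjective({"ip_dst": "10.0.0.0/24"},
                          ["set_eth_dst:bb:bb:bb:bb:bb:bb", "output:1"])
for driver in (FlatPipelineDriver(), GroupPipelineDriver()):
    print(driver.apply(obj))

The application never learns which pipeline it is talking to; that knowledge lives entirely in the driver.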

The BGP router app and the Segment Routing app currently use the FlowObjective service. Accordingly, the OpenFlow drivers were built to support those applications and may still be unable to support some others.

We believe that the development of more applications will enrich the current OpenFlow drivers, and the results achieved with those drivers will in turn add value to new applications. Wouldn’t it be great to write an app that just works on a well-known set of hardware?

Well, we are working toward that!
