
Actionable CI

I’ve observed a persistent theme across valuable and successful CI systems: actionable results.

A CI system for a project as complicated as OpenStack requires a staggering amount of energy to maintain and improve. Often the responsible parties are so focused on keeping it green, and so buried under a mountain of continuous failures (legitimate or otherwise), that they don’t have time to focus on the following questions:

  1. How do you determine that a job failed?
  2. How are the results presented to the relevant developers?
  3. Can developers do anything about a failure?

To put this in concrete terms, let’s take a look at how Rally is used in upstream jobs. This is not a criticism of the Rally project itself, which I’m a big fan of, but rather of how it’s used upstream. It uses the standard upstream CI infrastructure, which is a miracle of engineering when it comes to correctness tests. The infrastructure spins up VMs from a node pool drawn from many clouds. It then uses devstack-gate and devstack to install OpenStack and runs several Rally scenarios. When the result of a CI run is a simple pass or fail, variance in hardware and congestion levels is irrelevant. However, when you’re trying to measure performance, variance matters. You can try setting a maximum and declaring any result over it a failure; however, with sufficiently large variance, setting up SLAs is an exercise in futility.

Let’s look at a recent Linux Bridge change [1] that cannot impact Rally results (The Rally job is set up to run against Open vSwitch). Consecutive runs would ideally show the same results. However, looking at the results of patchsets 10, 11, 13 and 14, we can see that the total length of the job ranges between 60 and 83 minutes. The full duration of the create_and_list_ports flow ranges from 1517 to 1887 seconds. The average for a single create_and_list_ports execution ranges between 4.58s and 5.29s. What am I supposed to do with the results of the next run? What can I learn from it? I’d argue: nothing. The results of the job are not actionable. The consequence is that the job has been non-voting ever since its introduction and, worse yet, none of the engineers I work with look at its results.
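
To put a number on that noise, here is a quick back-of-the-envelope calculation using just the two extreme per-iteration averages quoted above (a rough sketch, not a statistical analysis):

$ awk 'BEGIN { printf "%.1f%%\n", (5.29 - 4.58) / 4.58 * 100 }'
15.5%

A swing of roughly 15% between runs of a change that cannot affect the measured code means that any regression smaller than that simply disappears into the noise.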

The next step would be to give up on the idea of gating or blocking performance regressions and instead detect them after the fact. We can do that by persisting historical results, graphing them and spotting trends. It’s clear that with a variance this large, the results would not be actionable either. To demonstrate this, let’s turn to the fantastic openstack-health project. Looking at the Neutron API test with the longest average run time [2], we can see that at the time of writing the test ran 249 times in the past month, so we get a great sample size. However, the run time graph looks like a Jackson Pollock painting, with a minimum of just under 5s and a maximum of just over 9s. Looking at the graph, it’s clear we can’t clean up the data via statistical jiu jitsu either. When consistency matters, I don’t think you can get around a dedicated bare metal setup.

[Figure: api_run_time]

The Gerrit interface does a great job of presenting CI results, and a failing voting job forces developers to look at its results. However, I don’t know many engineers who look at CI results as a form of amusement. Post-merge and periodic CI run into these issues: they burn your favorite form of fossil fuels and drain the life force of the fine folks who maintain them, but the results are often not presented in a consumable manner. Running the tests reliably is as important as making sure the intended audience is aware of the results. One solution could be to make sure the relevant developers subscribe to a mailing list that triggers a mail on failures, after distracting infrastructure issues have been filtered out. Periodic CI can only be valuable if it’s actionable, and if developers are held accountable and treat failures with persistent urgency.

[1] https://review.openstack.org/#/c/346377/
[2] http://status.openstack.org/openstack-health/#/test/neutron.tests.tempest.api.test_auto_allocated_topology.TestAutoAllocatedTopology.test_get_allocated_net_topology_as_tenant?resolutionKey=day&duration=P1M

 


I’m looking for someone to join my team and work on OpenStack networking and service function chaining

EDIT: The position has been filled, thank you everyone.

I lead a globally distributed engineering team at Red Hat, working on OpenStack’s networking projects. I’m looking for someone to be a part of the team with a focus on the SFC project. The candidate will:

  • Become the subject matter expert on all matters SFC
  • Review code
  • Participate in upstream development
  • Resolve bugs
  • Draft design documents
  • Implement features
  • Lead and participate in design discussions
  • Attend conferences
  • Improve the project’s testing infrastructure
  • Own the project’s RPM packaging
  • Resolve customer issues

If you want to do open source, Red Hat is objectively where it’s at. We have an institutional culture of open source at all levels and this has a ripple effect on your day to day and your career at the company. You will work with a talented, autonomous, empowered and passionate team of people with a healthy work/life balance.

The ideal candidate is familiar with cloud, networking, Linux, Python, open source, or some combination of the above. You may work from home or from one of our offices listed here: redhat.com/en/jobs/locations.

Please email me CVs at assaf@redhat.com.


“But I’m not a networking person!”

I hear that a lot, as if networking is this insurmountable mountain you could not possibly claim. Here’s how you become a networking person: You go to bed after a full day of work, overworked and frustrated with networking lingo. You have the craziest dream! It’s filled with streams of binary numbers, but you’re somehow able to convert them to ASCII instantly. You suddenly not only know that the first half of a MAC address designates the vendor of the NIC, you somehow also know that Qumranet‘s is 00:1A:4A. The seven layer model appears before you, every layer stacked upon the one before it like some perfectly formed Jenga tower from Cisco’s version of hell. Just as you’re breaking a sweat (This is getting too weird, you think), you wake up. Suddenly, Richard M. Stallman comes back from wherever it is he’s hanging around these days, networking cable in hand. He lays it on your shoulder, and then the other, like some sort of perverted Knighthood ceremony. You wake up from this dream within a dream and whisper: “I know networking.”

That’s how it usually works, anyway. Other people are not so lucky and have to learn the basics like they learn anything else: Read a book, then another one. Meet with some people, ask some questions. Practice it, learn it on the job, wing it. It’ll be alright.


New Neutron testing guidelines!

Yesterday we merged https://review.openstack.org/#/c/245984/ which adds content to the Neutron testing guidelines:

http://docs.openstack.org/developer/neutron/devref/development.environment.html#testing-neutron

The document details Neutron’s different testing infrastructures:

  • Unit
  • Functional
  • Fullstack (Integration testing with services deployed by the testing infra itself)
  • In-tree Tempest

The new documentation provides:

  • Advantages and use cases for each testing framework
  • Examples
  • Do’s and don’ts
  • Good and bad usage of mock
  • The anatomy of a good unit test

It’s short – I encourage developers to go through it. Reviewers may save time by linking to it when testing anti-patterns pop up.

Enjoy, I hope you’ll find it useful.


Neutron in-tree integration tests

It’s time for OpenStack projects to take ownership of their quality. Introducing in-tree, whitebox, multinode, simulated integration testing. A lot of people put in a lot of work over the last few months to make this happen.

http://docs.openstack.org/developer/neutron/devref/fullstack_testing.html

We plan on adding integration tests for many of the more evolved Neutron features over the coming months.


Distributed Virtual Routing – Floating IPs

Where Am I?
Overview and East/West Traffic
SNAT
* Floating IPs

In The Good Old Days…

Legacy routers provide floating IP connectivity by performing 1:1 NAT between the VM’s fixed IP and its floating IP inside the router namespace. Additionally, the L3 agent throws out a gratuitous ARP when it configures the floating IP on the router’s external device. This is done to advertise to the external network that the floating IP is reachable via the router’s external device’s MAC address. Floating IPs are configured as /32 prefixes on the router’s external device, and so the router answers any ARP requests for these addresses. Legacy routers are, of course, scheduled only on a select subgroup of nodes known as network nodes.
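
Conceptually, the per-floating-IP wiring boils down to a 1:1 NAT pair, a /32 address and a gratuitous ARP, roughly like the following sketch run inside the router namespace (illustrative commands only, not the agent’s literal rules; qrouter-<id>, qg-<id>, 198.51.100.10 and 10.0.0.4 are made-up names and addresses):

# 1:1 NAT between the floating IP and the VM's fixed IP
$ ip netns exec qrouter-<id> iptables -t nat -A PREROUTING -d 198.51.100.10/32 -j DNAT --to-destination 10.0.0.4
$ ip netns exec qrouter-<id> iptables -t nat -A POSTROUTING -s 10.0.0.4/32 -j SNAT --to-source 198.51.100.10
# Configure the floating IP as a /32 on the external ('qg') device and advertise it
$ ip netns exec qrouter-<id> ip address add 198.51.100.10/32 dev qg-<id>
$ ip netns exec qrouter-<id> arping -A -I qg-<id> -c 3 198.51.100.10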

Things Are About to Get Weird

In the DVR world, however, things are very different. This is going to get very complicated very fast, so let’s understand how and why we got there. We could have kept things the way they were and configured floating IPs on the router’s ‘qg’ device. Or could we? Let’s consider that for a moment:

  • MAC addresses! Network engineers go to great lengths to minimize broadcast domains because networking devices have fairly modest upper bounds on their MAC tables. Most external networks use flat or VLAN networking: It is possible to subdivide them by using multiple external networks, or multiple subnets on a single external network, but let’s consider a single external network for the purpose of this discussion. With legacy routers you would ‘consume’ a MAC address on the external network per router. If we kept the existing model but distributed the routers, we would consume a MAC address for every (node, router) pair. This would quickly explode the size of the broadcast domain. Not good! (A quick back-of-the-envelope calculation follows this list.)
  • IP addresses! Legacy routers configure a routable address on their external devices. It’s not wasted by any means because it is used for SNAT traffic. With DVR, as we noticed in the previous blog post, we do the same. Do we actually need a dedicated router IP per compute node then? No, not really. Not for FIP NAT purposes. You might want one for troubleshooting purposes, but it’s not needed for NAT. Instead, it was chosen to allocate a dedicated IP address for every (node, external network) pair.
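
To make the first point concrete, here is the rough arithmetic (the router and node counts below are made-up illustrative numbers, not measurements):

$ awk 'BEGIN { routers = 200; nodes = 50; print "legacy:", routers, "external MACs; naive DVR: up to", routers * nodes, "external MACs" }'
legacy: 200 external MACs; naive DVR: up to 10000 external MACs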

Where We Ended Up

Let’s jump ahead and see how everything is wired up (On compute nodes):

[Figure: fip (floating IP wiring on a compute node)]

When a floating IP is attached to a VM, the L3 agent creates a FIP namespace (If one does not already exist) for the external network that the FIP belongs to:

[stack@vpn-6-21 devstack (master=)]$ ip netns
fip-cef4f7b4-c344-4904-a847-a9960f58fb20
qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62

As we can see the fip namespace name is determined by the ID of the external network it represents:

[stack@vpn-6-21 devstack (master=)]$ neutron net-show public
...
| id                        | cef4f7b4-c344-4904-a847-a9960f58fb20 |
...

Every router on the compute node is hooked up to the FIP namespace via a veth pair (Quick reminder: A veth pair is a type of Linux networking device that is represented by a pair of devices. Whatever goes in on one end leaves via the other end. Each end of the pair may be configured with its own IP address. Veth pairs are often used to interconnect namespaces as each end of the pair may be put in a namespace of your choosing).
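
If veth pairs are new to you, here is a standalone sketch of the mechanics (the namespace and device names below are made up and unrelated to anything the L3 agent creates):

# Create two namespaces joined by a veth pair, then ping across it
$ ip netns add ns-left
$ ip netns add ns-right
$ ip link add veth-left type veth peer name veth-right
$ ip link set veth-left netns ns-left
$ ip link set veth-right netns ns-right
$ ip netns exec ns-left ip address add 192.0.2.1/24 dev veth-left
$ ip netns exec ns-right ip address add 192.0.2.2/24 dev veth-right
$ ip netns exec ns-left ip link set veth-left up
$ ip netns exec ns-right ip link set veth-right up
$ ip netns exec ns-left ping -c 1 192.0.2.2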

The ‘rfp’ or ‘router to FIP’ end of the pair resides in the router namespace:

[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip address
    ...
3: rfp-ef25020f-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 16:91:f5:0b:34:50 brd ff:ff:ff:ff:ff:ff
    inet 169.254.31.28/31 scope global rfp-ef25020f-0
    inet 192.168.1.3/32 brd 192.168.1.3 scope global rfp-ef25020f-0
    ...
52: qr-369f59a5-2c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether fa:16:3e:33:6d:d7 brd ff:ff:ff:ff:ff:ff
    inet 20.0.0.1/24 brd 20.0.0.255 scope global qr-369f59a5-2c
    ...
53: qr-c2e43983-5c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether fa:16:3e:df:74:6c brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 brd 10.0.0.255 scope global qr-c2e43983-5c
    ...

While the ‘fpr’ or ‘FIP to router’ end of the pair resides in the FIP namespace, along with the ‘fg’ / external device:

[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec fip-cef4f7b4-c344-4904-a847-a9960f58fb20 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    ...
3: fpr-ef25020f-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 3e:d3:e7:34:f6:f3 brd ff:ff:ff:ff:ff:ff
    inet 169.254.31.29/31 scope global fpr-ef25020f-0
    ...
59: fg-b2b77eed-1b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether fa:16:3e:cc:98:c8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.23/24 brd 192.168.1.255 scope global fg-b2b77eed-1b
    ...

As you’ve surely noticed, the rfp and fpr devices are configured with link-local IP addresses. Every time a router is configured on a compute node and hooked up to the FIP namespace (which happens when a floating IP is configured on that router), a pair of free IP addresses is allocated out of a large 169.254.x.y pool and configured as a /31, a prefix that holds exactly two addresses, one for each end of the veth pair. These allocations are then persisted locally on the node’s disk in case the agent or the node decide to do the unthinkable and reboot.

Before we track a packet as it leaves a VM, let’s observe the routing rules in the router namespace:

[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip rule
0:	from all lookup local 
32766:	from all lookup main 
32767:	from all lookup default 
32768:	from 10.0.0.4 lookup 16 
167772161:	from 10.0.0.1/24 lookup 167772161 
335544321:	from 20.0.0.1/24 lookup 335544321

Huzzah, a new source routing rule! This time it’s a specific rule with our VM’s fixed IP address. You’ll notice that it has a lower (Better) priority than the generic rules that follow. We’ll expand on this in a moment.

Tracking a Packet

In the previous blog post we talked about classifying east/west and SNAT traffic and forwarding appropriately. Today we are joined by a third traffic class: floating IP traffic. SNAT and floating IP traffic are differentiated by the ip rules shown above. Whenever a floating IP is configured by an L3 agent, it adds a rule specific to that IP: the VM’s fixed IP is added to the rules table, pointing at a new routing table (In this example ’16’):

[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip route show table 16
default via 169.254.31.29 dev rfp-ef25020f-0
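
For illustration, the rule and table above amount to roughly the following iproute2 commands (a sketch of the end result, not necessarily how the agent issues them):

$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip rule add from 10.0.0.4 table 16 priority 32768
$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip route replace default via 169.254.31.29 dev rfp-ef25020f-0 table 16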

If VM 10.0.0.4 (With floating IP 192.168.1.3) sends traffic destined to the outside world, it arrives in the local qrouter namespace and the ip rules are consulted just like in the SNAT example in the previous blog post. The main routing table doesn’t have a default route, so the ‘32768: from 10.0.0.4 lookup 16’ rule is matched. The routing table known as ’16’ has a single entry, a default route with 169.254.31.29 as the next hop. The qrouter iptables NAT rules apply and the source IP is replaced with 192.168.1.3. The message is then forwarded with 169.254.31.29’s MAC address via the rfp device, landing squarely in the FIP namespace on its ‘fpr’ device. The FIP namespace routing table has a default route, and the packet leaves through the ‘fg’ device.

The opposite direction is similar, but there’s a catch. How does the outside world know where the VM’s floating IP address, 192.168.1.3, is? In fact, how does the FIP namespace know where it is? It has an IP address in that subnet, but the address itself is a hop away in the qrouter namespace. To solve both problems, proxy ARP is enabled on the ‘fg’ device in the FIP namespace. This means that the FIP namespace will answer ARP requests for IP addresses that reside on its own interfaces, as well as for addresses it knows how to route to. To this end, every floating IP is configured with a route from the FIP namespace back to the router’s namespace, as we can see below:

[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec fip-cef4f7b4-c344-4904-a847-a9960f58fb20 ip route
default via 192.168.1.1 dev fg-b2b77eed-1b 
169.254.31.28/31 dev fpr-ef25020f-0  proto kernel  scope link  src 169.254.31.29 
192.168.1.0/24 dev fg-b2b77eed-1b  proto kernel  scope link  src 192.168.1.23 
192.168.1.3 via 169.254.31.28 dev fpr-ef25020f-0 
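
The proxy ARP bit itself is just a sysctl on the ‘fg’ device; combined with the per-floating-IP route shown in the last line above, something along these lines (a sketch of the end result) is what lets the FIP namespace answer for 192.168.1.3:

$ sudo ip netns exec fip-cef4f7b4-c344-4904-a847-a9960f58fb20 sysctl -w net.ipv4.conf.fg-b2b77eed-1b.proxy_arp=1
$ sudo ip netns exec fip-cef4f7b4-c344-4904-a847-a9960f58fb20 ip route add 192.168.1.3/32 via 169.254.31.28 dev fpr-ef25020f-0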

When the outside world wants to contact the VM’s floating IP, the FIP namespace will reply that 192.168.1.3 is available via the ‘fg’ device’s MAC address (An awful lie, but a useful one… Such is the life of a proxy). The traffic will be forwarded to the machine, in through a NIC connected to br-ex and into the FIP namespace’s ‘fg’ device. The FIP namespace will use its route to 192.168.1.3 and route the traffic out its ‘fpr’ veth device. The message will be received by the qrouter namespace: 192.168.1.3 is configured on its rfp device, so its iptables rules will replace the packet’s destination IP with the VM’s fixed IP of 10.0.0.4, and off to the VM the message goes. To confuse this business even more, gratuitous ARPs are sent out just like with legacy routers. Here, however, the floating IP is not actually configured on the ‘fg’ device. This is why it is configured temporarily right before the GARP is sent and removed right afterwards.
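
The GARP dance described above would look roughly like this (a sketch; the agent’s exact invocation may differ):

# Temporarily configure the floating IP on 'fg', advertise it, then remove it
$ sudo ip netns exec fip-cef4f7b4-c344-4904-a847-a9960f58fb20 ip address add 192.168.1.3/32 dev fg-b2b77eed-1b
$ sudo ip netns exec fip-cef4f7b4-c344-4904-a847-a9960f58fb20 arping -A -I fg-b2b77eed-1b -c 3 192.168.1.3
$ sudo ip netns exec fip-cef4f7b4-c344-4904-a847-a9960f58fb20 ip address del 192.168.1.3/32 dev fg-b2b77eed-1b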

A Summary of Sorts

[Figure: traffic_class]

[Figure: all]
