1. The Mesh of Death Adversity

Topology and scenario

(Amadeus Alfa, Juliusz Chroboczek, Federico Capoano, Goran Mahovlić, Tomislava, Dario Stelitano, Riccardo Bloise)

../_images/1-the-mesh-of-death-adversity.svg

Our test network consists of 10 wifi routers, all directly connected to each other with one exception: A and K are not directly connected. Each router operates simultaneously on two frequency bands, one at 2.4 GHz and the other at 5 GHz. To avoid a link between A and K, we put them in different rooms and reduced their transmission power, while the other routers (B to J) were in a large hall.

To test the performance of this network, two hosts are connected by wire at the extremities of the network: the client to A, and the server to K. We use wires to avoid interfering with the network under test. The client then runs the tests using two tools: ping, to measure the latency between the two hosts, and iperf, to stress the network and measure the throughput.

In this scenario, we are comparing 5 actively maintained and deployed routing protocols: Babel, Batman-adv, BMX7, OLSRv1 and OLSRv2.

Note

  • Some links are not drawn to avoid unnecessary confusion,
  • Note that the shortest path from A to K consists of 2 hops,
  • There is no router I to avoid a possible confusion with the number 1.
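The topology described above can be checked with a small sketch. It assumes an idealised full mesh in which every pair of routers except A and K has a usable link; the function names are illustrative and not taken from the test scripts.

```python
from collections import deque
from itertools import combinations

# Router names as in the figure: A to K, with no router I.
ROUTERS = list("ABCDEFGHJK")


def build_mesh(routers, missing_links):
    """Full mesh over `routers`, minus the undirected links in `missing_links`."""
    neighbours = {r: set() for r in routers}
    for u, v in combinations(routers, 2):
        if frozenset((u, v)) in missing_links:
            continue
        neighbours[u].add(v)
        neighbours[v].add(u)
    return neighbours


def hop_count(neighbours, source, target):
    """Breadth-first search: minimum number of hops, or None if unreachable."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for nxt in neighbours[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None


mesh = build_mesh(ROUTERS, {frozenset("AK")})
print(len(ROUTERS), hop_count(mesh, "A", "K"))  # 10 2
```

Any of the eight routers B to J can serve as the relay, so a routing protocol has eight candidate two-hop routes to choose from.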

Problems

This topology looks really simple, but we expect the interference generated by so many routers to be sufficient to reveal differences between protocols.

Indeed, as described in The Hidden Node Problem, a wifi adapter is either transmitting or receiving. From a routing point of view, this means that two hops on the same channel divide the throughput by two, three hops by three, and so on. Using multiple non-interfering channels avoids this problem, or at least limits it: the transit node of a two-hop route with each hop on a different channel can receive packets on one channel while it simultaneously sends them on the other. Another problem with wifi is that performance decreases quickly with distance because of packet loss: it is often better to take a two-hop route than a long one-hop route, even if the two hops are on the same channel.
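The division rule can be illustrated with a deliberately simple model. It assumes that every link has the same raw rate, that all hops on one channel share that channel's airtime equally, and that different channels do not interfere at all. The function name and the 30 Mbit/s rate are hypothetical.

```python
from collections import Counter


def path_throughput(link_rate, hop_channels):
    """Estimate the end-to-end throughput of a path.

    Simplifying assumptions: all hops using the same channel are within
    interference range of each other and split its airtime evenly, and
    distinct channels are perfectly isolated.
    """
    if not hop_channels:
        raise ValueError("a path needs at least one hop")
    # The channel carrying the most hops is the bottleneck.
    busiest = max(Counter(hop_channels).values())
    return link_rate / busiest


print(path_throughput(30.0, ["2.4GHz", "2.4GHz"]))  # 15.0
print(path_throughput(30.0, ["2.4GHz", "5GHz"]))    # 30.0
print(path_throughput(30.0, ["5GHz"] * 3))          # 10.0
```

Real links differ in rate and loss, so this only captures the channel-sharing effect, not the distance effect mentioned above.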

From a routing point of view, there is a real challenge: choosing the right tradeoff between taking few hops, avoiding lossy links and varying the channels used.
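One common way to weigh hop count against loss is the expected transmission count (ETX). This article does not name it, so it serves here only as an illustration. The sketch below compares a lossy one-hop route with a cleaner two-hop route; the delivery ratios are made-up values.

```python
def link_etx(forward_ratio, reverse_ratio=1.0):
    """Expected number of transmissions for one link: 1 / (df * dr)."""
    if not (0 < forward_ratio <= 1 and 0 < reverse_ratio <= 1):
        raise ValueError("delivery ratios must be in (0, 1]")
    return 1.0 / (forward_ratio * reverse_ratio)


def path_etx(delivery_ratios):
    """ETX of a path is the sum of the ETX of its links."""
    return sum(link_etx(ratio) for ratio in delivery_ratios)


# Hypothetical figures: one long hop delivering 40% of packets,
# versus two short hops delivering 95% each.
long_hop = path_etx([0.40])
two_hops = path_etx([0.95, 0.95])
print(round(long_hop, 2), round(two_hops, 2))  # 2.5 2.11
```

Under these assumed ratios the two-hop route needs fewer transmissions overall. Plain ETX ignores which channel each hop uses, so it does not capture the channel-diversity part of the tradeoff.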

Requirements

  • 10x TP-Link WDR4300 with OpenWrt
  • 2x laptops with real ethernet ports (no adapters)
  • 2x ethernet cables

Configuration

Note

All the configuration files for each router are available on GitHub.

The binary of the firmware is also available.

Each node is a dual-radio wireless router (TP-Link WDR4300). The most important facts about the configuration are:

  • multi-channel mesh (2.4 GHz and 5 GHz)
  • dual stack (IPv4 and IPv6)
  • protocols installed: Babel, Batman-adv, BMX7, OLSRv1 and OLSRv2
  • laptops were connected to the mesh with static routes on nodes A and K

Warning

By the end of the eighth edition we came to the conclusion that having to set up static routes to plug laptops into the mesh was a mistake.

We also haven’t been able to run Batman-adv with the same network configuration as the other routing protocols.

For these reasons Henning Rogge proposed a better configuration plan for the next edition (Battlemesh v9).

Test

(Henning Rogge, Thijs Van Veen)

Note

The test script is available on GitHub; the relevant sections are tests 1, 2 and 3. It has been carefully crafted so that the tests can be repeated easily.

The tests mainly consisted of generating traffic from the client connected to A to the server connected to K. The measurements were collected on the client.

3 different tests were performed:

  • reboot: measure ping RTT while the mesh is rebooted
  • ping: only measure ping RTT
  • ping + iperf: measure ping RTT and throughput of a 10 Mbit/s UDP Iperf stream running simultaneously
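A minimal sketch of how the ping + iperf test could be driven from the client is shown below. It is not the published test script: the helper names, the one-second ping interval and the server address are assumptions, and the iperf options are those of the classic iperf (version 2) command line.

```python
import subprocess


def ping_command(server, duration_s, interval_s=1.0):
    """Build a ping command sending one probe per interval for the duration."""
    count = int(duration_s / interval_s)
    return ["ping", "-c", str(count), "-i", str(interval_s), server]


def iperf_udp_command(server, duration_s, bandwidth="10M"):
    """Build an iperf client command for a UDP stream at the given bit rate."""
    return ["iperf", "-c", server, "-u", "-b", bandwidth, "-t", str(duration_s)]


def run_ping_with_load(server, duration_s):
    """Start the UDP stream, ping while it runs, and return ping's raw output."""
    load = subprocess.Popen(iperf_udp_command(server, duration_s),
                            stdout=subprocess.DEVNULL)
    try:
        result = subprocess.run(ping_command(server, duration_s),
                                capture_output=True, text=True)
    finally:
        load.wait()
    return result.stdout


# 192.0.2.10 is a placeholder (documentation address), not the real server.
print(" ".join(ping_command("192.0.2.10", 60)))
print(" ".join(iperf_udp_command("192.0.2.10", 60)))
```

Calling run_ping_with_load("192.0.2.10", 60) would require an iperf server listening on that address; the ping-only test corresponds to running the first command alone.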

Results

(Matthieu Boutier)

Graphs and raw data are provided for each test.

Note

The graphs were generated with the generate_graphs.sh script (requires the R programming language), available on GitHub.

reboot

In the reboot experiment, we let the network run in a stable state for some time, and then suddenly reboot all routers simultaneously. The following graph shows a quick overview of the whole experiment.

(How to read: lower is better)

../_images/reboot-rtt-normal-summary.svg

What interests us in this experiment is the short period after the reboot: the following graph shows the empirical cumulative distribution function (ECDF) of the ping samples taken during the 50s after the reboot. The x-axis is scaled to show only packets with an RTT of less than 50ms: we see that all protocols are choosing fast routes, since in all cases the RTT of the packets is below 50ms. In this particular example though, Babel, BMX7 and OLSRv1, with almost all packets under 10ms, outperform Batman-adv and OLSRv2, which “only” have 80% of the packets under 10ms.

(How to read: closer to left is better, learn more about how to read ECDF graphs)

../_images/reboot-rtt-ecdf-zoom.svg
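For readers unfamiliar with ECDF graphs, the following sketch shows how such a curve is computed from RTT samples and how a statement like "80% of the packets are under 10ms" is read from it. The sample values are invented for illustration and are not taken from the raw data.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return the sorted samples and the cumulative fraction at each one."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]


def fraction_at_most(samples, threshold):
    """Fraction of samples whose value is at most `threshold`."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    return bisect_right(ordered, threshold) / len(ordered)


# Hypothetical RTT samples in milliseconds.
rtts_ms = [2.9, 3.1, 3.3, 3.4, 4.0, 4.2, 6.5, 9.8, 14.0, 31.0]
print(fraction_at_most(rtts_ms, 10.0))  # 0.8
print(fraction_at_most(rtts_ms, 50.0))  # 1.0
```

Plotting the two lists returned by ecdf against each other gives the curve: the further left it rises, the lower the RTTs.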

Zooming in on the normal graph around time 150 gives us another valuable piece of information: we see when each routing protocol begins to forward packets, which should reflect its convergence time. In this benchmark, we observe the following convergence times:

Protocol          Babel   OLSRv2   BMX7   OLSRv1   Batman-adv
Time (s)          151     155      159    163      182
Delay vs. Babel   +0s     +4s      +8s    +12s     +23s

(How to read: lower is better)

../_images/reboot-rtt-normal-zoom.svg
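The moment a protocol starts forwarding again can also be extracted from the ping log rather than read off the graph. The sketch below assumes each sample is a (timestamp, RTT) pair with None for a lost packet. That format and the values are hypothetical and do not match the raw data files.

```python
def recovery_time(samples, reboot_time):
    """Timestamp of the first reply received after the outage began.

    `samples` is an iterable of (timestamp_s, rtt_ms) pairs, where rtt_ms
    is None for a lost packet. Returns None if no outage or no recovery
    is found after `reboot_time`.
    """
    outage_seen = False
    for timestamp, rtt in sorted(samples, key=lambda sample: sample[0]):
        if timestamp < reboot_time:
            continue
        if rtt is None:
            outage_seen = True
        elif outage_seen:
            return timestamp
    return None


# Hypothetical log: reboot issued at t=10, replies resume at t=15.
log = [(8, 3.1), (9, 3.0), (10, 3.2), (11, None), (12, None),
       (13, None), (14, None), (15, 42.0), (16, 6.5)]
print(recovery_time(log, 10))  # 15
```

Comparing this timestamp across protocols gives the relative delays, with a resolution limited by the ping interval.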

Note

Raw data for this test is available on GitHub.

ping

In the ping experiment, we just measure the latency of the network with the ping tool, without any other perturbation. We expect an extremely stable network, with low RTT measurements and high fairness.

The following graph shows that for all protocols except Batman-adv, packets are routed fairly: 90% of them have an RTT of less than 5ms, and almost all of them less than 10ms. Packets routed by Batman-adv are not routed fairly: 50% are under 4ms, 80% under 8ms, 90% under 10ms and almost all under 50ms.

(How to read: closer to left is better, learn more about how to read ECDF graphs)

../_images/ping-rtt-ecdf-zoom.svg

Looking at the details shows that OLSRv1 and BMX7 give the fairest and fastest RTT, behaving exceptionally well. They are closely followed by Babel, which has a slight fairness pathology: most of the packets (80%) are around 3.2ms, but about 10% are around 4ms, with a visible irregularity. Then comes OLSRv2, very fair, with packets around 4.5ms (+1ms).

The Babel irregularity can be explained with the following graph, which shows only the Babel curve. We see that the packets with a higher RTT are grouped in two places. This may happen because Babel hesitates between two paths and sometimes switches to the wrong one: it then takes around 15s to decide that the other route was better, and stays much longer (70s minimum) on that better route.

Measured RTT in classic graph (Babel only):

(How to read: lower is better)

../_images/ping-rtt-normal-babel.svg
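Grouped periods of higher RTT like these can be located automatically by collecting consecutive samples above a threshold. The sketch below is one simple way to do so; the threshold and the sample series are invented for illustration.

```python
def high_rtt_episodes(samples, threshold_ms):
    """Group consecutive samples above the threshold into episodes.

    `samples` is a time-ordered list of (timestamp_s, rtt_ms) pairs.
    Returns a list of (first_timestamp, last_timestamp) tuples.
    """
    episodes = []
    start = previous = None
    for timestamp, rtt in samples:
        if rtt > threshold_ms:
            if start is None:
                start = timestamp
            previous = timestamp
        elif start is not None:
            episodes.append((start, previous))
            start = None
    if start is not None:
        # The series ended while still above the threshold.
        episodes.append((start, previous))
    return episodes


# Hypothetical series: one sample per second, two slower periods.
series = [(t, 4.0 if 5 <= t <= 7 or t >= 13 else 3.2) for t in range(15)]
print(high_rtt_episodes(series, 3.6))  # [(5, 7), (13, 14)]
```

The length of each episode and the gap between episodes correspond to the time spent on the slower and the faster route respectively.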

Note

Raw data for this test is available on GitHub.

ping + iperf

In the ping + iperf experiment, we measure the latency of the network with the ping tool while pushing 10 Mbit/s of additional traffic from the client to the server. The graph below shows that OLSRv2 gives the fairest results overall: 95% of its packets are under 20ms, against 40ms for Babel and BMX7, 80ms for OLSRv1, and around 820ms for Batman-adv.

Measured RTT in ECDF graph:

(How to read: closer to left is better, learn more about how to read ECDF graphs)

../_images/pingiperf-rtt-ecdf-zoom.svg
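Figures such as "95% of the packets are under 20ms" are percentiles of the RTT samples, that is, the ECDF read in the other direction. The sketch below uses the nearest-rank definition, which is an assumption: the graphing script may use a different one. The sample values are placeholders.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Placeholder RTT samples in milliseconds: 1.0, 2.0, ..., 20.0.
rtts_ms = [float(value) for value in range(1, 21)]
print(percentile(rtts_ms, 75), percentile(rtts_ms, 95))  # 15.0 19.0
```

Reporting both the 75th and the 95th percentile, as this section does, separates the typical behaviour of a protocol from its tail latency.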

Interestingly, Babel leads for 75% of the packets, with an RTT under 9ms. Unlike in the previous test, it does not lose its fairness in a visible step: the degradation is gradual. For the same share of packets, OLSRv2, BMX7 and OLSRv1 give an RTT under 13ms, and Batman-adv under 65ms.

Measured RTT in classic graph:

(How to read: lower is better)

../_images/pingiperf-rtt-normal.svg

Finally, all protocols reach the expected bitrate (10 Mbit/s), as we see on the following graph.

Measured Bitrate:

(How to read: higher is better)

../_images/pingiperf-bitrate-normal.svg

Note

Raw data for this test is available on GitHub.

Article written by Federico Capoano, Matthieu Boutier, Thijs van Veen.