Category Archives: Routing

Multiple next-hops in policy routing

Imagine the following topology with routers 1 to 5:

         / (2) \
(4) - (1) (5)
         \ (3) /

There is Policy-based routing active on router 1, and the possible end-to-end paths are two:
4-1-2-5
4-1-3-5

Let us play around with the next-hop settings of PBR. Imagine we have a rudimentary PBR setup with the following:

route-map test permit 10
match ip address 100
set ip next-hop 10.0.12.2 10.0.13.3

10.0.12.2 is router 2 and 10.0.12.3 is router 3. We really don’t care about the access lists or anything else, let’s see what happens when we have multiple hops defined:

R1#debug ip policy
Policy routing debugging is on
R1#
*Jul 17 12:28:42.387: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, len 44, FIB policy match
*Jul 17 12:28:42.387: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, g=10.0.12.2, len 44, FIB policy routed
*Jul 17 12:28:42.447: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, len 40, FIB policy match
*Jul 17 12:28:42.447: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, g=10.0.12.2, len 40, FIB policy routed
*Jul 17 12:28:42.459: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, len 49, FIB policy match
*Jul 17 12:28:42.459: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, g=10.0.12.2, len 49, FIB policy routed
*Jul 17 12:28:42.467: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, len 40, FIB policy match
*Jul 17 12:28:42.467: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, g=10.0.12.2, len 40, FIB policy routed

There, so it keeps using the first next-hop no matter what. I guess the only way to nudge it to use the other one is to shutdown the interface connected to R2. Look what happens after I shut down that interface:

 *Jul 17 12:32:09.551: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, len 44, FIB policy match
*Jul 17 12:32:09.551: CEF-IP-POLICY: fib for address 10.0.12.2 is with flag 257
*Jul 17 12:32:09.551: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, g=10.0.13.3, len 44, FIB policy routed
*Jul 17 12:32:09.611: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, len 40, FIB policy match
*Jul 17 12:32:09.611: CEF-IP-POLICY: fib for address 10.0.12.2 is with flag 257
*Jul 17 12:32:09.611: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, g=10.0.13.3, len 40, FIB policy routed
*Jul 17 12:32:09.619: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, len 49, FIB policy match
*Jul 17 12:32:09.619: CEF-IP-POLICY: fib for address 10.0.12.2 is with flag 257
*Jul 17 12:32:09.619: IP: s=10.0.14.4 (GigabitEthernet1/0), d=5.5.5.5, g=10.0.13.3, len 49, FIB policy routed

The process sees that the route towards its next-hop is marked as down (flag 257) in the CEF tables (or non-existent), and goes on towards the next one.

Conclusion: Multiple next-hops in PBR are used for redundancy, not load-sharing/balancing.

Playing around with OSPF

I did some fooling around with OSPF priorities and here’s what I learned:

If you have multiple routers on an Ethernet segment,  and all routers have priority 0 – there will be no DR, no BDR, and only 2way connectivity between all of them.

If you have just one router has with priority >0, then that router will be the DR, but there will still be no bdr, and the only full connection would be, obviously, between the DR and the others.

With P2P links, OSPF doesn’t care about priority, but does care about area mismatches:

An area mismatch, for example (r1:normal)-(r2:stub) would tear down neighborship due to the mismatch, and it’s possible it won’t even show up on the error logs. Fun fun 🙂

Routallica – WAN / Metallica – One

Original song courtesy of Metallica
No copyright infringement intended, just having fun

Routallica – WAN

I can’t connect to anything
Can’t tell if this is routing or bridging
Deep down inside I want to ping
This terrible access-list stops me

Now that the routes are passed to me
Network is convergin, but I cannot see
And there are not many neighbors here
Nothing is up but loopbacks

Hold my frames as I wait for STP
Oh please, broadcasts, don’t storm me

Back in the WAN it’s much too serial
Terrible speeds that I must feel
But I look forward to police
Police those excess TCP bursts

Weighted RED is throttling me
Just like a greedy TCP stream
Class-based shaping’s buffering me
Dropping tokens from the bucket

Hold my updates as I split horizons
Oh please, RIP, route me

Now the network’s gone, I’m just one
Oh redundant link, help me
Split my brain as my peer faces death
Oh please, VSS, help me

VLANs imprisoning me
No adjacency
Asymmetric routing
I cannot ping
I cannot trace
Trapped in my shell
Subnet my holding cell

Black holes have poisoned my routes
Taken my ping
Taken my peering
Taken my ARPs
Taken my V(e)RFs
Taken my pools
Left me with Frame Relaaay

Unequal Load-Balancing on Cisco IOS

I just wanted to share a neat trick that a fellow CCIE colleague showed me.

In case of being connected to two ISPs, there is a way of doing unequal load-balancing with the help of static routes. For example, ISP X provides you with 25Mbps, and ISP Y with 50Mbps – a 2:1 ratio.

In order to achieve any kind of load-balancing on the Cisco IOS, we need multiple entries in the routing table, pointing to the same specific destination. As we would like to load-balance our uplink traffic towards the internet, we would need multiple entries towards our default gateways.

We are all familiar with the concept that there can be only one default route for a specific gateway – you can’t have multiple routing entries pointing to the same default gateway. That means that if we have multiple ISPs and multiple default gateways, our load-balance ratio will always be 1:1, as there is just a single entry in the routing table for each default gateway.

However, we can install multiple routing entries for seemingly different default gateways. That way, we can fool the device and have the same default gateway listed multiple times in the routing table. Ok, it sounds confusing, but just take a look at the configuration and it’ll become clear.

ip route 10.0.1.1 255.255.255.255 192.168.1.2  #(ISP X)
ip route 10.0.2.1 255.255.255.255 172.16.1.2  #(ISP Y)
ip route 10.0.2.2 255.255.255.255 172.16.1.2  #(ISP Y)

ip route 0.0.0.0 0.0.0.0 10.0.1.1
ip route 0.0.0.0 0.0.0.0 10.0.2.1
ip route 0.0.0.0 0.0.0.0 10.0.2.2

First, we define static routes for a couple of fake default gateways. Those IPs do not exist, and will only be used for the current load-ballancing trick, so be careful when setting up those and try not to assign some IPs in use.

After that, we define these fake IPs as default gateways. Having in mind that the ratio of the link bandwidth is 2:1, we created two routes towards the faster ISP and a single route towards the slower ISP.

What happens is the IOS uses all three of these default gateways, because the destination is seemingly different during the first look up in the routing table. The second look up will reveal that the fake default gateway’s IP is reachable only by either ISP X or ISP Y’s next-hop router. This is quite the ingenious way of tricking the device into installing multiple entries in its routing table.

Default gateway on Cisco IOS

On Cisco hardware, or at least on most of the IOS family, there are three ways of specifying a default gateway. Let us look into that:

ip default-gateway
ip default-network
and ip route 0.0.0.0 0.0.0.0

The ip default-gateway command differs from the other two commands, as it should only be used when ip routing is disabled on the Cisco router, which means most probably never, unless in boot mode. In such case, you can use it to define a gateway to use TFTP to transfer a Cisco IOS image to the router. Apparently, the router does not have ip routing enabled in boot mode.

***

ip default-network

So, unless in boot mode, you should probably be using the ip default-network command. When you configure ip default-network the router considers routes to that network for installation as the gateway of last resort on the router.

For every network configured with ip default-network, if a router has a route to that network, that route is flagged as a candidate default route.

show ip route
Gateway of last resort is not set
161.44.0.0/24 is subnetted, 1 subnets
C 161.44.192.0 is directly connected, Ethernet0
131.108.0.0/24 is subnetted, 1 subnets
C 131.108.99.0 is directly connected, Serial0
S 198.10.1.0/24 [1/0] via 161.44.192.2

show ip route
ip default-network 198.10.1.0

 Gateway of last resort is 161.44.192.2 to network 198.10.1.0
161.44.0.0/24 is subnetted, 1 subnets
C 161.44.192.0 is directly connected, Ethernet0
131.108.0.0/24 is subnetted, 1 subnets
C 131.108.99.0 is directly connected, Serial0
S* 198.10.1.0/24 [1/0] via 161.44.192.2

The gateway of last resort is now set as 161.44.192.2. This result is independent of any routing protocol, as shown by the show ip protocols command at the bottom of the output. This can help you solve some tricky connectivity scenarios, as well as utilize two (or ever more) routing protocols for default gateway redundancy.

You can add another candidate default route by configuring another instance of ip default-network:

ip route 171.70.24.0 255.255.255.0 131.108.99.2
ip default-network 171.70.24.0
show ip route

Gateway of last resort is 161.44.192.2 to network 198.10.1.0
171.70.0.0/16 is variably subnetted, 2 subnets, 2 masks
S 171.70.0.0/16 [1/0] via 171.70.24.0
S 171.70.24.0/24 [1/0] via 131.108.99.2
161.44.0.0/24 is subnetted, 1 subnets
C 161.44.192.0 is directly connected, Ethernet0
131.108.0.0/24 is subnetted, 1 subnets
C 131.108.99.0 is directly connected, Serial0
S* 198.10.1.0/24 [1/0] via 161.44.192.2

However, changes did not take effect. That is because there is a potential pitfall with the ip default-network command – it is classful. Due to this, the command must be issued again, using the major net, in order to flag the candidate default route. Kind of like a recursive default-network path.

ip default-network 171.70.0.0
show ip route

Gateway of last resort is 171.70.24.0 to network 171.70.0.0
* 171.70.0.0/16 is variably subnetted, 2 subnets, 2 masks
S* 171.70.0.0/16 [1/0] via 171.70.24.0
S 171.70.24.0/24 [1/0] via 131.108.99.2
161.44.0.0/24 is subnetted, 1 subnets
C 161.44.192.0 is directly connected, Ethernet0
131.108.0.0/24 is subnetted, 1 subnets
C 131.108.99.0 is directly connected, Serial0
S* 198.10.1.0/24 [1/0] via 161.44.192.2

As the Cisco documentations describes this interesting ‘hack’, if the original static route had been to the major network, the extra step of configuring the default network twice would not have been necessary. As you can see, this may create some implications if your dynamic routing protocol is advertising networks with a subnet mask higher than the classless one. Do proceed with care, and lab and test any configuration change before deploying it in a production network. It is always easy to roll-back with a quick ip route 0.0.0.0, but if you’re troubleshooting from afar, never take the risk of cutting yourself off. Shall you have no other choice, scheduled reloading may help you in such case.

Let us test the fallback mechanism of the ip default-network command. If we are to remove a route to the particular default network, the router selects the other candidate default. Let’s try it:

no ip route 171.70.24.0 255.255.255.0 131.108.99.2
show ip route

Gateway of last resort is 161.44.192.2 to network 198.10.1.0
161.44.0.0/24 is subnetted, 1 subnets
C 161.44.192.0 is directly connected, Ethernet0
131.108.0.0/24 is subnetted, 1 subnets
C 131.108.99.0 is directly connected, Serial0
S* 198.10.1.0/24 [1/0] via 161.44.192.2

 

Of couse, there’s always the good old ip route command.

Creating a static route to network 0.0.0.0 0.0.0.0 is another way to set the default gateway on almost any layer 3 device.

Most network engineers think that a static route using the ip route command takes precedence over any other route, but there is an exception.

As stated by the Cisco documentation:
If you use both the ip default-network and ip route 0.0.0.0 0.0.0.0 commands to configure candidate default networks, and the network used by the ip default-network command is known statically, the network defined with the ip default-network command takes precedence and is chosen for the gateway of last resort. Otherwise if the network used by the ip default-network command is derived by a routing protocol, the ip route 0.0.0.0 0.0.0.0 command, which has a lower administrative distance, takes precedence and is chosen for the gateway of last resort. If you use multiple ip route 0.0.0.0 0.0.0.0 commands to configure a default route, traffic is load-balanced over the multiple routes.

So, to sum it up, if there is a default-network statement, and it points to a statically defined network, it overrides the ip route 0.0.0.0 0.0.0.0 command. Plain and simple.

Summary

When in doubt, always check the documentation first 🙂
The ip default-gateway command should only be used when ip routing is disabled on a Cisco router. In any other case, use either the ip default-network or ip route 0.0.0.0 0.0.0.0 commands to set the gateway. Just take care with the classfull behavior of the former command.

Traffic Shaping and Policing

Crash course in QoS

What is traffic shaping/policing? In a nutshell, policing is dropping packets when the traffic exceeds a certain speed threshold, while shaping is queuing the incoming traffic in order to send it at a lower rate. Naturally, shaping is applied to outbound traffic, while policing can be applied on both directions, although it is usually applied to the inbound traffic. The following is the general QoS terminology:

  • Tc – Time interval, over which the commited burst (Bc) can be sent
  • Bc – Commited burst, measured in bits. This is the amount of traffic to be sent each Tc.
  • Be – Excess burst in bits. This is the traffic sent above your Bc, and most of the time risks being dropped, due to being in excess
  • CIR – Commited information rate, in bits per second. This is you allowed speed from your Internet contract.
  • Shaping rate – This is the rate at which your device will be sending traffic, which may be equal to the CIR, or even a little bit greater (more on that – later)
  • Policing rate – The rate after which your ISP starts to drop your traffic, in order to control your speed (this may be bigger that the actual CIR)

The deal with Bc, Be and Tc is that if you have a 128kbps line, and the intervals are 10 in a second (that can be configured), each 10th of a second you send 12.8kb. If you don’t have anything to send one interval, you’ve wasted 12.8kb. So, to reclaim it, you could send 25.6kb the next interval, but now you’ve overused your allowance. That means that your Tc is 0.1, your Bc is 12.8, and when you reclaimed your lost bandwidth, your Be was 12.8kb.

As I mentioned, the time interval Tc can be configured. The time interval directly impacts your Be burst. Why would you modify your time interval, when you can burst all the traffic up as fast as you can, then just wait ‘in silence’ for the current time interval to end?

Consider you have a 32kbps serial line from your ISP. Which means that you can transfer 32 kilobits per second. However, what if the clock rate on your router is running with clock rate 64000? That means that the router is transmitting at the hardware speed of 64 kilobits per second. Does mean that we get twice the bandwidth allotted for free? No. Our device, as the DTE end of the line, cannot change the physical speed of transmission. Then how do we maintain the 32kbps speed? Simple – we transmit the most we can, and then wait. Since we can transmit 64kilobits per second, then we can transmit 32kilobits per half a second, and then wait another half of the second.

The VoIP guys now scream in terror “500ms latency?”. Yeah, it’s no good – we need the use of a shaper in such case.

Shapers

Using a traffic shaper (usually) means transmitting at a lower rate that receiving. There are a couple of gotchas to traffic shaping, mainly which traffic should you send first, and which one should wait in line, as well as the speed you are transmitting with. The first problem is resolved through queuing strategies.  The second – using careful planning of your shaping configuration. So let’s dive in!

We already established that if no shaping is used, our router will transmit at the physical clock rate as much as possible, and when your limit is reached (in our former case – at the half of a second), the ISP will drop police any other traffic for the rest of the interval (again, in our former case – for the rest of the half second). This 500ms latency is most of the time unacceptable, so we employ shaping. To assume a safe figure of many intervals in a single second (in order to minimize delay), Cisco routers have a predefined limit of the Bc value. How does the Bc affect your Tc? To calculate your Tc time interval, use the following formula

Tc = (Bc / CIR) x 1000

By default, Cisco routers will use a value of 8000 bits for Bc if the interface bandwidth rate <= 320kbps; and calculates the Tc using the upper formula (that’s why it is important to set up your bandwidth [speed] in the interface view). If your line is > 320kbps, your Tc will be 25ms fixed, and your Bc will equal = ( shaping rate * Tc ).

This setup ensures that delays are kept to a minimum, even with the default settings. Of course, you can tune your Bc, Be, CIR(using the bandwidth command), and by extension of the former values – the Tc (which cannot be directly modified).

Policing

Depending on the traffic contract with your ISP, your ISP may police your traffic with a Single rate or Dual rate policer. Single rate traffic contract usually defines an average speed, which the contractor guarantees. However, taking into account the “burstyness” of the traffic on packet-switched networks, and oversubscriptions, the contractor may decide to offer a Dual rate scenario. With the latter option, the contractor guarantees a minimum speed, and also provides a higher one, which is non-guaranteed.

One should also know that the policers are categorized into two groups: two-color and three-color. What that means is that the two-colored policer distinguishes traffic within the CIR, and traffic above it; while the three-color policer has the notions of two kinds of exceeding traffic – regular exceeding traffic, and extremely exceeding traffic (violating traffic). We can draw parallels with the Dual rate policer here – the CIR is the minimum speed guaranteed, which is the regular traffic speed for the policer. The non-guaranteed traffic is the regular exceeding traffic to the policer, and when traffic exceeds the average non-guaranteed limit, it is considered the violating traffic. Thus the policer has the notions of Conforming traffic, Exceeding traffic, and if three-colored, Violating traffic. As you can probably guess, the Dual-Rate policer can only be three-colored, as we have clearly defined minimum and average speeds. The way I like to differentiate between the single-rate and double-rate three-color policers is the following: it depends on the way we reserve bandwidth for the exceeding speeds. Let’s visualise it with some ASCII art! Here the higher rate “sits” on top of the minimum guaranteed rate.

Maximum "unsafe" utilization
|Be rate>===============
|Bc rate>

Maximum "safe" utilization
|Be rate>
|Bc rate>===============

No utilization
|Be rate>
|Bc rate>_______________

A peak in the midst of no utilization
|Be rate> /****\
|Bc rate>___/ *****\_____

A peak of violating traffic in the midst of no utilization
|Bv rate> /*\
|Be rate> /*** \
|Bc rate>___/ \_____

With a single-rate policer, whenever your traffic passes over the Bc rate, it will either get dropped, or get marked “eligible for discard”, which most of the time means that it will get dropped somewhere along the way if congestion occurs.

With each Tc interval, you gain the right to transmit Bc amount of traffic. If you transmit more than the Bc traffic, you get sanctioned.

With a double rate policer, whenever your traffic passes over the Bc rate, the Be rate is ‘utilized’. Each Tc interval you gain the right to transmit Bc+Be amounts of traffic. If your traffic is within the Bc rate, only the Bc ‘bucket’ is depleted. If you have stood a couple of intervals in silence, and then need to transmit, you could compensate for the previous ‘wasted’ time intervals, by filling in the Be bucket, thus getting a Bc+Be speed.

Summary

As one can see, the ISP limitations on speed are often rounded up to an interval of a second, which tends to get interesting to configure. Depending on what your goals are and which technology needs to be supported, an uplink traffic can be shaped in many ways. Fine tuning the shaping variables can either make wonders with your network, or make it as unresponsive as a dead cloud unicorn.

So, have you got anything peculiar to share on the QoS matter?