What is a BGP Confederation?

In network routing, BGP confederation is a method to use Border Gateway Protocol (BGP) to subdivide a single autonomous system (AS) into multiple internal sub-AS’s, yet still advertise as a single AS to external peers. This is done to reduce the number of entries in the iBGP routing table.  If you are familiar with breaking OSPF domains up into areas, BGP confederations are not that much different, at least from a conceptual view.

And, much like OSPF areas, confederations were born when routers had less CPU and less ram than they do in today’s modern networks. MPLS has superseded the need for confederations in many cases. I have seen organizations, who have different policies and different admins break up their larger networks into confederations.  This allows each group to go their own directions with routing policies and such.

if you want to read the RFC:https://tools.ietf.org/html/rfc5065

UBNT EDGEMAX 1.10.3 update route flushing

From UBNT:

New features:

Offloading – Add CLI commands to disable flow-table flushing in offloading engine when routing table changes:  set system offload ipv4 disable-flow-flushing-upon-fib-changes set system offload ipv6 disable-flow-flushing-upon-fib-changes
Discussed here

Prior to 1.10.3 firmware flow-table in offloading engine was always flushed when route was updated in linux routing table. Flow flushing ensured that offloading engine got routing updates instantly but it wasted a lot of CPU time and decreased performance if routing table was constantly updated for (instance in Full BGP, big OSPF or flapping PPPoE interface scenarios)

In 1.10.3 firmware by default disable-flow-flushing-upon-fib-changes is not set which means that flow table in offloading engine is always flushed upon routing table changes same way as it used to be in previous firmware.
If you have Full-BGP table or large OSPF network they you are advised to set disable-flow-flushing-upon-fib-changes this will ensure less CPU-load and increase max throughput.
Important note for multi-WAN environments – if nexthop interface of default-gateway changes and disable-flow-flushing-upon-fib-changes is set then it will take up to flow-lifetime seconds before all existing offloaded flows switch to new nexthop interface (up to 12 seconds by default).
  Offloading – Add CLI command to modify flow-lifetime in offloading engine (expressed in seconds): 
set system offload flow-lifetime 24Prior to 1.10.3 firmware flow-lifetime parameter was hardcoded and was not synchronized between different ER platforms: 12 seconds on ER-Lite/ER-Poe, 6 seconds on ER/ER-pro/ER-4/ER-6 and 3 seconds on ER-Infinity. 

In 1.10.3 firmware default value of flow-lifetime is set to 12 seconds for all ER platforms and now it can be modified. By modifying flow-lifetime parameter you control how much traffic skips from offloading engine into linux network stack.

If you increase flow-lifetime then:
 a) Offloaded IP flows will expire less frequently and less packets will be forwarded to linux
 b) CPU load will decrease and max throughput will increase
 c) if disable-flow-flushing-upon-fib-changes parameter is set then it will take more time for offloading engine to detect changes in routing table 
If you decrease flow-lifetime then:
 a) Offloaded IP flows will expire more frequently and more packets will be forwarded to linux
 b) CPU load will increase and max throughput will decrease
 c) if disable-flow-flushing-upon-fib-changes parameter is set then it will take less time for offloading engine to detect changes in routing table 
  Offloading – add CLI command to show flows in offloading engine: show ubnt offload flows Offloading – add CLI command to show offloading engine statistics: show ubnt offload statistics


Enhancements and bug fixes:

LDP – fixed regression in 1.10.0 when LDP configuration failed. Discussed here LoadBalancing – fixed regression in 1.10.1 when LoadBalancing failed to recover if WAN interface lost&restored link in 3 second interval. Discussed here DHCP – fixed bug when DHCP server configuration failed to commit with networks other than /8, /16, and /24. Discussed here TrafficControl – fixed regression in 1.10.0 when “command not found” output was printed when running “show traffic-control …” commands. Discussed here

Lab Network

I am starting an ongoing series involving a semi-static set of devices.  These will involve different tutorials on things such as OSPF, cambium configuration, vlans, and other topics.  Below is the general topology I will use for this lab network.  As things progress I will be able to swap different manufacturers and device models into this scenario without changing the overall topology.  We may add a device or two here and there, but overall this basic setup will remain the same.  This will allow you to see how different things are configured in the same environment without changing the overall scheme too much.

We will start with very basic steps.  How to login to the router, how to set an IP address, then we will move to setting up a wireless bridge between the two routers.  Once we have that done we will move onto setting up OSPF to enable dynamic routing.  After that the topics are open.  I have things like BGP planned, and some other things. If there is anything you would like to see please let me know.

The problem with peering from a logistics standpoint

Many ISPs run into this problem as part of their growing pains.  This scenario usually starts happening with their third or 4th peer.

Scenario.  ISP grows beyond the single connection they have.  This can be 10 meg, 100 meg, gig or whatever.  They start out looking for redundancy. The ISP brings in a second provider, usually at around the same bandwidth level.  This way the network has two pretty equal paths to go out.

A unique problem usually develops as the network grows to the point of peaking the capacity of both of these connections.  The ISP has to make a decision. Do they increase the capacity to just one provider? Most don’t have the budget to increase capacities to both providers. Now, if you increase one you are favouring one provider over another until the budget allows you to increase capacity on both. You are essentially in a state where you have to favor one provider in order to keep up capacity.  If you fail over to the smaller pipe things could be just as bad as being down.

This is where many ISPs learn the hard way that BGP is not load balancing. But what about padding, communities, local-pref, and all that jazz? We will get to that.  In the meantime, our ISP may have the opportunity to get to an Internet Exchange (IX) and offload things like streaming traffic.  Traffic returns to a little more balance because you essentially have a 3rd provider with the IX connection. But, they growing pains don’t stop there.

As ISP’s, especially WISPs, have more and more resources to deal with cutting down latency they start seeking out better-peered networks.  The next growing pain that becomes apparent is the networks with lots of high-end peers tend to charge more money.  In order for the ISP to buy bandwidth they usually have to do it in smaller quantities from these types of providers. This introduces the probably of a mismatched pipe size again with a twist. The twist is the more, and better peers a network has the more traffic is going to want to travel to that peer. So, the more expensive peer, which you are probably buying less of, now wants to handle more of your traffic.

So, the network geeks will bring up things like padding, communities, local-pref, and all the tricks BGP has.  But, at the end of the day, BGP is not load balancing.  You can *influence* traffic, but BGP does not allow you to say “I want 100 megs of traffic here, and 500 megs here.”  Keep in mind BGP deals with traffic to and from IP blocks, not the traffic itself.

So, how does the ISP solve this? Knowing about your upstream peers is the first thing.  BGP looking glasses, peer reports such as those from Hurricane Electric, and general news help keep you on top of things.  Things such as new peering points, acquisitions, and new data centers can influence an ISPs traffic.  If your equipment supports things such as netflow, sflow, and other tools you can begin to build a picture of your traffic and what ASNs it is going to. This is your first major step. Get tools to know what ASNs the traffic is going to   You can then take this data, and look at how your own peers are connected with these ASNs.  You will start to see things like provider A is poorly peered with ASN 2906.

Once you know who your peers are and have a good feel on their peering then you can influence your traffic.  If you know you don’t want to send traffic destined for ASN 2906 in or out provider A you can then start to implement AS padding and all the tricks we mentioned before.  But, you need the greater picture before you can do that.

One last note. Peering is dynamic.  You have to keep on top of the ecosystem as a whole.

Some Random Visio diagram

Below, We have some visio diagrams we have done for customers.

This first design is a customer mesh into a couple of different data centers. We are referring to this as a switch-centric design. This has been talked about in the forums and switch-centric seems like as good as any.

This next design is a netonix switch and a Baicells deployment.

Design for a customer

BGP local Pref and you

One of the bgp topics that comes up from time to time is what does “bgp local-pref” do for me? The short answer is it allows you to prefer which direction a traffic will flow to a given destination. How can this help you? Well before we start, remember the high number wins in local-pref.
Let’s assume you are an ISP. You have the following connections:
-You supply a BGP connection to a downstream client.
-You have a private peer setup with the local college
-You are hooked into a local internet exchange
-You have transport to another internet exchange in the next state over
-and you have some transit connections where you buy internet.

So how do we use BGP preference to help us out? We might apply the following rules to routes received from our various peers
Our downstream client we might set their local pref to 150
The college we may set them to 140
Preferred internet exchange peering: 130
Next state IX: 120
Transit ISPs: 100

Now these don’t make much sense by themselves, but they do when you take into account how BGP would make a decision if it has to choose between multiple paths. If it only has one path to a certain route then local-pref is not relevant.

Let’s say you have a customer on your ISP that is sending traffic to a server at a local college. Maybe they are a professor who is remoting into a server at the college to run experiments. There are probably multiple ways for that traffic to go. If the college is on the local Internet exchange you are a member of, that is one route, the next route would be your transit ISPs, and obviously your private peer with the college. So, in our example above the college, with a local pref of 140 wins out over the local exchange, wins out over the next state IX, and wins out over the Transit ISPs. We want it to go direct over the direct peer with the college. Mission accomplished.

local-pref is just one way to engineer your traffic to go out certain links. Keep in mind two things:
1.Higher number wins
2.local-pref only matters if there are multiple paths to the same destination.
3.Local-pref has to do with outbound path selection

Soft Reconfiguration inbound

Several people have been asking what soft Reconfiguration Inbound is on a BGP peer.

In the dark days of BGP you had to tear down the BGP session and do a full reestablishment in order to bring it up.  What soft reconfiguration does is copies of all routes received (this is why it is called inbound) are stored separately from the regular BGP table.   When a change is made the new change is applied to the stored copy of the BGP routes.

Disadvantage? This takes up memory because you have two files basically.

So how is this different than route refresh described in RFC 2918? This is a standard, with an RFC unlike Soft Reconfiguration inbound, which is a Cisco thing. Route refresh asks the peer to resend all its routes.

How I learned to love BGP communities, and so can you

BGP communities can be a powerful, but almost mystical thing.  If you aren’t familiar with communities start here at Wikipedia.  For the purpose of part one of this article we will talk about communities and how they can be utilized for traffic coming into your network. Part two of this article will talk about applying what you have classified to your peers.

So let’s jump into it.  Let’s start with XYZ ISP. They have the following BGP peers:

-Peer one is Typhoon Electric.  XYZ ISP buys an internet connection from Typhoon.
-Peer two is Basement3. XYZ ISP also buy an internet connection from Basement3
-Peer three is Mauler Automotive. XYZ ISP sells internet to Mauler Automotive.
-Peer four is HopOffACloud web hosting.  XYZ ISP and HopOffACloud are in the data center and have determined they exchange enough traffic amongst their ASN’s to justify a dedicated connection between them.
-Peer five is the local Internet exchange (IX) in the data center.

So now that we know who our peers are, we need to assign some communities and classify who goes in what community.  The Thing to keep in mind here, is communities are something you come up with. There are common numbers people use for communities, but there is no rule on what you have to number your communities as. So before we proceed we will need to also know what our own ASN is.  For XYZ we will say they were assigned AS64512. For those of you who are familiar with BGP, you will see this is a private ASN.  I just used this to lessen any confusion.  If you are following along at home replace 65412 with your own ASN.

So we will create four communities .

64512:100 = transit
64512:200 = peers
64512:300 = customers
64512:400 = my routes

Where did we create these? For now on paper.

So let’s break down each of these and how they apply to XYZ network. If you need some help with the terminology see this previous post.
64512:100 – Transit
Transit will apply to Typhoon Electric and Basement3.  These are companies you are buying internet transit from.

64512:200 – Peers
Peers apply to HopOffACloud and the IX. These are folks you are just exchanging your own and your customer’s routes with.

64512:300 – Customers
This applies to Mauler Automotive.  This is a customer buying Internet from you. They transit your network to get to the Internet.

64512:200 – Local
This applies to your own prefixes.  These are routes within your own network or this particular ASN.

Our next step is to take the incoming traffic and classify into one of these communities. Once we have it classified we can do stuff with it.

If we wanted to classify the Typhoon Electric traffic we would do the following in Mikrotik land:

/routing filter
add action=passthrough chain=TYPHOON-IN prefix= prefix-length=0-32 set-bgp-communities=64512:100 comment="Tag incoming prefixes with :100"

This would go at the top of your filter chain for the Typhoon Electric peer.  This simply applies 64512:100 to the prefixes learned from Typhoon.

In Cisco Land our configuration would look like this:

route-map Typhoon-in permit 20  
match ip address 102  
set community 64512:100

The above Cisco configuration creates a route map, matches a pre-existing access list named 102, and applies community 64512:100 to prefixes learned.

For Juniper you can add the following command to an incoming peer in policy-options:

set community Typhoon-in members 64512:100

Similar to the others you are applying this community to a policy.

So what have we done so far, we have taken the received prefixes from Typhoon Electric and applied community 64512:100 to it.  This simply puts a classifier on all traffic from that peer. We could modify the above example to classify traffic from our other peers based upon what community we want them tagged as.

In our next segment we will learn what we can do with these communities.

Quick and dirty DDoS mitigation for Mikrotik

Update: This article is not meant  to be a permanent solution.  It’s a way to stop the tidal wave of traffic you could be getting.  Many times it’s important to just get the customers up to some degree while you figure out the best course of action.  

Many of the Denial of Service (DDoS) attacks many folks see these days involve attacks coming from APNIC (Asia Pacific) IP addresses.  A trend is to open as many connections as possible and overwhelm the number of entries in the connection table. You are limited to 65,535 ports to be open.  Ports below 10000 are reserved ports, but anything above that can be used for client type connections.

 Now, Imagine you have a botnet with 10,000 computers all bearing their weight on your network.  Say you have a web-site someone doesn’t like.  If these 10,000 machines all send just 7 legitimate GET requests to your web-server you can bring, even a big router to a grinding halt.   Firewalls, due to the extra CPU they are exerting, are even more prone to these types of attacks.

So, how do you begin to mitigate this attack? By the time you are under attack you are in defensive mode.  Someone, or alot of someone’s, are at your door trying to huff and puff and blow your house down. You need to slow the tide.  One of the first things you can do is start refusing the traffic. A simple torch normally shows many of the attacking IPs, are from APNIC.  If this is the case, we enable a firewall rule that says if the IP is not sourced from the below “ARIN” address list go ahead and drop it.

add chain=forward comment="WebServer ACL" dst-address= src-address-list=!ARIN action=drop

The above rule says if our attacked host is being contacted by anything not on the “ARIN” list go ahead and drop it.

Make sure to paste this into /ip firewall address-list . These were copied off the ARIN web-site as of this writing. APNIC and other registries all have similar lists. Keep in mind, this won’t stop the traffic from coming to you, but will shield you some in order to have a somewhat functional network while you track down the issues.

Some people will say to blackhole the IP via a BGP blackhole server, but if you have production machines on the attacked host taking them offline for the entire world could be a problem.  This way, you are at least limiting who can talk to them.

add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN
add address= list=ARIN