The problem with peering from a logistics standpoint

Many ISPs run into this problem as part of their growing pains.  This scenario usually starts happening with their third or 4th peer.

Scenario.  ISP grows beyond the single connection they have.  This can be 10 meg, 100 meg, gig or whatever.  They start out looking for redundancy. The ISP brings in a second provider, usually at around the same bandwidth level.  This way the network has two pretty equal paths to go out.

A unique problem usually develops as the network grows to the point of peaking the capacity of both of these connections.  The ISP has to make a decision. Do they increase the capacity to just one provider? Most don’t have the budget to increase capacities to both providers. Now, if you increase one you are favouring one provider over another until the budget allows you to increase capacity on both. You are essentially in a state where you have to favor one provider in order to keep up capacity.  If you fail over to the smaller pipe things could be just as bad as being down.

This is where many ISPs learn the hard way that BGP is not load balancing. But what about padding, communities, local-pref, and all that jazz? We will get to that.  In the meantime, our ISP may have the opportunity to get to an Internet Exchange (IX) and offload things like streaming traffic.  Traffic returns to a little more balance because you essentially have a 3rd provider with the IX connection. But, they growing pains don’t stop there.

As ISP’s, especially WISPs, have more and more resources to deal with cutting down latency they start seeking out better-peered networks.  The next growing pain that becomes apparent is the networks with lots of high-end peers tend to charge more money.  In order for the ISP to buy bandwidth they usually have to do it in smaller quantities from these types of providers. This introduces the probably of a mismatched pipe size again with a twist. The twist is the more, and better peers a network has the more traffic is going to want to travel to that peer. So, the more expensive peer, which you are probably buying less of, now wants to handle more of your traffic.

So, the network geeks will bring up things like padding, communities, local-pref, and all the tricks BGP has.  But, at the end of the day, BGP is not load balancing.  You can *influence* traffic, but BGP does not allow you to say “I want 100 megs of traffic here, and 500 megs here.”  Keep in mind BGP deals with traffic to and from IP blocks, not the traffic itself.

So, how does the ISP solve this? Knowing about your upstream peers is the first thing.  BGP looking glasses, peer reports such as those from Hurricane Electric, and general news help keep you on top of things.  Things such as new peering points, acquisitions, and new data centers can influence an ISPs traffic.  If your equipment supports things such as netflow, sflow, and other tools you can begin to build a picture of your traffic and what ASNs it is going to. This is your first major step. Get tools to know what ASNs the traffic is going to   You can then take this data, and look at how your own peers are connected with these ASNs.  You will start to see things like provider A is poorly peered with ASN 2906.

Once you know who your peers are and have a good feel on their peering then you can influence your traffic.  If you know you don’t want to send traffic destined for ASN 2906 in or out provider A you can then start to implement AS padding and all the tricks we mentioned before.  But, you need the greater picture before you can do that.

One last note. Peering is dynamic.  You have to keep on top of the ecosystem as a whole.

Some Random Visio diagram

Below, We have some visio diagrams we have done for customers.

This first design is a customer mesh into a couple of different data centers. We are referring to this as a switch-centric design. This has been talked about in the forums and switch-centric seems like as good as any.

This next design is a netonix switch and a Baicells deployment.

Design for a customer

BGP local Pref and you

One of the bgp topics that comes up from time to time is what does “bgp local-pref” do for me? The short answer is it allows you to prefer which direction a traffic will flow to a given destination. How can this help you? Well before we start, remember the high number wins in local-pref.
Let’s assume you are an ISP. You have the following connections:
-You supply a BGP connection to a downstream client.
-You have a private peer setup with the local college
-You are hooked into a local internet exchange
-You have transport to another internet exchange in the next state over
-and you have some transit connections where you buy internet.

So how do we use BGP preference to help us out? We might apply the following rules to routes received from our various peers
Our downstream client we might set their local pref to 150
The college we may set them to 140
Preferred internet exchange peering: 130
Next state IX: 120
Transit ISPs: 100

Now these don’t make much sense by themselves, but they do when you take into account how BGP would make a decision if it has to choose between multiple paths. If it only has one path to a certain route then local-pref is not relevant.

Let’s say you have a customer on your ISP that is sending traffic to a server at a local college. Maybe they are a professor who is remoting into a server at the college to run experiments. There are probably multiple ways for that traffic to go. If the college is on the local Internet exchange you are a member of, that is one route, the next route would be your transit ISPs, and obviously your private peer with the college. So, in our example above the college, with a local pref of 140 wins out over the local exchange, wins out over the next state IX, and wins out over the Transit ISPs. We want it to go direct over the direct peer with the college. Mission accomplished.

local-pref is just one way to engineer your traffic to go out certain links. Keep in mind two things:
1.Higher number wins
2.local-pref only matters if there are multiple paths to the same destination.
3.Local-pref has to do with outbound path selection

Soft Reconfiguration inbound

Several people have been asking what soft Reconfiguration Inbound is on a BGP peer.

In the dark days of BGP you had to tear down the BGP session and do a full reestablishment in order to bring it up.  What soft reconfiguration does is copies of all routes received (this is why it is called inbound) are stored separately from the regular BGP table.   When a change is made the new change is applied to the stored copy of the BGP routes.

Disadvantage? This takes up memory because you have two files basically.

So how is this different than route refresh described in RFC 2918? This is a standard, with an RFC unlike Soft Reconfiguration inbound, which is a Cisco thing. Route refresh asks the peer to resend all its routes.

How I learned to love BGP communities, and so can you

BGP communities can be a powerful, but almost mystical thing.  If you aren’t familiar with communities start here at Wikipedia.  For the purpose of part one of this article we will talk about communities and how they can be utilized for traffic coming into your network. Part two of this article will talk about applying what you have classified to your peers.

So let’s jump into it.  Let’s start with XYZ ISP. They have the following BGP peers:

-Peer one is Typhoon Electric.  XYZ ISP buys an internet connection from Typhoon.
-Peer two is Basement3. XYZ ISP also buy an internet connection from Basement3
-Peer three is Mauler Automotive. XYZ ISP sells internet to Mauler Automotive.
-Peer four is HopOffACloud web hosting.  XYZ ISP and HopOffACloud are in the data center and have determined they exchange enough traffic amongst their ASN’s to justify a dedicated connection between them.
-Peer five is the local Internet exchange (IX) in the data center.

So now that we know who our peers are, we need to assign some communities and classify who goes in what community.  The Thing to keep in mind here, is communities are something you come up with. There are common numbers people use for communities, but there is no rule on what you have to number your communities as. So before we proceed we will need to also know what our own ASN is.  For XYZ we will say they were assigned AS64512. For those of you who are familiar with BGP, you will see this is a private ASN.  I just used this to lessen any confusion.  If you are following along at home replace 65412 with your own ASN.

So we will create four communities .

64512:100 = transit
64512:200 = peers
64512:300 = customers
64512:400 = my routes

Where did we create these? For now on paper.

So let’s break down each of these and how they apply to XYZ network. If you need some help with the terminology see this previous post.
64512:100 – Transit
Transit will apply to Typhoon Electric and Basement3.  These are companies you are buying internet transit from.

64512:200 – Peers
Peers apply to HopOffACloud and the IX. These are folks you are just exchanging your own and your customer’s routes with.

64512:300 – Customers
This applies to Mauler Automotive.  This is a customer buying Internet from you. They transit your network to get to the Internet.

64512:200 – Local
This applies to your own prefixes.  These are routes within your own network or this particular ASN.

Our next step is to take the incoming traffic and classify into one of these communities. Once we have it classified we can do stuff with it.

If we wanted to classify the Typhoon Electric traffic we would do the following in Mikrotik land:

/routing filter
add action=passthrough chain=TYPHOON-IN prefix=0.0.0.0/0 prefix-length=0-32 set-bgp-communities=64512:100 comment="Tag incoming prefixes with :100"

This would go at the top of your filter chain for the Typhoon Electric peer.  This simply applies 64512:100 to the prefixes learned from Typhoon.

In Cisco Land our configuration would look like this:

route-map Typhoon-in permit 20  
match ip address 102  
set community 64512:100

The above Cisco configuration creates a route map, matches a pre-existing access list named 102, and applies community 64512:100 to prefixes learned.

For Juniper you can add the following command to an incoming peer in policy-options:

set community Typhoon-in members 64512:100

Similar to the others you are applying this community to a policy.

So what have we done so far, we have taken the received prefixes from Typhoon Electric and applied community 64512:100 to it.  This simply puts a classifier on all traffic from that peer. We could modify the above example to classify traffic from our other peers based upon what community we want them tagged as.

In our next segment we will learn what we can do with these communities.

Quick and dirty DDoS mitigation for Mikrotik

Update: This article is not meant  to be a permanent solution.  It’s a way to stop the tidal wave of traffic you could be getting.  Many times it’s important to just get the customers up to some degree while you figure out the best course of action.  

Many of the Denial of Service (DDoS) attacks many folks see these days involve attacks coming from APNIC (Asia Pacific) IP addresses.  A trend is to open as many connections as possible and overwhelm the number of entries in the connection table. You are limited to 65,535 ports to be open.  Ports below 10000 are reserved ports, but anything above that can be used for client type connections.

 Now, Imagine you have a botnet with 10,000 computers all bearing their weight on your network.  Say you have a web-site someone doesn’t like.  If these 10,000 machines all send just 7 legitimate GET requests to your web-server you can bring, even a big router to a grinding halt.   Firewalls, due to the extra CPU they are exerting, are even more prone to these types of attacks.

So, how do you begin to mitigate this attack? By the time you are under attack you are in defensive mode.  Someone, or alot of someone’s, are at your door trying to huff and puff and blow your house down. You need to slow the tide.  One of the first things you can do is start refusing the traffic. A simple torch normally shows many of the attacking IPs, are from APNIC.  If this is the case, we enable a firewall rule that says if the IP is not sourced from the below “ARIN” address list go ahead and drop it.

add chain=forward comment="WebServer ACL" dst-address=1.2.3.4 src-address-list=!ARIN action=drop

The above rule says if our attacked host is being contacted by anything not on the “ARIN” list go ahead and drop it.

Make sure to paste this into /ip firewall address-list . These were copied off the ARIN web-site as of this writing. APNIC and other registries all have similar lists. Keep in mind, this won’t stop the traffic from coming to you, but will shield you some in order to have a somewhat functional network while you track down the issues.

Some people will say to blackhole the IP via a BGP blackhole server, but if you have production machines on the attacked host taking them offline for the entire world could be a problem.  This way, you are at least limiting who can talk to them.

add address=23.0.0.0/8 list=ARIN
add address=24.0.0.0/8 list=ARIN
add address=45.16.0.0/12 list=ARIN
add address=45.32.0.0/11 list=ARIN
add address=45.72.0.0/13 list=ARIN
add address=50.0.0.0/8 list=ARIN
add address=63.0.0.0/8 list=ARIN
add address=64.0.0.0/8 list=ARIN
add address=65.0.0.0/8 list=ARIN
add address=66.0.0.0/8 list=ARIN
add address=67.0.0.0/8 list=ARIN
add address=68.0.0.0/8 list=ARIN
add address=69.0.0.0/8 list=ARIN
add address=70.0.0.0/8 list=ARIN
add address=71.0.0.0/8 list=ARIN
add address=72.0.0.0/8 list=ARIN
add address=73.0.0.0/8 list=ARIN
add address=74.0.0.0/8 list=ARIN
add address=75.0.0.0/8 list=ARIN
add address=76.0.0.0/8 list=ARIN
add address=96.0.0.0/8 list=ARIN
add address=97.0.0.0/8 list=ARIN
add address=98.0.0.0/8 list=ARIN
add address=99.0.0.0/8 list=ARIN
add address=100.0.0.0/8 list=ARIN
add address=104.0.0.0/8 list=ARIN
add address=107.0.0.0/8 list=ARIN
add address=108.0.0.0/8 list=ARIN
add address=135.0.0.0/8 list=ARIN
add address=136.0.0.0/8 list=ARIN
add address=142.0.0.0/8 list=ARIN
add address=147.0.0.0/8 list=ARIN
add address=162.0.0.0/8 list=ARIN
add address=166.0.0.0/8 list=ARIN
add address=172.0.0.0/8 list=ARIN
add address=173.0.0.0/8 list=ARIN
add address=174.0.0.0/8 list=ARIN
add address=184.0.0.0/8 list=ARIN
add address=192.0.0.0/8 list=ARIN
add address=198.0.0.0/8 list=ARIN
add address=199.0.0.0/8 list=ARIN
add address=204.0.0.0/8 list=ARIN
add address=205.0.0.0/8 list=ARIN
add address=206.0.0.0/8 list=ARIN
add address=207.0.0.0/8 list=ARIN
add address=208.0.0.0/8 list=ARIN
add address=209.0.0.0/8 list=ARIN
add address=216.0.0.0/8 list=ARIN

IP space terms to know

When you are talking about the type of assigned Public IP space you have there are a couple of terms that are handy to know.

Provider assigned (PA) space. This is space assigned by your upstream provider. These “belong” to someone you are buying services from. If you wish to advertise these via your own ASN to other providers you need a Letter of Authority (LOA) from whom these IPs are assigned to.

Provider independent (PI) is space directly assigned to you from a registry such as ARIN, RIPE,etc. These addresses “belong ” to you. You have authority over these addresses to assign them out, as long as it meets the terms set by the registry.

 

 

RFC’s you need to know: RFC 2796 BGP Route Reflection

https://tools.ietf.org/html/rfc2796

Currently in the Internet, BGP deployments are configured such that all BGP speakers within a single AS must be fully meshed and any external routing information must be re-distributed to all other routers within that AS. For n BGP speakers within an AS that requires to maintain n*(n-1)/2 unique IBGP sessions. This "full mesh" requirement clearly does not scale when there are a large number of IBGP speakers each exchanging a large volume of routing information, as is common in many of todays internet networks.
This scaling problem has been well documented and a number of proposals have been made to alleviate this [2,3]. This document represents another alternative in alleviating the need for a "full mesh" and is known as "Route Reflection". This approach allows a BGP speaker (known as "Route Reflector") to advertise IBGP learned routes to certain IBGP peers. It represents a change in the commonly understood concept of IBGP, and the addition of two new optional transitive BGP attributes to prevent loops in routing updates.

How does BGP select which route?

BGP can be a complex and almost mystical protocol. For those of you who are trying to determine how BGP selects which route here is your guide. Before we get into it a couple of things to keep in mind. First, BGP is not a multipath routing protocol. This is different than what you may be used to with OSPF. BGP goes to great lengths to encure only one route is used. Secondly, there are some vendor specific rules which are applied. I will try to point those out as we go along.

1.The first test is if the next hop router is accessible.

2.If Synchronization is enabled, the router will ignore any iBGP routes which are not synced.

3.The third is Cisco specific. Cisco uses a weight attribute. The largest weight wins. Default weight is zero. Maximum weight is 65,535.

4.If the weights are the same, the highest local preference is chosen from LOCAL_PREF. It’s important to note that routers only receive this from iBGP.

5.Net up, the router checks to see if any of the possible routes were originated locally. The two main checks are either the network or aggregate commands. The network command wins if it is originated locally.

6.If two or more routes are still equal the router looks as AS_PATH. The router will prefer any iBGP routes. Outside of the AS BGP will prefer the shortest path.

7.BGP then moves on to the ORIGIN attribute. If the path lengths are the same, BGP selects IGP over EGP and EGP over INCOMPLETE routes.

8.BGP now looks at MED values. The lowest value is selected. Note, MED is only used if both routes are received from the same AS, or if always-compare-med has been enabled. Be careful with always-compare-meds as this can cause routing loops.

9.BGP will then prefer eBGP to iBGP routes. This is not the same as #5 above. Only external routes are looked at here.

10.Next IBGP costs are compared to the next hop routers. The closest one is selected.

11.Ages of routes are finally connected. This is kind of like choosing teams for dodgeball. The oldest route wins. The reason being is oldest routes are thought to be more stable.

12.And finally, if all else fails the router with the lowest router ID wins.

This is a quick low-down on how BGP “thinks” in order to determine routes. If anyone has some Cisco, Mikrotik, quagga, or other specific attributes please comment. I have reached out to Mikrotik and Ubiquiti specifically to see if this is in-line with their implementation of BGP.