Ready for part 2 of our Network Detective MPLS L3VPN Ride-Along? In part 1 we were presented with the “Case of the Failed L3VPN Ping”. We started, like any ping troubleshooting, on the IP subnets themselves: do they exist in the routing table? Are they being advertised? Are the advertisements being received? In part 1 we stayed focused on the knowledge of the IP addresses and stayed in the BGP… interviewed witnesses… gathered facts and followed clues. We found a number of things not configured correctly and we fixed them.
And yet… while we fixed many things that needed to be fixed in the BGP…. we still couldn’t ping.
Ready to begin? Grab your Network Detective badge and something to take notes with. In this part 2 blog we will focus on the MPLS portion.
As we go along… keep the Network Detective Mantra in mind: “Be Prepared, Find the Suspects, Question the Suspects, Improve.”
First Pass: Pick a PE and Check the Basics
Deja vu eh? Weren’t we just here in the last blog? Didn’t we “do” the first pass already? Why are we back?
The “First Pass”, for me, is the “looking for clues” phase. In the previous blog we gathered the facts, collected the clues, interviewed the witnesses, questioned the suspects and fixed everything that was not correct in the BGP configs.
But we still can’t ping. So time to go back to the “looking for clues first pass“.
Remember: Knowledge is Key
Before we begin we need to remember that “knowledge is key”. While interviewing witnesses (e.g. “CLI show commands” on devices) we will get a crap ton of facts. We need to be able to identify which ones are also clues so we can follow them. Knowledge is what lets us tease out which facts are also clues worth following.
Bearing that in mind, what knowledge do we need here to help us troubleshoot what is going on?
Since the device I picked in our first pass here is a PE, it would be beneficial to know what the PE is supposed to do with the packet after it finds the destination is in the VRF routing table.
Facts
- The VRF only exists in the PE routers
- The P router (R2) has the global table routing only and does NOT have 14.14.14.1 or 14.14.14.2
Knowledge
- The above facts are 100% normal in an MPLS L3VPN environment
What other knowledge do we need?
Needed Knowledge
- How does MPLS L3VPN work?
How is R1 to Format The Packet?
To me, in order to troubleshoot this, we need to know more about the logistics of how MPLS L3VPN works. How is R1 supposed to format the packet as it tosses it into the “MPLS L3VPN plumbing“?
I will give you the answer to this. In our specific environment R1 needs to send the ping out with 2 LABELS:
- Inner/VPN label
- Outer/TH label.
Where will these labels come from? What protocol(s)? We need to know so we can see if we are getting the labels we need to get thru the “plumbing”.
Let’s take these one at a time. I would start with the Inner/VPN label.
Inner/VPN Label
To see whether or not R1 has knowledge of the Inner/VPN label to use for vrf Customer1 destination IP 14.14.14.3 we first kinda need knowledge as to how R1 is supposed to learn the label. I’ll go ahead and give you this one.
Inner/VPN label: This label will be advertised by R3 to R1 via BGP vpnv4 for the prefix 14.14.14.3.
R1#sh ip bgp vpnv4 all 14.14.14.3
BGP routing table entry for 14:14:14.14.14.3/32, version 7
Paths: (1 available, best #1, table Customer1)
<snip>
10.100.100.3 (metric 3) (via default) from 10.100.100.3 (10.100.100.3)
Origin incomplete, metric 0, localpref 100, valid, internal
Extended Community: RT:14:14
mpls labels in/out nolabel/19 <- label advertised by R3: 19
rx pathid: 0, tx pathid: 0x0
The above translates to
- 10.100.100.3 (R3) is advertising via BGP VPNv4 that if R1 has something in VRF Customer1 that wants to get to 14.14.14.3,
- R1 should use Inner/VPN label 19 and send the packet to R3 at 10.100.100.3
Outer/TH Label
Why do we need a 2nd label? Why can’t we just toss label 19 on and send the packet off into the MPLS plumbing and over to R2?
Facts
- R3 told R1, via BGP vpnv4, to use label 19 as the label to use to get to 14.14.14.3
- The P router (R2) does not communicate with R3 via BGP vpnv4. Thus the P router (R2) does not have this information
- The P router (R2) is just part of the label switching plumbing between R1 and R3
- Label 19 won’t help in the plumbing until it gets to the device that advertised it – 10.100.100.3 (R3)
- Label 19 needs “transportation assistance” to get to 10.100.100.3 (R3)
Enter (drumroll please)…… the Outer/TH (transport header) label. This is the outer label that will get pushed in front of label 19 to help it get to the IP address of the BGP vpnv4 peer (10.100.100.3 in our case) who said “use vpn label 19“.
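To make the two-label idea concrete, here is a minimal Python sketch of the label stack. It is only a mental model, not anything R1 actually runs. The function names are made up for illustration, and the transport label values (100 and 200) are placeholders; only VPN label 19 comes from the BGP output above. It also ignores the fact that the last P router before R3 would typically pop the outer label rather than swap it.

```python
def build_label_stack(vpn_label, transport_label):
    """Return the label stack the ingress PE imposes, outermost label first."""
    if transport_label is None:
        # Without an outer label the packet cannot ride the LSP to the egress PE.
        raise ValueError("no transport (outer) label available")
    return [transport_label, vpn_label]


def p_router_swap(stack, new_top):
    """A P router only looks at, and rewrites, the top (outer) label."""
    return [new_top] + stack[1:]


VPN_LABEL = 19          # advertised by R3 via BGP vpnv4
TRANSPORT_LABEL = 100   # placeholder value, assumed for illustration

stack = build_label_stack(VPN_LABEL, TRANSPORT_LABEL)
print(stack)                      # [100, 19]
print(p_router_swap(stack, 200))  # [200, 19]  (200 is another placeholder)
```

The point of the sketch: label 19 rides underneath, untouched, until it reaches the router that advertised it. If there is no outer label to push, the packet never gets into the plumbing at all.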
Let’s run a command in R1 that will show us if it has 2 labels set and ready to go for 14.14.14.3 in vrf Customer1.
R1#sh ip cef vrf Customer1 14.14.14.3 255.255.255.255 detail
14.14.14.3/32, <snip>
recursive via 10.100.100.3 label 19 <-- VPN (inner) label
nexthop 10.1.2.2 Gig0/0/1 unusable: no label <-- No label
R1#
R1#sh mpls forwarding-table
Local  Outgoing   Prefix            Outgoing             Next Hop
Label  Label      or Tunnel Id      interface
16     Pop Label  14.14.14.1/32[V]  aggregate/Customer1
17     No Label   10.2.3.0/24       Gi0/0/1              10.1.2.2
18     No Label   10.100.100.3/32   Gi0/0/1              10.1.2.2
R1#
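If you collect this output from many PEs, the same check can be scripted. The following is a rough Python sketch that pulls the two labels out of the trimmed `sh ip cef vrf ... detail` text shown above. The function name is hypothetical, and the regular expressions assume the simplified line layout used in this post; real output varies by platform and release, so treat the patterns as a starting point.

```python
import re


def labels_from_cef(cef_detail):
    """Return (vpn_label, transport_label) from 'show ip cef vrf ... detail' text.

    Either element is None when that label is missing.
    """
    vpn = re.search(r"recursive via \S+ label (\d+)", cef_detail)
    transport = re.search(r"nexthop \S+ \S+ label (\d+)", cef_detail)
    return (
        int(vpn.group(1)) if vpn else None,
        int(transport.group(1)) if transport else None,
    )


# Trimmed copy of the R1 output above (annotations removed).
r1_cef = """\
14.14.14.3/32, <snip>
recursive via 10.100.100.3 label 19
nexthop 10.1.2.2 Gig0/0/1 unusable: no label
"""

print(labels_from_cef(r1_cef))  # (19, None)
```

A result with `None` in the second position is exactly our clue: the VPN label is there, the transport label is not.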
Second Pass: The Missing Outer/Transport Header Label
We have officially collected enough information to move out of the First Pass “collecting clues” to the Second Pass of diving deeper and following one specific line of inquiry – the missing outer/TH label.
In order to go looking for why we are missing it….. lol… might be nice to have the knowledge of where R1 is supposed to get that label from.
The answer is LDP, the Label Distribution Protocol.
So let’s see if R1 has any LDP neighbors.
R1#sh mpls ldp neighbor
R1#
Well that might be a problem!
The interface on R1 that connects to R2 is gig0/0/1. Let’s do a show run on that interface and see if mpls ip is enabled on the interface. If it is, let’s do a show mpls interface to see if we can gather any other clues.
R1#sh run int gig0/0/1
Building configuration…
Current configuration : 138 bytes
!
interface GigabitEthernet0/0/1
ip address 10.1.2.1 255.255.255.0
ip ospf 100 area 0.0.0.0
negotiation auto
cdp enable
mpls ip
end
R1#
R1#sh mpls interfaces
Interface              IP            Tunnel   BGP Static Operational
GigabitEthernet0/0/1   Yes (ldp)     No       No  No     Yes
R1#
Facts
- R1 is properly configured on gig0/0/1
- R1 is attempting to bring up an MPLS LDP neighbor with R2
Additional Facts
I am going to go ahead and give these to you since I would like us to stay in R1.
- R2 is also properly configured on its end on gig0/0/1
- R2 is also attempting to bring up an MPLS LDP neighbor with R1
Discovering Your LDP Interface and Neighbor Status
Let me share with you my absolute FAVORITE MPLS LDP neighbor troubleshooting command — show mpls ldp discovery
R1#sh mpls ldp discovery
Local LDP Identifier:
10.100.100.1:0
Discovery Sources:
Interfaces:
GigabitEthernet0/0/1 (ldp): xmit/recv
<snip>
R1#
Let’s take the output of this command in two pieces. Let’s deal with just what I’m showing above.
Facts:
- R1’s LDP identifier is 10.100.100.1
- R1 is Transmitting (xmit) and Receiving (recv) LDP Hello Messages on gig0/0/1
So if R1 is transmitting and receiving LDP hello messages… then why do we not have an LDP neighbor between R1 and R2?
I find that without specific knowledge we tend to assume things. The assumption I see people make here is that LDP neighbors up per interface the way an IGP does: hellos are exchanged on the interface and the neighbor relationship simply forms, specific to that interface.
LDP Neighbors “TCP Up” Between LDP Identifiers
This actually isn’t what happens. During the discovery phase R1 learns the LDP identifiers of all the devices sending LDP Hellos on that interface (gig0/0/1). In our environment that is just R2, with LDP identifier 10.100.100.2. After the discovery phase it becomes “Yo… let’s neighbor up a TCP session between our two LDP Identifiers”. But when R1 tries to “TCP up” with 10.100.100.2 … it can’t find 10.100.100.2 in the routing table. Hence the “no route” on the LDP Id line:
R1#sh mpls ldp discovery
Local LDP Identifier:
10.100.100.1:0
Discovery Sources:
Interfaces:
GigabitEthernet0/0/1 (ldp): xmit/recv
LDP Id: 10.100.100.2:0 no route
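The “no route” decision boils down to a routing lookup on the peer’s LDP identifier. Here is a small Python sketch of that lookup, assuming the default behavior where the TCP session targets the address in the LDP ID. The `has_route` helper is hypothetical, and the route list is my reconstruction of R1’s global table from the outputs in this post (no default route assumed), not a dump from the router.

```python
import ipaddress


def has_route(routing_table, address):
    """Return True if any prefix in routing_table covers address."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(prefix) for prefix in routing_table)


# Assumed view of R1's global routing table, pieced together from earlier outputs.
r1_routes = ["10.1.2.0/24", "10.2.3.0/24", "10.100.100.1/32", "10.100.100.3/32"]

peer_ldp_id = "10.100.100.2"  # R2's LDP identifier learned from its Hellos

print(has_route(r1_routes, peer_ldp_id))                        # False
print(has_route(r1_routes + ["10.100.100.2/32"], peer_ldp_id))  # True
```

Hellos arrive on the directly connected link, so discovery works fine. The TCP session is a different story: it needs a route to 10.100.100.2, and R1 simply doesn’t have one.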
Why? Well, it wouldn’t be fun if I hadn’t caused a problem! 🙂 R2 wasn’t configured to advertise its loopback0, 10.100.100.2. Why was R2’s loopback used? Because that is the default behavior if you don’t hard code an LDP Identifier.
*Random Side Note: Please don’t do this. Please configure with intent and don’t go with just the defaults — (Preventing the Crime & The 7Ps)
So we go into R2 and fix this and…. drumroll please… voila! LDP neighbor up and running between R1 and R2.
R1#sh mpls ldp discovery
Local LDP Identifier:
10.100.100.1:0
Discovery Sources:
Interfaces:
GigabitEthernet0/0/1 (ldp): xmit/recv
LDP Id: 10.100.100.2:0
R1#
R1#sh mpls ldp neighbor
Peer LDP Ident: 10.100.100.2:0; Local LDP Ident 10.100.100.1:0
TCP connection: 10.100.100.2.17572 - 10.100.100.1.646
State: Oper; Msgs sent/rcvd: 11436/11430; Downstream
Up time: 6d22h
LDP discovery sources:
GigabitEthernet0/0/1, Src IP addr: 10.1.2.2
Addresses bound to peer LDP Ident:
10.1.2.2 10.2.3.2 10.100.100.2
R1#
Let’s see if we have two labels now…..
R1#sh ip cef vrf Customer1 14.14.14.3 255.255.255.255 detail
14.14.14.3/32, <snip>
recursive via 10.100.100.3 label 19 <-- VPN (inner) label
nexthop 10.1.2.2 Gig0/0/1 label 17 <-- TH (outer) label
R1#
Woot! 2 labels!
Can we ping?
R1#ping vrf Customer1 14.14.14.3 source 14.14.14.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 14.14.14.3, timeout is 2 seconds:
Packet sent with a source address of 14.14.14.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
R1#
Woot woot! The Case of the Failed L3VPN Ping is finally solved! Great job!
I have been searching for a fix to my issue for ages. I’m new to MPLS and I didn’t know many of the troubleshooting commands that you have used here, nor was I aware that I should advertise the loopback interface of my core router. Now that it worked I think that I’m going to cry! I can’t believe it finally worked!!
Thank you thank you thank you.
VERY VERY glad to help! 🙂