Route Server 1 - BFD not working
Hi MICE,

We are AS18451. On our BGP session to Route Server 1 (206.108.225.1 & 2001:504:27::d1af:0:1) I have lost the BFD session. My Juniper router shows the state as "Down".

Is this intentional? An accident of the upgrades? Is my configuration wrong? Is anyone else noticing this?

My configuration is an interval of 1000 ms, multiplier 3. RS 2 is configured the same and BFD is up.

Thanks,
Jonathan

--
Jonathan Stewart
Network Engineer
LES.NET - AS18451
Desk: 1-204-666-6191
Mobile: 1-204-990-2120
130 Portage Avenue E
Winnipeg, MB R3C 0A1 CANADA
On 2023-06-21 14:06, Jonathan Stewart wrote:
On our BGP session to Route Server 1 (206.108.225.1 & 2001:504:27::d1af:0:1) i have lost the BFD session.
My Juniper router shows the state as "Down".
Is this intentional? An accident of the upgrades? Is my configuration wrong? Anyone else noticing this?
My configuration is an interval of 1000 ms, multiplier 3.
MICE is configured as 500 * 3.

tcpdump shows your BFD control packets coming in, but bird is not responding.

I'm seeing the same issue with 206.108.255.137 (Inteliquent AS 40160) on IPv4 and 2001:504:27:0:0:B7F8::1 (Compudyne, AS47096) on IPv6. Everyone else is consistent between the two route servers.

This thread from 2020 talks about someone having BFD issues after a reboot, which they fixed by restarting BFD in BIRD. That might be something to try if we run out of other ideas:
https://www.mail-archive.com/bird-users@network.cz/msg05382.html

Before I try that, have you tried flapping your BGP session?

--
Richard
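For reference, the two timer settings mentioned here (1000 ms x 3 on the member side, 500 x 3 on the route server) would not normally be expected to conflict. Under RFC 5880, section 6.8.4, each system in asynchronous mode derives its detection time from the remote Detect Mult and the larger of its own Required Min RX Interval and the remote Desired Min TX Interval. The sketch below works through that arithmetic. It assumes the route server's 500 is in milliseconds and that each side uses the same value for TX and RX; the dictionary names are illustrative only.

```python
def detection_time_ms(local_required_min_rx_ms, remote_desired_min_tx_ms, remote_detect_mult):
    """Asynchronous-mode detection time seen by the local system (RFC 5880, section 6.8.4)."""
    negotiated_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * negotiated_interval


# Assumption: min TX and min RX are equal on each side, values in milliseconds.
member = {"min_tx": 1000, "min_rx": 1000, "mult": 3}       # the 1000 ms x 3 side
route_server = {"min_tx": 500, "min_rx": 500, "mult": 3}   # the 500 x 3 side

# Detection time as computed on the route server for packets from the member.
rs_detect = detection_time_ms(route_server["min_rx"], member["min_tx"], member["mult"])

# Detection time as computed on the member for packets from the route server.
member_detect = detection_time_ms(member["min_rx"], route_server["min_tx"], route_server["mult"])

print(f"route server detection time: {rs_detect} ms")
print(f"member detection time: {member_detect} ms")
```

Under these assumptions both sides arrive at the same detection time, so the slower interval simply wins. A timer mismatch alone is therefore an unlikely explanation for a session that stays Down.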
Jonathan,

This March I reported to peering@ a similar problem to RR1; I’m not sure when my problem started, I only noticed it while doing CLI jockey work on my router. I have had success removing and adding the BFD session on my Juniper. Over the last three months I saw one recurrence to RR1 but it was over v4 and v6; same solution for me. I didn’t spend time debugging JunOS further [myself or with jtac]. My side is a Juniper MX running 20.4. I have since added BFD session monitoring so I'm not reliant on syslogs.

-Michael
On 2023-06-22 08:39, Michael Hare wrote:
This March I reported to peering@ a similar problem to RR1
If this was happening in March, the "good" news is that this is not due to the new route servers (different OS, BIRD upgrade, new config, etc.). I looked at the BIRD release notes, and I'm not seeing anything about BFD specifically. There are plenty of mentions of generic bug fixes. I suppose I should take this to the BIRD mailing list. -- Richard
On Wed, 2023-06-21 6:54 p.m., Richard Laager wrote:
Before I try that, have you tried flapping your BGP session?
I had a chance to try this today. I disabled the BGP (and thus also BFD) session, waited a minute, and re-enabled BGP & BFD. It came back as normal and looks fine now on both IPv4 and IPv6.

Thanks for the assistance, Richard. It's still not clear what the real cause of the problem was, though. Curious.

Cheers,
Jonathan

--
Jonathan Stewart
Network Engineer
LES.NET - AS18451
Desk: 1-204-666-6191
Mobile: 1-204-990-2120
130 Portage Avenue E
Winnipeg, MB R3C 0A1 CANADA
On Thu, Jun 22, 2023 at 02:40:14PM -0500, Jonathan Stewart wrote:
Disable the BGP (and thus also BFD) session, waited a minute, and re-enabled BGP & BFD.
Came back as normal and looks fine now on both IPv4 and IPv6.
Thanks for the assistance Richard.
Though it's still not clear what the real cause of the problem was.
Bugs from the router vs the route-server perhaps? I hate magic like this. Magic sucks -- Mike Horwath, reachable via drechsau@Geeks.ORG
Richard's debugging suggested it was BIRD not responding to valid BFD packets transmitted from our Juniper router. But almost all BFD sessions are up, so what's wrong with this specific one (or pair, in this case)?

There was a fibre cut which disconnected AS18451 from MICE for 1.5 days. Maybe there was packet loss for the first few seconds after service restoration, so the BFD handshake failed?

The problem for diagnosis is reproducibility. Since a session reset fixed the problem, we can't reproduce it, and that makes diagnosis impossible without additional data.

To be clear, I don't expect any further action from MICE, as the problem is resolved. I'm just discussing troubleshooting and the limitations of our knowledge about this problem.

Cheers,
Jonathan Stewart
AS18451
On 2023-06-23 10:46, Jonathan Stewart wrote:
Richard's debugging suggested it was BIRD not responding to valid BFD packets transmitted from our Juniper router.
Re: "valid". In looking at the packet capture again, I'm not sure how the discriminator values are supposed to look. So maybe that is related.
The problem for diagnosis is reproducibility. Since a session reset fixed the problem, we can't reproduce the problem, and that makes diagnosis impossible without additional data.
To be clear, I don't expect any further action from MICE, as the problem is resolved.
I sent a write-up to the bird-users mailing list. We'll see if anyone there has any ideas. -- Richard
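On the question of how the discriminator values are supposed to look: RFC 5880 defines a 24-byte mandatory section for BFD control packets, in which My Discriminator must be nonzero and Your Discriminator echoes the value learned from the peer, or is zero when no value is known yet. The following is a minimal sketch of decoding those fields from a captured UDP payload and applying a few of the reception checks from section 6.8.6. The sample packet is synthetic, not taken from the capture discussed in this thread, and the function names are illustrative.

```python
import struct
from typing import List, NamedTuple

STATES = {0: "AdminDown", 1: "Down", 2: "Init", 3: "Up"}


class BfdControl(NamedTuple):
    version: int
    diag: int
    state: str
    detect_mult: int
    length: int
    my_disc: int
    your_disc: int
    desired_min_tx_us: int
    required_min_rx_us: int


def parse_bfd_control(payload: bytes) -> BfdControl:
    """Decode the mandatory section of a BFD control packet (RFC 5880, section 4.1)."""
    if len(payload) < 24:
        raise ValueError("BFD control packet is shorter than 24 bytes")
    b0, b1, mult, length, my_disc, your_disc, tx, rx, _echo = struct.unpack(
        "!BBBBIIIII", payload[:24]
    )
    return BfdControl(
        version=b0 >> 5,       # top 3 bits of the first byte
        diag=b0 & 0x1F,        # low 5 bits of the first byte
        state=STATES[b1 >> 6], # top 2 bits of the second byte
        detect_mult=mult,
        length=length,
        my_disc=my_disc,
        your_disc=your_disc,
        desired_min_tx_us=tx,  # intervals are carried in microseconds
        required_min_rx_us=rx,
    )


def reception_problems(pkt: BfdControl) -> List[str]:
    """A subset of the RFC 5880 section 6.8.6 checks that cause a receiver to discard."""
    problems = []
    if pkt.version != 1:
        problems.append("version is not 1")
    if pkt.detect_mult == 0:
        problems.append("Detect Mult is zero")
    if pkt.my_disc == 0:
        problems.append("My Discriminator is zero")
    if pkt.your_disc == 0 and pkt.state not in ("Down", "AdminDown"):
        problems.append("Your Discriminator is zero but state is not Down/AdminDown")
    return problems


# Synthetic example: a sender in Down state that has not yet learned the peer's discriminator.
sample = struct.pack("!BBBBIIIII", 1 << 5, 1 << 6, 3, 24, 0x11, 0, 1_000_000, 1_000_000, 0)
pkt = parse_bfd_control(sample)
print(pkt, reception_problems(pkt))
```

One check this sketch cannot perform is the session lookup: per the same RFC section, a nonzero Your Discriminator that matches no session on the receiver also causes the packet to be discarded. Whether that applies here is not established by anything in this thread.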
participants (4)
- Jonathan Stewart
- Michael Hare
- Mike Horwath
- Richard Laager