I’ll also add that this is currently a draft with the IETF GROW WG (and one which I fully support): https://tools.ietf.org/html/draft-ietf-grow-bgp-session-culling-01 -- Andrew Hoyos hoyosa@gmail.com
On Apr 26, 2017, at 9:34 AM, Andrew Hoyos <hoyosa@gmail.com> wrote:
On Apr 26, 2017, at 9:19 AM, Mike Horwath <drechsau@Geeks.ORG> wrote:
On Wed, Apr 26, 2017 at 08:07:45AM -0500, Andrew Hoyos wrote:
I would suggest that perhaps we look into filtering BGP (tcp/179) with an ACL prior to maintenance start on those specific ports being moved. Many other IXs are doing this for maintenance as a way to gracefully take things down, and let bilateral and RS sessions time out without killing active traffic (see https://ripe67.ripe.net/presentations/374-WH-IXPMaintReduce.pdf). As we've noticed, not all members being moved are bothering to shut down sessions beforehand, which causes impact to/from those members.
Don't even need ACLs.
Just take down the route servers for the 2 hour period.
Bilateral are unaffected and they can arrange things anyway with their peers.
I’d disagree. The maintenance currently taking place affects more than just the route servers. Plenty of people are doing bilateral peering on MICE, and that *IS* affected by maintenance events like these.
Adding an ACL to the port ensures graceful shutdown/end of traffic, rather than an abrupt drop and hold timer fun. I’d much rather that someone running the maintenance and in control of the ultimate link up/down events be the one deciding when things are starting/ending and re-enabling traffic gracefully.
Adding another step to the process creates more complications as well, and another point of failure if you screw up along the way.
Disagree; adding an ACL to a port is pretty trivial. Add the (pre-existing) ACL to the port 10 minutes before maintenance starts, and remove it when complete. Script it up into a copy/paste thing with port numbers for bonus points and fewer chances of failure.
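As a rough illustration of the copy/paste idea, here is a minimal Python sketch that turns a list of port names into apply/remove lines. Everything in it is an assumption for illustration: the ACL name BGP-CULL, the port names, and above all the command templates, which are placeholders and not real syntax for any switch platform. Swap in whatever the actual switch expects.

```python
# Hypothetical command templates: placeholders only, NOT real switch syntax.
APPLY_TEMPLATE = "interface {port} : apply acl {acl} inbound"
REMOVE_TEMPLATE = "interface {port} : remove acl {acl} inbound"


def culling_commands(ports, acl_name="BGP-CULL", template=APPLY_TEMPLATE):
    """Return one command line per port, in input order, skipping blanks and duplicates."""
    seen = set()
    lines = []
    for port in ports:
        port = port.strip()
        if not port or port in seen:
            continue
        seen.add(port)
        lines.append(template.format(port=port, acl=acl_name))
    return lines


def maintenance_sheet(ports, acl_name="BGP-CULL"):
    """Build a copy/paste sheet: apply lines first, then the matching remove lines."""
    before = culling_commands(ports, acl_name, APPLY_TEMPLATE)
    after = culling_commands(ports, acl_name, REMOVE_TEMPLATE)
    return "\n".join(
        ["# 10 minutes before maintenance starts:"]
        + before
        + ["# when maintenance is complete:"]
        + after
    )


if __name__ == "__main__":
    # Port names are made up for the example.
    print(maintenance_sheet(["eth1/1", "eth1/7"]))
```

Generating both halves from the same port list means the remove step always matches the apply step, so no port is left filtered after the window closes.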
Clean shutdown of bird is easier, quicker, and will for sure make the multilateral peering not be further affected by bouncing repeatedly.
Yes, great for MLPA, but not for bilateral.
Lastly, in this *specific* case, this presents issues for other members' ports that are *NOT* affected by the maintenance, and a loss of traffic for them if they are doing MLPA. Why break everyone and cause a total route server outage when it’s not necessary at all? Yesterday’s maintenance only affected a portion of members. ACLs on member ports would be the cleanest way to minimize outage duration for all members with the least impact to the IX as a whole.