icmp v6 nd storm ~ 00:58:01 2022/03/18 GMT?

I presume I wasn't the only one that felt the arp/nd storm that began ~ 00:58:01 2022/03/18 GMT? Event stopped for us by 01:03:02. I don't have info about mac addrs but our peering device reported 20kpps of icmp neighbor discovery. -Michael [AS3128]

You're correct, we did.
On 3/17/22 8:22 PM (GMT-06:00), Michael Hare wrote:
I presume I wasn't the only one that felt the arp/nd storm that began ~ 00:58:01 2022/03/18 GMT? Event stopped for us by 01:03:02. I don't have info about mac addrs but our peering device reported 20kpps of icmp neighbor discovery.
-Michael [AS3128]

That was me. I believe I created a loop with the new servers. As soon as I realized, I yanked the cables. Sorry everyone! -- Richard

We are hands off as of a few minutes ago. I will provide a more detailed explanation of my mistakes later. -- Richard
On Mar 17, 2022, at 20:34, Richard Laager <rlaager@wiktel.com> wrote:
That was me. I believe I created a loop with the new servers.
As soon as I realized, I yanked the cables.
Sorry everyone!
-- Richard

Something still doesn't seem right. Really slow traffic via MICE. I just shut down our MICE sessions and the "weirdness" cleared up.
Ryan
Compudyne/Integris/AS47096
--
Ryan Nelson, Network Services Senior Architect
Office: 218-336-2220, ryan.nelson@integrisit.com
On Thursday, March 17, 2022 9:09 PM, Richard Laager wrote:
We are hands off as of a few minutes ago. I will provide a more detailed explanation of my mistakes later.
-- Richard

FWIW, traffic numbers overall look good. I called Jeremy to get his thoughts. He hasn’t seen anything concerning outside of the initial issue.
Out of an abundance of caution, he is shutting down the ports facing the new servers, cutting them (the thing we changed today) off completely. They are not in production use, so there is no reason not to take this step.
You might want to do some ping testing to various peers' interface IPs and/or check any Juniper-specific storm control stuff.
-- Richard
On Mar 17, 2022, at 21:37, Ryan Nelson <ryan.nelson@integrisit.com> wrote:
Something still doesn't seem right. Really slow traffic via MICE. I just shut down our MICE sessions and the "weirdness" cleared up.
To unsubscribe from the MICE-DISCUSS list, click the following link: http://lists.iphouse.net/cgi-bin/wa?SUBED1=MICE-DISCUSS&A=1

On 3/17/22 21:09, Richard Laager wrote:
I will provide a more detailed explanation of my mistakes later.
The existing servers are _only_ route servers. The new servers will be virtualization hosts running multiple VMs. The route servers will be running on there, but so will other things: IXP Manager, something to answer ARP for the blackholing we want to set up, Cacti will be brought in house, etc. Therefore, the host has bridging configured.
We are tentatively planning to set up a quarantine VLAN as some other exchanges have done. This would also have quarantine route servers on it, which would behave normally except they would not advertise any routes out. This quarantine VLAN obviously needs to exist on the exchange switches because participant ports would be put into it. Because there will be multiple VLANs, the plan was to use VLAN tagging between the Arista and the new physical servers.
The MICE VLAN is set up as VLAN 1. Before you say it... I don't think anyone likes that. Jeremy and I are of the opinion that this should change at some point in the future, but it doesn't seem worth doing until we are taking down the fabric anyway (e.g. a reboot of the Arista).
In light of all that, our plan was (and in hindsight, I believe this to be mistake #1) to configure the main MICE VLAN as untagged, with the idea of adding the quarantine VLAN as tagged in the future.
Cameron and I felt it was important to confirm the network was working to the MICE fabric before leaving Minneapolis, since we both live so far away. In hindsight, this was a good idea; had I made the same configuration mistake while working remotely, it would have been slower to fix (at a minimum).
The new systems are running Ubuntu 22.04 LTS which is increasingly frozen by the day [1] and will have a final release on April 21. So by the time we get everything configured, it will likely be released. This avoids us installing 20.04 LTS now and then wanting an upgrade immediately. Ubuntu (whether 22.04 LTS, or even 20.04 LTS) uses netplan to configure network interfaces. netplan uses an exclusively [2] declarative configuration model (unlike ifupdown, which is mostly declarative but some things can only be done with imperative commands). I'm actually a fan of netplan; it has made network configuration quite a bit nicer than ifupdown for me.
I was not sure how to configure an untagged VLAN in netplan. I reviewed the documentation and was still unsure. I saw that the vlan "id" parameter was documented [3] as accepting a value of 0-4094. I figured I'd try 0 to see if that meant untagged. That did not work. I did a packet capture and found that 0 meant an explicit tag of 0. Having no more ideas, I gave up and removed the VLAN configuration, going back to just a straight interface. I figured I'd look at it more later.
One of the big advantages of netplan is that it is (supposed to be) idempotent. That is, you configure it the way you want, run `netplan apply` and it makes it so. You don't (usually) need to explicitly manage the transition from the old state to the new state. Unfortunately, that is not true for bridging and/or VLANs, at least in some cases. In hindsight, for bridging, that largely makes sense. After all, you wouldn't want a `netplan apply` to remove the VMs' host-side interfaces from the bridge group(s). But for VLANs, it seems like this is probably just something nobody implemented, as opposed to being desirable in theory.
This was mistake #2 and the immediate cause of the loop. I ended up with a bridge interface that consisted of the physical interface (facing the MICE switch) and a VLAN interface (with an ID of 0) on that same physical interface. So as traffic came in, it was bridged back out the same physical interface, this time with an explicit VLAN 0 tag. Based on the fact that the explicit 0 tag didn't pass traffic normally in the first place, I don't think I was necessarily looping traffic at layer 3. But broadcast traffic would have been looped back to the switch, with the source MAC staying the same. This would obviously mess up the switch's MAC table. I believe that was the immediate cause of the issue.
I noticed that things were not behaving quite right (some packet loss). I started a packet capture. When that was very quickly 5 GB rather than some tiny amount, I knew I had made a serious error that likely created a loop. I reflexively confirmed this with a "brctl show bridge", issued a reboot (because why not), and immediately jumped up and unplugged the cables from the back of the servers. When the systems came back up, I confirmed the network configuration was now correct and reconnected the cables.
Because the server was expected to have multiple VMs on it, we could not use `port-security maximum 1`. As a result, port-security was not configured. Jeremy and I never had an explicit conversation about this, which was certainly another mistake. Had port-security been enabled, this loop would have been arrested much faster.
Additionally, while I mentioned the upcoming work at the UG meeting, not announcing it on MICE-DISCUSS was a mistake. I know we talked about this before. I just forgot to do it.
On 3/17/22 21:53, Richard Laager wrote:
Out of an abundance of caution, he is shutting down the ports facing the new servers, cutting them (the thing we changed today) off completely.
The ports have been re-enabled. The ports are now configured with the MICE VLAN _tagged_, which plays well with Linux & netplan.
The ports have `port-security maximum 6` configured. This gives us enough headroom to support two route servers (in the event of a hardware failure where both have to run on the same physical box temporarily), two quarantine route servers (same note), the ARP responder, and the host's MAC address (if it shows up somehow).
[1] https://discourse.ubuntu.com/t/jammy-jellyfish-release-schedule/23906
[2] Backends, like networkd, can and do implement imperative hooks. But netplan itself is only declarative. And Ubuntu is adding, in multiple cases at my direct suggestion, additional declarative parameters to eliminate the need for hook scripts in many scenarios.
[3] https://netplan.io/reference/#properties-for-device-type-vlans%3A
-- Richard
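The failure mode described in this message (broadcasts bridged back to the switch with the source MAC unchanged, and port-security as the missing guard) can be sketched as a toy model in Python. This is an illustration under stated assumptions, not a description of how the Arista actually behaves: `ToySwitch`, `simulate`, the peer names and the MAC addresses are hypothetical, only broadcast frames are modeled, and port-security is assumed to shut the port once more than the configured number of source MACs has been seen on it. The MAC budget is the one listed above for `port-security maximum 6`.

```python
from dataclasses import dataclass, field

# MAC budget per server-facing port, as listed in the message above.
MAC_BUDGET = {
    "route servers": 2,
    "quarantine route servers": 2,
    "ARP responder": 1,
    "host": 1,
}
PORT_SECURITY_MAX = sum(MAC_BUDGET.values())


@dataclass
class ToySwitch:
    """Toy learning switch that only handles broadcast frames (hypothetical model)."""
    ports: list
    port_limits: dict = field(default_factory=dict)  # port -> max source MACs, None = no limit
    mac_table: dict = field(default_factory=dict)    # MAC -> port it was last learned on
    seen: dict = field(default_factory=dict)         # port -> set of source MACs seen there
    disabled: set = field(default_factory=set)       # ports shut by the toy port-security
    mac_moves: int = 0                               # times a MAC jumped between ports

    def receive(self, in_port, src_mac):
        """Learn src_mac on in_port and return the ports the broadcast is flooded to."""
        if in_port in self.disabled:
            return []
        macs = self.seen.setdefault(in_port, set())
        macs.add(src_mac)
        limit = self.port_limits.get(in_port)
        if limit is not None and len(macs) > limit:
            # Assumed violation action: shut the port and drop the frame.
            self.disabled.add(in_port)
            return []
        previous = self.mac_table.get(src_mac)
        if previous is not None and previous != in_port:
            self.mac_moves += 1
        self.mac_table[src_mac] = in_port
        return [p for p in self.ports if p != in_port and p not in self.disabled]


def simulate(peers, server_port_limit=None, rounds=3):
    """Each peer broadcasts once per round; the server port reflects every frame it gets."""
    switch = ToySwitch(ports=list(peers) + ["server"],
                       port_limits={"server": server_port_limit})
    for _ in range(rounds):
        for port, mac in peers.items():
            flooded_to = switch.receive(port, mac)
            if "server" in flooded_to:
                # The misconfigured bridge sends the frame back out the same
                # physical interface with the source MAC unchanged.
                switch.receive("server", mac)
    return switch


if __name__ == "__main__":
    peers = {f"peer{i}": f"02:00:00:00:00:{i:02x}" for i in range(1, 11)}
    for limit in (None, PORT_SECURITY_MAX):
        result = simulate(peers, server_port_limit=limit)
        print(f"limit={limit}: MAC moves={result.mac_moves}, "
              f"disabled ports={sorted(result.disabled)}")
```

In this model, every reflected broadcast makes the switch relearn a peer's MAC on the server port, and the peer's next frame moves it back. With a limit set, the server port is shut as soon as the reflected traffic exposes more source MACs than the budget allows. Real storm dynamics depend on details that are not in this thread.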

We also saw it. We flapped with 2001:504:27::51cc:0:1 (Akamai) at the same time. Not sure they're related.
Ryan Malek - Router12 Networks LLC
Internet, Phone, and Hosted Services
Ph. 641.420.7180
On 3/17/2022 8:29 PM, Jeff Wilde wrote:
You're correct, we did.

Yes, we saw it and it reset a bunch of our BGP sessions on MICE.
Our Arbor Sightline Netflow says the sources were
2001:504:27::d1af:0:241/128
fe80::8618:88ff:fea4:d301/128
e80::a66c:2aff:fe76:b400/128
Destined to:
All nodes ff02::1
All MLD Routers ff02::16
And then a solicited-node address of
ff02::1:ff00:254
Don't know the source of that
On Thu, Mar 17, 2022 at 8:22 PM Michael Hare <000000097dab80c5-dmarc-request@lists.iphouse.net> wrote:
I presume I wasn't the only one that felt the arp/nd storm that began ~ 00:58:01 2022/03/18 GMT? Event stopped for us by 01:03:02. I don't have info about mac addrs but our peering device reported 20kpps of icmp neighbor discovery.
-Michael [AS3128]
--
===============================================
David Farmer                 Email: farmer@umn.edu
Networking & Telecommunication Services
Office of Information Technology
University of Minnesota
2218 University Ave SE       Phone: 612-626-0815
Minneapolis, MN 55414-3029   Cell: 612-812-9952
===============================================
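A note on the `ff02::1:ff00:254` group mentioned above: per RFC 4291, a solicited-node multicast address is the prefix `ff02::1:ff00:0/104` plus the low 24 bits of the address being resolved, so it identifies the target of the neighbor solicitations rather than their source. The following is a minimal Python sketch using the standard `ipaddress` module. The helper name `solicited_node` and the `2001:db8::254` example (documentation prefix) are illustrative assumptions, not addresses taken from the exchange.

```python
import ipaddress

# ff02::1:ff00:0/104 is the solicited-node multicast prefix (RFC 4291).
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF  # keep the low 24 bits
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


if __name__ == "__main__":
    # Two of the sources reported above, plus a hypothetical address ending in :254.
    for addr in ("2001:504:27::d1af:0:241",
                 "fe80::8618:88ff:fea4:d301",
                 "2001:db8::254"):
        print(f"{addr} -> {solicited_node(addr)}")
```

Any address whose last 24 bits are `00:0254` maps to `ff02::1:ff00:254`, so the group alone narrows down the address being solicited but does not say who was sending the solicitations.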

Seems to jibe with some traffic shift on the Compudyne 511 remote switch:
http://micelg.usinternet.com/cacti/graph_view.php?action=tree&tree_id=1&leaf...
--
Chris Wopat
Network Engineer, WiscNet
wopat@wiscnet.net
608-210-3965

Dave, thanks for the cluebat about looking at netflow. I didn’t think that one through myself. I included the top talker we saw below.
Richard, thanks for responding and letting us know what happened. Having been in similar situations myself, “it happens”.
-Michael
===/==========
** nfdump -M /var/local/flows/live/core -T -r nfcapd.202203172000 -n 10 -s record/packets -A srcip,dstip -6
nfdump filter:
router ip 143.235.32.110 and proto icmp6
Aggregated flows 116
Top 10 flows ordered by packets:
Date first seen Duration Src IP Addr Dst IP Addr Packets Bytes bps Bpp Flows
2022-03-17 19:59:59.104 19.456 fe80::1a2a:d300:64dd:ed24 ff02::1:ff00:254 9.4 M 714.9 M 294.0 M 76 2
On Thursday, March 17, 2022 8:34 PM, David Farmer wrote:
Yes, we saw it and it reset a bunch of our BGP sessions on MICE.
Our Arbor Sightline Netflow says the sources were
2001:504:27::d1af:0:241/128
fe80::8618:88ff:fea4:d301/128
e80::a66c:2aff:fe76:b400/128
Destined to:
All nodes ff02::1
All MLD Routers ff02::16
And then a solicited-node address of
ff02::1:ff00:254
Don't know the source of that
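As a sanity check on the nfdump top-talker record in Michael's message, the derived columns can be recomputed from the duration, packet and byte counts. This is a small Python sketch, assuming nfdump's "M" suffix means 10**6 and that the printed values are rounded. The helper names `parse_scaled` and `derive_rates` are hypothetical.

```python
def parse_scaled(text: str) -> float:
    """Parse an nfdump-style scaled number such as '9.4 M' (assumes K/M/G are powers of 1000)."""
    scale = {"K": 1e3, "M": 1e6, "G": 1e9}
    parts = text.split()
    value = float(parts[0])
    return value * scale[parts[1]] if len(parts) == 2 else value


def derive_rates(duration_s: float, packets: float, byte_count: float) -> dict:
    """Recompute packets/s, bits/s and bytes per packet for one flow record."""
    return {
        "pps": packets / duration_s,
        "bps": byte_count * 8 / duration_s,
        "bpp": byte_count / packets,
    }


if __name__ == "__main__":
    # Values copied from the record: 19.456 s, 9.4 M packets, 714.9 M bytes.
    rates = derive_rates(19.456, parse_scaled("9.4 M"), parse_scaled("714.9 M"))
    print(f"pps ~ {rates['pps']:,.0f}")
    print(f"bps ~ {rates['bps'] / 1e6:.1f} M")
    print(f"Bpp ~ {rates['bpp']:.0f}")
```

The recomputed bps and Bpp agree with the 294.0 M and 76 columns in the record, and the packet rate for this one record works out to roughly 480 thousand packets per second over its 19.456 seconds.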

Michael,
Actually, the credit for looking at Netflow goes to Colin.
Richard,
Thanks for getting that project moving forward, and as Michael said, it happens.
Everyone,
Attached are the detailed reports for the 3 alerts we got in that timeframe last night if you want more details.
On Fri, Mar 18, 2022 at 9:24 AM Michael Hare <000000097dab80c5-dmarc-request@lists.iphouse.net> wrote:
Dave, thanks for the cluebat about looking at netflow. I didn’t think that one through myself. I included the top talker we saw below.
Richard, thanks for responding and letting us know what happened. Having been in similar situations myself, “it happens”.
-Michael
===/==========
** nfdump -M /var/local/flows/live/core -T -r nfcapd.202203172000 -n 10 -s record/packets -A srcip,dstip -6
nfdump filter:
router ip 143.235.32.110 and proto icmp6
Aggregated flows 116
Top 10 flows ordered by packets:
Date first seen Duration Src IP Addr Dst IP Addr Packets Bytes bps Bpp Flows
2022-03-17 19:59:59.104 19.456 fe80::1a2a:d300:64dd:ed24 ff02::1:ff00:254 9.4 M 714.9 M 294.0 M 76 2
--
===============================================
David Farmer                 Email: farmer@umn.edu
Networking & Telecommunication Services
Office of Information Technology
University of Minnesota
2218 University Ave SE       Phone: 612-626-0815
Minneapolis, MN 55414-3029   Cell: 612-812-9952
===============================================
participants (7)
- Chris Wopat
- David Farmer
- Jeff Wilde
- Michael Hare
- Richard Laager
- Ryan Malek
- Ryan Nelson