MICE physical server 2 (and, by extension, route server 2) is again
having errors on its peering interface. We will be performing
maintenance tomorrow to address this. Since this only affects one of the
two route servers, participants should not be impacted.
Details:
Sometime tomorrow (exact time depending on day job schedules), Jeremy
and I will:
1. Apply the usual security updates to VMs and the host, since the host
is going to be rebooted anyway.
2. Convert server 2's port to a LAG. This will make future changes
easier, especially if this error problem continues.
This will involve a reboot of the server. The reboot is "necessary"
because netplan on Linux does not remove existing VLAN interfaces when
the configuration changes (it only adds new ones). Without removing the old
direct-on-the-physical-interface VLAN config, we could (would?) create a
loop. You may recall that I created a loop that blew up the exchange in
more-or-less that exact way when setting up the servers originally.
While I could remove the VLAN configuration manually, in this case, it's
easier and safer to simply reboot. That also ensures our running
configuration is fully consistent with the configuration files.
3. Replace the optic and move it to the other port on the NIC.
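To illustrate the concern behind the reboot in step 2, here is a minimal
Python sketch, not the tooling actually used on the MICE servers. It
compares the VLAN interfaces present on a running host against the ones
the new configuration declares and lists the leftovers. The interface
names (eno1, bond0), the VLAN ID, and the Vlan record are all
hypothetical, and whether a leftover really causes a loop depends on how
the interfaces are bridged.

```python
from typing import NamedTuple


class Vlan(NamedTuple):
    """One VLAN interface (hypothetical record, not a netplan structure)."""
    name: str      # interface name, e.g. "eno1.10"
    parent: str    # interface the VLAN is stacked on
    vlan_id: int   # 802.1Q VLAN ID


def stale_vlans(running: list[Vlan], desired: list[Vlan]) -> list[Vlan]:
    """Return VLANs that exist on the host but are absent from the new config."""
    wanted = set(desired)
    return [v for v in running if v not in wanted]


def overlapping_leftovers(running: list[Vlan], desired: list[Vlan]) -> list[Vlan]:
    """Return stale VLANs whose ID the new config also places on another parent.

    These are the leftovers worth worrying about: the same VLAN would be
    reachable through both the old physical port and the new LAG.
    """
    parents_by_id: dict[int, set[str]] = {}
    for v in desired:
        parents_by_id.setdefault(v.vlan_id, set()).add(v.parent)
    return [
        v for v in stale_vlans(running, desired)
        if parents_by_id.get(v.vlan_id, set()) - {v.parent}
    ]


if __name__ == "__main__":
    # Assumed state after applying a LAG config without rebooting:
    # the old VLAN on the physical port is still there next to the new one.
    running = [Vlan("eno1.10", "eno1", 10), Vlan("bond0.10", "bond0", 10)]
    desired = [Vlan("bond0.10", "bond0", 10)]
    for leftover in overlapping_leftovers(running, desired):
        print(f"stale VLAN {leftover.vlan_id} still on {leftover.parent}")
```

After a reboot the running list should match the desired list, so both
functions would return an empty list.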
The underlying failure manifests as a decrease in optical transmit power
(i.e., power received at the switch) over time. We have replaced the
optic previously, which fixed it, but it then failed again after a while.
We are open to suggestions about why this might be occurring.
One theory is that this is due to heat. We are going to try the second
port in case that makes a difference. We are also considering switching
to a direct attach cable, since that might produce less heat.
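For anyone who wants to watch for the same pattern, a gradual power
decline can be quantified with a simple least-squares fit. The sketch
below is only an illustration in Python: the sample readings and the
-7.0 dBm floor are made-up values, not MICE measurements or a
specification for any particular optic, and it assumes the decline is
roughly linear in dB.

```python
def fit_line(samples):
    """Least-squares fit of (day, dBm) samples; returns (slope, intercept)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    num = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    slope = num / den                    # dB per day
    intercept = mean_p - slope * mean_t  # fitted dBm at day 0
    return slope, intercept


def days_until_floor(samples, floor_dbm):
    """Estimate days after the latest sample until the fit reaches floor_dbm.

    Returns None when the fitted power is flat or rising.
    """
    slope, intercept = fit_line(samples)
    if slope >= 0:
        return None
    crossing_day = (floor_dbm - intercept) / slope
    last_day = max(t for t, _ in samples)
    return max(0.0, crossing_day - last_day)


if __name__ == "__main__":
    # Hypothetical receive-power readings: (days since install, dBm).
    readings = [(0, -2.0), (10, -2.5), (20, -3.0), (30, -3.5)]
    slope, _ = fit_line(readings)
    print(f"trend: {slope:.3f} dB/day")
    print(f"days until floor: {days_until_floor(readings, floor_dbm=-7.0)}")
```

Comparing the fitted slope across the old port, the new port, and a
direct attach cable would be one way to tell whether a change helped.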
--
Richard Laager
Chief Manager, Director
Midwest Internet Cooperative Exchange LLC