micemn-01.micemn.net route reflector at 206.108.255.1
We've noticed this being unresponsive several times this evening.
I'm into the machine, but it is behaving badly, and from past experience (not direct evidence, since it is remote and I can't get the relevant logs), it feels like a hard drive failing on this system.
I.e., it may run some of its processes out of memory for some functions, but other functions are probably locking up and not behaving correctly, even RAIDed as it is.
Probably the best course of action is to bring out a fresh drive, swap whatever drive is failing, and let the mirror rebuild. I think we've got some of the 80GB WD800AAJS drives that are part of the mirror on the other route reflector around somewhere...
-- Doug McIntyre <merlyn@iphouse.net> ~.~ ipHouse ~.~ Network Engineer/Provisioning/Jack of all Trades
########################################################################
To unsubscribe from the MICE-DISCUSS list, click the following link: http://lists.iphouse.net/cgi-bin/wa?SUBED1=MICE-DISCUSS&A=1
I've also seen issues with failing drives not flagging because of a buggy firmware version on the RAID controller or drive, at least a few times on Dell servers around that vintage.
On Thu, Mar 13, 2014 at 12:19:08AM -0500, Doug McIntyre wrote:
We've noticed this being unresponsive several times this evening.
I'm into the machine, but it is behaving badly, and from past experience (not direct evidence, since it is remote and I can't get the relevant logs), it feels like a hard drive failing on this system.
This is an actual hardware motherboard failure. Probably only the 2nd I've seen in my life of using Dell hardware.
Lots of errors around the PCI bus, even with no cards installed. Not drives, memory, DRAC card. Nothing left to even try.
Nevin@Arcus has already volunteered another PE1950 chassis if this was the case. We just have to coordinate getting it and swapping it in.
The 2nd route-reflector should just keep up with our needs until this hardware can be fixed up.
-- Doug McIntyre <merlyn@iphouse.net> ~.~ ipHouse ~.~ Network Engineer/Provisioning/Jack of all Trades
On Mar 13, 2014, at 4:03 PM, Doug McIntyre <merlyn@iphouse.net> wrote:
Nevin@Arcus has already volunteered another PE1950 chassis if this was the case. We just have to coordinate getting it and swapping it in.
That Nevin; he rocks! I nominate him as MICEMN mascot for 2014.
-- Mike Horwath, reachable via drechsau@Geeks.ORG
We probably have a couple 1950s lying around too if needed. Drop a line if you need me to take a look.
Have about 2 dozen of them that have been retired, and they are in IPHouse's data center, so they are easy to get to, Doug.
-Nevin
On Thursday, March 13, 2014 7:44pm, "Brady Kittel" <bkittel@GMAIL.COM> said:
We probably have a couple 1950s lying around too if needed. Drop a line if you need me to take a look.
On Thu, Mar 13, 2014 at 04:03:03PM -0500, Doug McIntyre wrote:
On Thu, Mar 13, 2014 at 12:19:08AM -0500, Doug McIntyre wrote:
We've noticed this being unresponsive several times this evening.
This is an actual hardware motherboard failure.
I brought out the new machine this early evening, and it came up without a hitch.
I see most people's BGP sessions up, we have 923 prefixes announced into -01 now, and things are looking good.
This should complete out this repair now.
-- Doug McIntyre <merlyn@iphouse.net> ~.~ ipHouse ~.~ Network Engineer/Provisioning/Jack of all Trades
Thanks, Doug!
Doug,
Thank you! You are a great asset & supporter of the exchange. You are appreciated!
Mike
On Fri, Mar 14, 2014 at 06:29:44PM -0500, Doug McIntyre wrote:
On Thu, Mar 13, 2014 at 04:03:03PM -0500, Doug McIntyre wrote:
On Thu, Mar 13, 2014 at 12:19:08AM -0500, Doug McIntyre wrote:
We've noticed this being unresponsive several times this evening.
This is an actual hardware motherboard failure.
I brought out the new machine this early evening, and it came up without a hitch.
Just an update on this, as I'm sure people have noticed that this machine died horribly over the weekend, without any remedy by remote reboot.
Same sort of issues, with PCI bus errors, and potential disk failure log entries.
I went and visited the machine before lunch, brought another good disk controller, and now neither controller card will work in the slot, and PCI errors on either controller.
I can only assume that the original disk controller (which was moved over due to the existing setup requiring that type of controller) is poison and destroying PCIe slots and thus systems that it plugs into.
So, I think we'll try yet again, and burn in a system for a few days first before hauling it back out to MICE again.
-- Doug McIntyre <merlyn@iphouse.net> ~.~ ipHouse ~.~ Network Engineer/Provisioning/Jack of all Trades
Doug,
Thanks for the update, and the work on this. Glad I have a lot of spare 1950 era gear on the stack for re-use.
-Nevin
On Monday, March 17, 2014 3:10pm, "Doug McIntyre" <merlyn@IPHOUSE.NET> said:
Just an update on this, as I'm sure people have noticed that this machine died horribly over the weekend, without any remedy by remote reboot.
Same sort of issues, with PCI bus errors, and potential disk failure log entries.
I went and visited the machine before lunch, brought another good disk controller, and now neither controller card will work in the slot, and PCI errors on either controller.
I can only assume that the original disk controller (which was moved over due to the existing setup requiring that type of controller) is poison and destroying PCIe slots and thus systems that it plugs into.
So, I think we'll try yet again, and burn in a system for a few days first before hauling it back out to MICE again.
After a longer than planned burn in, and after one fried motherboard, 3 fried SAS controllers, and one replaced PERC Battery pack (and two poison hard drives that must shunt lethal voltage back into the data bus), micemn-01 is back online in a brand new software load with all new hardware.
This is running a newer OS and new BIRD, so it is also a test to make sure newer code runs correctly as well.
Our session (ipHouse) is up and stable, and I see many others are Established back to their BGP sessions as well.
-- Doug McIntyre <merlyn@iphouse.net> ~.~ ipHouse ~.~ Network Engineer/Provisioning/Jack of all Trades
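The check described above (seeing which sessions are back to Established after the rebuild) can be sketched in a few lines. This is only an illustration, not the tooling used on micemn-01: it assumes text shaped like the table printed by BIRD's `birdc show protocols`, and the protocol names and dates in SAMPLE are invented.

```python
from typing import Dict

# Invented sample, shaped like a `birdc show protocols` table (assumption).
SAMPLE = """\
name     proto    table    state  since       info
device1  Device   master   up     2014-03-21
peer_a   BGP      master   up     2014-03-21  Established
peer_b   BGP      master   start  2014-03-21  Active        Socket: Connection refused
peer_c   BGP      master   up     2014-03-21  Established
"""


def bgp_session_states(text: str) -> Dict[str, bool]:
    """Map each BGP protocol name to True if its row reports Established."""
    states: Dict[str, bool] = {}
    for line in text.splitlines():
        fields = line.split()
        # Need at least: name, proto, table, state, since. Skip non-BGP rows
        # (this also skips the header, whose second field is "proto").
        if len(fields) < 5 or fields[1] != "BGP":
            continue
        # The info column follows "since"; look for the keyword after it.
        states[fields[0]] = "Established" in fields[5:]
    return states


states = bgp_session_states(SAMPLE)
down = sorted(name for name, ok in states.items() if not ok)
print(f"{sum(states.values())}/{len(states)} established; not established: {down}")
```

Any session listed as not established is a candidate for a closer look at that peer's config stanza.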
Thank you Dr. McIntyre!
Mike Hemphill
General Manager | Cologix, Inc.
511 11th Ave S, Suite 450 | Minneapolis, MN 55415
P: 1+612.333.1922 | M: 1+612.812.5242
mike.hemphill@cologix.com
-----Original Message-----
From: MICE Discuss [mailto:MICE-DISCUSS@LISTS.IPHOUSE.NET] On Behalf Of Doug McIntyre
Sent: Friday, March 21, 2014 3:35 PM
To: MICE-DISCUSS@LISTS.IPHOUSE.NET
Subject: Re: [MICE-DISCUSS] micemn-01.micemn.net repair 2.0 complete
After a longer than planned burn in, and after one fried motherboard, 3 fried SAS controllers, and one replaced PERC Battery pack (and two poison hard drives that must shunt lethal voltage back into the data bus), micemn-01 is back online in a brand new software load with all new hardware.
This is running a newer OS and new BIRD, so it is also a test to make sure newer code runs correctly as well.
Our session (ipHouse) is up and stable, and I see many others are Established back to their BGP sessions as well.
On 03/21/2014 03:35 PM, Doug McIntyre wrote:
Our session (ipHouse) is up and stable, and I see many others are Established back to their BGP sessions as well.
IPv6 came up for us @ Mar 21 15:21:48 (central). IPv4 will not establish on .1, we're AS4150.
I've disabled and re-enabled this peer, no love.
bgp_recv: peer 206.108.255.1 (External AS 53679): received unexpected EOF
-- Chris Wopat SupraNet Communications, Inc. chrisw@supranet.net (608) 836-0282 x302
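For anyone scanning their own logs for the same symptom, here is a small sketch that pulls the peer address and AS number out of a line shaped like the one quoted above. The pattern is modelled only on that single line; the names `BgpRecvEvent` and `parse_bgp_recv` are hypothetical, and other daemons or versions may format the message differently.

```python
import re
from typing import NamedTuple, Optional


class BgpRecvEvent(NamedTuple):
    peer_ip: str
    peer_type: str   # e.g. "External"
    peer_as: int
    message: str


# Modelled on the one quoted log line only (assumption about the format).
_BGP_RECV = re.compile(
    r"bgp_recv: peer (?P<ip>\S+) \((?P<type>\w+) AS (?P<asn>\d+)\): (?P<msg>.+)"
)


def parse_bgp_recv(line: str) -> Optional[BgpRecvEvent]:
    """Return the parsed event, or None if the line does not match."""
    m = _BGP_RECV.search(line)
    if m is None:
        return None
    return BgpRecvEvent(m["ip"], m["type"], int(m["asn"]), m["msg"].strip())


event = parse_bgp_recv(
    "bgp_recv: peer 206.108.255.1 (External AS 53679): received unexpected EOF"
)
print(event)
```

Grouping such events by `peer_ip` makes it easy to see whether only one neighbor is dropping the session, as was the case here.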
On Mon, Mar 24, 2014 at 07:44:54AM -0500, Chris Wopat wrote:
IPv6 came up for us @ Mar 21 15:21:48 (central). IPv4 will not establish on .1, we're AS4150.
I've disabled and re-enabled this peer, no love.
bgp_recv: peer 206.108.255.1 (External AS 53679): received unexpected EOF
I'm sorry, a typo for you was inserted during the data reconstruction for the configs. This should be rectified now, and I see you are Established with the connection for 4150.
-- Doug McIntyre <merlyn@iphouse.net> ~.~ ipHouse ~.~ Network Engineer/Provisioning/Jack of all Trades
On 03/24/2014 02:58 PM, Doug McIntyre wrote:
I'm sorry, a typo for you was inserted during the data reconstruction for the configs. This should be rectified now, and I see you are Established with the connection for 4150.
Confirming it looks good from my end now.
Cheers,
-- Chris Wopat SupraNet Communications, Inc. chrisw@supranet.net (608) 836-0282 x302
participants (6)
- Brady Kittel
- Chris Wopat
- Doug McIntyre
- Michael Horwath
- Mike Hemphill
- Nevin Lyne