Resolved
I am seeing intermittent connectivity "dropouts" at one of my customers' locations. Each drop lasts only a few seconds, so email, web and other services work without interruption, but PPTP VPN sessions drop and clients must reconnect manually. The customer has not used VPN intensively until now, so this fault may have existed since the server was set up just before this summer.

The best log I have found that shows this is the syswatch log. Here is a typical example from today:


Thu Oct 9 12:09:58 2014 info: system - heartbeat... ...
Thu Oct 9 12:12:00 2014 info: eth0 - ping check on server #1 failed - (ISP gateway IP) ...
Thu Oct 9 12:12:05 2014 info: eth0 - ping check on server #2 failed - 69.90.141.72 ...
Thu Oct 9 12:12:05 2014 warn: eth0 - connection warning ...
Thu Oct 9 12:12:15 2014 info: eth0 - ping check on server #1 passed - (ISP gateway IP) ...
Thu Oct 9 12:19:15 2014 info: system - heartbeat... ...


At best there are hours or days between drops, but sometimes only 5-20 minutes. Drops have also occurred when there was no specific user activity such as VPN.
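To put numbers on how long and how often the link drops, the syswatch log can be scanned for "ping check ... failed" lines followed by a "passed" line. The following is a minimal sketch, assuming the log lines look exactly like the excerpt above; the function names (`parse_line`, `find_outages`, `gaps_between`) are hypothetical, and the measured duration only reflects when syswatch noticed the failure and the recovery, not the true start of the outage.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# "Thu Oct 9 12:12:00 2014 info: eth0 - ping check on server #1 failed - ..."
LINE_RE = re.compile(
    r"^(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+(\w+):\s+(\S+)\s+-\s+(.*)$"
)


def parse_line(line):
    """Return (timestamp, level, source, message) or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    stamp, level, source, message = match.groups()
    # Collapse repeated spaces (e.g. "Oct  9") before parsing the timestamp.
    when = datetime.strptime(" ".join(stamp.split()), "%a %b %d %H:%M:%S %Y")
    return when, level, source, message


def find_outages(lines, iface="eth0"):
    """Pair the first failed ping check with the next passed one on the given interface."""
    outages = []
    started = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, _level, source, message = parsed
        if source != iface or "ping check" not in message:
            continue
        if " failed" in message and started is None:
            started = when
        elif " passed" in message and started is not None:
            outages.append((started, when))
            started = None
    return outages


def gaps_between(outages):
    """Time from the start of one outage to the start of the next."""
    return [later[0] - earlier[0] for earlier, later in zip(outages, outages[1:])]


SAMPLE = """\
Thu Oct 9 12:09:58 2014 info: system - heartbeat... ...
Thu Oct 9 12:12:00 2014 info: eth0 - ping check on server #1 failed - (ISP gateway IP) ...
Thu Oct 9 12:12:05 2014 info: eth0 - ping check on server #2 failed - 69.90.141.72 ...
Thu Oct 9 12:12:05 2014 warn: eth0 - connection warning ...
Thu Oct 9 12:12:15 2014 info: eth0 - ping check on server #1 passed - (ISP gateway IP) ...
Thu Oct 9 12:19:15 2014 info: system - heartbeat... ...
"""

for start, end in find_outages(SAMPLE.splitlines()):
    print(start, "->", end, f"({(end - start).total_seconds():.0f} s)")
# prints: 2014-10-09 12:12:00 -> 2014-10-09 12:12:15 (15 s)
```

Running it over the whole log file instead of `SAMPLE` and passing the result to `gaps_between` would show whether the drops cluster at particular times of day.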

COS runs on an HP DL380P with an HP Ethernet 1Gb 4-port 331FLR Adapter.
Eth0 is used for WAN. The IP address for eth0 is set manually.
The WAN is connected to a fibre line via a Cisco 24-port switch (I don't have the model number at hand), which is VLAN-ed by the ISP; we are the only customer on this switch. The cable is 7 metres of good-quality UTP Cat6.

The ISP has checked their logs and found nothing logged since June 2014. COS, on the other hand, reports a lot of drops in the syswatch logs that are still available.

I don't know whether this is related to hardware/drivers or to COS, but I suspect hardware/drivers.
I have not been at the location since we discovered this fault. I will of course try switching to another port on the HP 4-port card and replacing the cable as soon as I get there (tomorrow, hopefully).

Are there any good methods to find out where the fault is? I have located the drivers page at HP.com, but I am not experienced enough to dare to try this without help.

There is "Premium subscription" support, so I am considering opening a ticket, but if it is possible to fix this myself... :-)
Thursday, October 09 2014, 10:47 AM
Responses (4)
  • Accepted Answer

    Thursday, October 09 2014, 08:17 PM - #Permalink
    Thanks a lot for your help, Nick. This COS is a Pro, and uptime is too essential to dare to play with kernels etc. ;-)
    When I searched the scary interweb for possible hints, I saw several people complaining about instability involving ACK/ARP, virtual OSes and these NICs. This made me suspicious that the ILO4 (HP Integrated Lights-Out 4) might be the troublemaker, as I had configured it to share eth0 with COS. I have never got this port sharing to work as intended, even though I followed HP's recipe. Anyway, I went to the customer, reconfigured ILO to use its dedicated port, and voila&presto&all: stable as it should be. Knock on wood.
    When I have more time, I will dig a bit more into this ILO thing with a shared NIC, as it opens up the possibility of accessing the BIOS and maintenance tools even when the OS is "lights out" and not responding. Maybe not very important, since COS has a tendency to work as it should with no "lights out" :woohoo:
  • Accepted Answer

    Thursday, October 09 2014, 05:05 PM - #Permalink
    3.124 is not the most recent. The most recent kmod driver on Elrepo is 3.133 so you could try Tim's driver which is that version. On HP's site they appear to have v3.136, but you'll need to find the sources for it so it can be recompiled. Having said that, if you are using the community version of ClearOS you can update your kernel (or perhaps just reboot if it is already installed). The latest kernel (2.6.32-431.23.3.v6.x86_64) has v3.132 of the tg3 driver which is pretty recent.

    I don't know which log files can be used for troubleshooting. Try all of them, especially messages and system.
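One way to narrow the search is to take the time of a drop from syswatch and list only the lines in the other logs that fall close to it and mention the interface or driver. The sketch below is a rough illustration and assumes the traditional syslog timestamp layout ("Oct  9 12:11:50 host ...", no year) as typically found in /var/log/messages; the function name `syslog_lines_near`, the hostname and the sample lines are all hypothetical placeholders, not real log output.

```python
from datetime import datetime, timedelta


def syslog_lines_near(lines, drop_time, window_s=60, keywords=("tg3", "eth0")):
    """Return lines within window_s seconds of drop_time that mention any keyword."""
    window = timedelta(seconds=window_s)
    hits = []
    for line in lines:
        # Expected fields: month, day, time, then the rest of the line.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        try:
            # Syslog lines carry no year, so borrow it from the drop time.
            when = datetime.strptime(
                f"{drop_time.year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # not a timestamped line
        if abs(when - drop_time) <= window and any(k in parts[3] for k in keywords):
            hits.append(line)
    return hits


# Placeholder lines only; real kernel or driver messages will look different.
SAMPLE = [
    "Oct  9 12:05:00 gateway kernel: placeholder line mentioning eth0, outside the window",
    "Oct  9 12:11:50 gateway kernel: placeholder line mentioning tg3, inside the window",
    "Oct  9 12:11:55 gateway someservice: placeholder line without a keyword",
]

drop = datetime(2014, 10, 9, 12, 12, 0)  # first failed ping check in the syswatch excerpt
for line in syslog_lines_near(SAMPLE, drop):
    print(line)
```

With a real log, the `SAMPLE` list would be replaced by the lines read from the file, and the window widened if nothing shows up.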
  • Accepted Answer

    Thursday, October 09 2014, 12:56 PM - #Permalink
    Ahh, I forgot the version: 64-bit, 2.6.32-358.23.2.v6.x86_64

    The driver seems to be version 3.124, so if I understand correctly that is as new as possible?

    Is there any other log file, apart from syswatch, where I could look for signs or causes?


    EDIT: I found that the HP 331FLR NIC has a Broadcom BCM5719 chip, if that clarifies anything... :-)
  • Accepted Answer

    Thursday, October 09 2014, 11:31 AM - #Permalink
    If you suspect the drivers, can you check your current version with "modinfo tg3"? Also, are you running 32-bit or 64-bit ClearOS (do a "uname -r" if you don't know)? Tim's latest kmod-tg3 driver is here for 64-bit and here for 32-bit. The 64-bit one is the latest kmod one available. The 32-bit one is a little older. On their site HP appears to have slightly later drivers (3.136), but I can't see their sources, which you would need in order to compile a ClearOS-compatible driver; you can't use the RHEL drivers directly.

    If you install new drivers it is best to reboot to have them take effect. Alternatively, you could try restarting networking from the terminal (not remotely).
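For checking the driver version from a script, the "version:" field of the "modinfo tg3" output can be extracted and compared with the 3.136 release mentioned above. This is only a sketch under assumptions: the helper names are hypothetical, and the sample text is an illustrative fragment rather than captured output (3.124 is the version reported elsewhere in this thread).

```python
import re
import subprocess


def parse_modinfo_version(text):
    """Return the value of the 'version:' field from modinfo output, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Match the field name exactly so that 'srcversion:' is not picked up.
        if sep and key.strip() == "version":
            return value.strip()
    return None


def version_key(version):
    """Turn '3.124' into (3, 124) so versions compare numerically, not as text."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def installed_tg3_version():
    """Run 'modinfo tg3' on the local machine (needs the tg3 module to be present)."""
    result = subprocess.run(
        ["modinfo", "tg3"], capture_output=True, text=True, check=True
    )
    return parse_modinfo_version(result.stdout)


# Illustrative fragment only; real modinfo output contains many more fields.
SAMPLE = """\
version:        3.124
license:        GPL
"""

current = parse_modinfo_version(SAMPLE)
hp_version = "3.136"  # version HP appears to offer, per the post above
if version_key(current) < version_key(hp_version):
    print(f"installed {current} is older than {hp_version}")
```

On the server itself, `installed_tg3_version()` would replace the `SAMPLE` parsing step.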