My Intel driver is hanging. From /var/log/messages:
Jul 23 10:42:29 Orion kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:#012 TDH <0>#012 TDT <7>#012 next_to_use <7>#012 next_to_clean <0>#012buffer_info[next_to_clean]:#012 time_stamp <10d1f7ea4>#012 next_to_watch <0>#012 jiffies <10d1facf0>#012 next_to_watch.status <0>#012MAC Status <40080043>#012PHY Status <796d>#012PHY 1000BASE-T Status <0>#012PHY Extended Status <3000>#012PCI Status <10>
Jul 23 10:42:31 Orion kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:#012 TDH <0>#012 TDT <7>#012 next_to_use <7>#012 next_to_clean <0>#012buffer_info[next_to_clean]:#012 time_stamp <10d1f7ea4>#012 next_to_watch <0>#012 jiffies <10d1fb4c0>#012 next_to_watch.status <0>#012MAC Status <40080043>#012PHY Status <796d>#012PHY 1000BASE-T Status <0>#012PHY Extended Status <3000>#012PCI Status <10>
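The "#012" sequences are rsyslog's escape for a control character (octal 012 is a newline), so each hang report is really a multi-line kernel message squashed onto one line. A small sketch to make them readable again, assuming the log is /var/log/messages; decode_hang_log is just a name made up for this example:

```shell
#!/bin/sh
# decode_hang_log [LOGFILE]
# Hypothetical helper: print each e1000e "Detected Hardware Unit Hang" entry
# from LOGFILE (default /var/log/messages) with rsyslog's "#012" escapes
# turned back into real line breaks.
decode_hang_log() {
    log_file="${1:-/var/log/messages}"
    awk '/Detected Hardware Unit Hang/ {
        gsub(/#012/, "\n")   # octal 012 is a newline
        print
    }' "$log_file"
}
```

Usage: decode_hang_log /var/log/messages | less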


I was running "kmod-e1000e-3.3.4-1.clearos7.njh.x86_64.rpm" when I got the errors. I have since removed that driver and am running the stock driver. The driver was from this thread:
https://www.clearos.com/clearfoundation/social/community/anybody-having-problems-with-intel-e1000-on-clearos-6-7#reply-133391

Any thoughts other than YMMV?

After removing the driver:
# lspci -k | grep Eth -A 3
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 04)
Subsystem: Gigabyte Technology Co., Ltd Device e000
Kernel driver in use: e1000e
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
--
07:00.0 Ethernet controller: Qualcomm Atheros AR8161 Gigabit Ethernet (rev 10)
Subsystem: Gigabyte Technology Co., Ltd Device e000
Kernel driver in use: alx
08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)

# modinfo e1000e
filename: /lib/modules/3.10.0-327.22.2.v7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
version: 3.2.5-k
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <linux.nics@intel.com>
rhelversion: 7.2

Saturday, July 23 2016, 04:07 PM
Responses (14)
  • Sunday, July 12 2020, 09:28 PM
    I am not sure it is a Red Hat bug. I think it goes right up to kernel.org, and arguably it is an Intel issue. We've never had kernel- or driver-level expertise to that extent. We largely integrate third-party apps with a lot of PHP code, so we work at a high level. There is some lower-level work (things like clearsync and a little kernel patching we did for IMQ), but most of our work is high level.

    I know this issue is going to make ClearOS look bad for some users, but it is pretty much out of our control. People who have the hardware may be able to test the latest driver, which I think is available on SourceForge. If it works, then it is a question of interesting the people at ELRepo in packaging it as a kmod driver. They are very good at that, but really it needs someone with the affected hardware to try the latest drivers.

    I think I've seen something about Red Hat deprecating the e1000e driver at some point, or it may have been the e1000 driver. If that is the case, ELRepo may pick it up.
  • Sunday, July 12 2020, 07:43 PM
    Allow me to offer a bit of what I hope is constructive criticism.

    I understand the "we just give you the upstream..." bit. I do. And I'm personally more than willing to try a command here or there to resolve an issue of mine... but you really should try to provide a user-friendly (and preferably default/automatic) fix. There's conversation in these threads about recompiling kernel modules and such. I acknowledge that really should be Red Hat's job, but as they're clearly not doing it, I dare say you're more qualified than some of your end users, likely including some of your paying end users.

    I know this is a Red Hat bug, but ClearOS (not so much the Community edition, but Home and especially Professional) is positioned as a product that may very well be installed and used by someone less willing to "fix" things this way. When I replied to the old thread today, I wasn't so much looking for the command (disabling GSO, GRO, and TSO is already mentioned in that thread, and I had already done that when I replied) as letting you know that a problem still exists that some of your customers may want addressed... and given how popular Intel NICs are, I guarantee it is affecting people who aren't participating in this thread. They installed a product (yours), and when they finally realize their network connection keeps dropping, it's your product they're going to blame, not Red Hat.

    I only caught the error after it had already existed for a couple of months. *I* thought my system was fine and that I was being booted off DALnet because DALnet has just been broken since 1998. My client suspected his ISP. I've also been having a weird RDP problem (session dropped "due to a problem with encryption...") that I realized today coincided with these resets. That's on the LAN, and I'm literally six feet from the server, so I never suspected network issues and have instead been troubleshooting RDP version compatibility for the last month.

    One of my other clients *replaced* two of his PCs (NOT my recommendation!) in an attempt to resolve SMB issues before finally calling me and saying "I think my server is broke."

    I geek. I try things, I break things, I fix them when I can, and I come ask you when I can't. *I* know it's Red Hat's problem. You do have a rather large user base that may not care whose problem it is; they just want it fixed. If you can't or won't provide an updated driver that fixes this, perhaps you could include the "tso off" bit in something that is (or can be) automatically applied? That's why I brought this up again today: to let you know it's there, in the hope that you will do whatever you can to make it go away, not just for me (I did the ethtool thing eight hours ago!) but for those of your users who haven't found this thread, or haven't looked, and just think "ClearOS is broken."
  • Friday, June 05 2020, 04:04 PM
    Nick Howitt wrote:

    We don't recompile the kernel, but just use the upstream one so we have no way of redistributing an updated driver. Have you tried the solution in this thread?


    Nick, thanks for the quick response.

    I had a couple hangs this morning, so the driver update was not a fix. I'm trying # ethtool -K enp0s25 tso off now. Fingers crossed!

    Dave
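    To check that the setting actually took effect, ethtool -k (lower-case k) lists the current offload state. Here is a rough sketch that picks out the three offloads mentioned in this thread (TSO, GSO and GRO); offload_states is a hypothetical name, and it assumes ethtool prints them as tcp-segmentation-offload, generic-segmentation-offload and generic-receive-offload, which is what I'd expect on an EL7-era system:

```shell
#!/bin/sh
# offload_states
# Hypothetical helper: read `ethtool -k IFACE` output on standard input and
# print the state of TSO, GSO and GRO, one "<feature> <on|off>" per line.
# A trailing marker such as "[fixed]" is dropped.
offload_states() {
    awk -F': ' '
        $1 == "tcp-segmentation-offload" ||
        $1 == "generic-segmentation-offload" ||
        $1 == "generic-receive-offload" {
            split($2, state, " ")   # keep only the first word (on/off)
            print $1, state[1]
        }'
}
```

    Usage: ethtool -k enp0s25 | offload_states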
  • Friday, June 05 2020, 07:44 AM
    We don't recompile the kernel; we just use the upstream one, so we have no way of redistributing an updated driver. Have you tried the solution in this thread? The other possible solution is to ask the guys at ELRepo if they would consider providing it again. They are very helpful. They stopped a while back when they noticed that the in-kernel version had a load of backported fixes in it, so the version they were supplying was actually older than the kernel's. Things may have moved on since then.
  • Friday, June 05 2020, 04:25 AM
    I'm having the same Hardware Unit Hang issue with an Intel 82566DM Gigabit controller and the e1000e module on the external interface for my ClearOS box. I lose my connection for 8-10 seconds at least once a day. It's embarrassing on videoconferences in the COVID-19 pandemic. I found the issue in /var/log/messages:

    Jun  1 10:17:07 gateway kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:#012  TDH                  <e4>#012  TDT                  <5>#012  next_to_use          <5>#012  next_to_clean        <e2>#012buffer_info[next_to_clean]:#012  time_stamp           <1bc373670>#012  next_to_watch        <e4>#012  jiffies              <1bc373d04>#012  next_to_watch.status <0>#012MAC Status             <802a3>#012PHY Status             <792d>#012PHY 1000BASE-T Status  <3800>#012PHY Extended Status    <3000>#012PCI Status             <10>
    Jun 1 10:17:09 gateway kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:#012 TDH <e4>#012 TDT <5>#012 next_to_use <5>#012 next_to_clean <e2>#012buffer_info[next_to_clean]:#012 time_stamp <1bc373670>#012 next_to_watch <e4>#012 jiffies <1bc3744d4>#012 next_to_watch.status <0>#012MAC Status <802a3>#012PHY Status <792d>#012PHY 1000BASE-T Status <3800>#012PHY Extended Status <3000>#012PCI Status <10>
    Jun 1 10:17:11 gateway kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:#012 TDH <e4>#012 TDT <5>#012 next_to_use <5>#012 next_to_clean <e2>#012buffer_info[next_to_clean]:#012 time_stamp <1bc373670>#012 next_to_watch <e4>#012 jiffies <1bc374ca4>#012 next_to_watch.status <0>#012MAC Status <802a3>#012PHY Status <792d>#012PHY 1000BASE-T Status <3800>#012PHY Extended Status <3000>#012PCI Status <10>
    Jun 1 10:17:13 gateway kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:#012 TDH <e4>#012 TDT <5>#012 next_to_use <5>#012 next_to_clean <e2>#012buffer_info[next_to_clean]:#012 time_stamp <1bc373670>#012 next_to_watch <e4>#012 jiffies <1bc375474>#012 next_to_watch.status <0>#012MAC Status <802a3>#012PHY Status <792d>#012PHY 1000BASE-T Status <3800>#012PHY Extended Status <3000>#012PCI Status <10>
    Jun 1 10:17:13 gateway kernel: e1000e 0000:00:19.0 enp0s25: Reset adapter unexpectedly
    Jun 1 10:17:16 gateway kernel: e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
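    Because the only symptom is a short outage followed by "Reset adapter unexpectedly" in the log, it is easy to miss for months. A minimal sketch for counting the resets per interface, assuming messages shaped like the ones above; count_nic_resets is a hypothetical name, and you could run it from cron and mail yourself when it prints anything:

```shell
#!/bin/sh
# count_nic_resets [LOGFILE]
# Hypothetical helper: print "<interface> <count>" for every e1000e interface
# that logged "Reset adapter unexpectedly" in LOGFILE (default
# /var/log/messages; use "-" to read standard input). Prints nothing if no
# resets were found.
count_nic_resets() {
    log_file="${1:-/var/log/messages}"
    awk '/e1000e .*: Reset adapter unexpectedly/ {
        # the interface name is the field just before "Reset", minus its colon
        for (i = 2; i <= NF; i++)
            if ($i == "Reset") {
                iface = $(i - 1)
                sub(/:$/, "", iface)
                resets[iface]++
                break
            }
    }
    END { for (iface in resets) print iface, resets[iface] }' "$log_file"
}
```

    Usage: count_nic_resets /var/log/messages, or dmesg | count_nic_resets - for the kernel ring buffer.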


    So I've tried compiling the 3.8.4 Intel driver. I'm running that, but I'm not sure yet whether it fixes the problem. There are some serious downsides to compiling the latest source: a tainted kernel (maybe I could learn how to sign the driver) and the risk of a new kernel update coming in and restoring the old driver. I also had to use Intel's suggestion of "dracut --force" to write a new initramfs so it will survive reboots.

    Is there a reason ClearOS can't include a driver update for e1000e? Version 3.8.4 is several revisions newer than what I had on my stock system.
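    One way to tell whether a kernel update has quietly brought the stock driver back is the version string modinfo reports. As far as I know, Intel's in-kernel drivers carry a "-k" suffix (like the 3.2.5-k shown earlier in this thread), while builds from Intel's own source do not. A rough sketch built on that assumption (e1000e_driver_origin is a hypothetical name):

```shell
#!/bin/sh
# e1000e_driver_origin
# Hypothetical helper: read `modinfo e1000e` output on standard input and say
# whether the module looks like the in-kernel build or an out-of-tree one.
# Assumption: Intel marks its in-kernel drivers with a "-k" version suffix
# (e.g. 3.2.5-k) and its standalone source releases without it (e.g. 3.8.4).
e1000e_driver_origin() {
    version=$(awk '$1 == "version:" { print $2; exit }')
    case "$version" in
        "")  echo "unknown" ;;
        *-k) echo "in-kernel $version" ;;
        *)   echo "out-of-tree $version" ;;
    esac
}
```

    Usage: modinfo e1000e | e1000e_driver_origin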
  • Saturday, September 03 2016, 05:01 AM
    Eric Anderson wrote:

    I've disabled TSO to see if that fixes it:

    >ethtool -K eno1 tso off


    Uptime is now 25 days with no network hangs.
  • Tuesday, August 09 2016, 02:48 AM
    I've disabled TSO to see if that fixes it:

    >ethtool -K eno1 tso off
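    Note that ethtool -K only lasts until the next reboot (or driver reload). On EL7-style systems that use the ifcfg files under /etc/sysconfig/network-scripts, the usual way to make it persistent is an ETHTOOL_OPTS line in the interface's ifcfg file. A hedged sketch that appends one (persist_tso_off is a hypothetical name); I'm assuming ClearOS 7 honours ETHTOOL_OPTS like CentOS 7 does and that webconfig won't strip the line when it rewrites network settings, so check the file afterwards:

```shell
#!/bin/sh
# persist_tso_off IFCFG_FILE DEVICE
# Hypothetical helper: append ETHTOOL_OPTS="-K DEVICE tso off" to an EL7-style
# ifcfg file so the network scripts reapply it whenever the interface comes up.
# Assumption: the ifup scripts on this system honour ETHTOOL_OPTS.
# Refuses to touch a file that already has an ETHTOOL_OPTS line, so existing
# options are never silently replaced.
persist_tso_off() {
    ifcfg="$1"
    dev="$2"
    if [ ! -f "$ifcfg" ]; then
        echo "no such file: $ifcfg" >&2
        return 1
    fi
    if grep -q '^ETHTOOL_OPTS=' "$ifcfg"; then
        echo "$ifcfg already sets ETHTOOL_OPTS; edit it by hand" >&2
        return 1
    fi
    # start on a fresh line if the file does not end with a newline
    if [ -s "$ifcfg" ] && [ -n "$(tail -c 1 "$ifcfg")" ]; then
        echo >> "$ifcfg"
    fi
    printf 'ETHTOOL_OPTS="-K %s tso off"\n' "$dev" >> "$ifcfg"
}
```

    Usage: persist_tso_off /etc/sysconfig/network-scripts/ifcfg-eno1 eno1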
  • Sunday, August 07 2016, 04:52 PM
    If history repeats, and I see no reason why it won't, then you're stuck with 3.10 until ClearOS 8. Red Hat does not do major kernel updates within a major version of EL (and therefore CentOS and ClearOS), but they continually backport fixes into the current major release, and these appear as minor releases. This does include updates to NIC drivers, but I'm not sure it includes feature additions.
  • Saturday, August 06 2016, 04:08 PM
    This appears to be a problem with old kernels (like 3.10), and it will eventually go away with newer kernels. What is the kernel upgrade timeline?

    Old kernels aren't necessarily stable...
  • Sunday, July 24 2016, 08:33 PM
    Short of googling, I have no ideas. Sorry.
  • Sunday, July 24 2016, 08:21 PM
    Even with C1E disabled I still get resets:

    [ 2065.535376] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
    [ 2067.321432] e1000e: eno1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
    [ 2065.527956] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
    TDH <13>
    TDT <48>
    next_to_use <48>
    next_to_clean <b>
    buffer_info[next_to_clean]:
    time_stamp <1001acab0>
    next_to_watch <13>
    jiffies <1001af918>
    next_to_watch.status <0>
    MAC Status <40080043>
    PHY Status <796d>
    PHY 1000BASE-T Status <0>
    PHY Extended Status <3000>
    PCI Status <10>
    [ 2065.535237] ------------[ cut here ]------------
    [ 2065.535245] WARNING: at net/sched/sch_generic.c:303 dev_watchdog+0x270/0x280()
    [ 2065.535246] NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out
    [ 2065.535247] Modules linked in: xt_nat xt_multiport ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6t_rt ip6t_REJECT bluetooth ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 rfkill nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_conntrack nf_nat_tftp nf_conntrack_tftp nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_pptp nf_conntrack_proto_gre arc4 ppp_mppe ppp_generic slhc nf_conntrack_irc nf_conntrack_ftp ipt_REJECT xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx intel_powerclamp coretemp intel_rapl kvm crc32_pclmul snd_hda_codec_hdmi
    [ 2065.535278] ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt iTCO_vendor_support cryptd snd_hda_codec_realtek mxm_wmi snd_hda_codec_generic snd_hda_intel snd_hda_codec pcspkr i2c_i801 snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core snd_timer snd soundcore sg mei_me mei ie31200_edac edac_core shpchp wmi tpm_infineon ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel serio_raw firewire_ohci alx e1000e mdio firewire_core i915 ptp crc_itu_t pps_core i2c_algo_bit drm_kms_helper ahci libahci drm libata i2c_core video dm_mirror dm_region_hash dm_log dm_mod
    [ 2065.535311] CPU: 1 PID: 1009 Comm: webconfig Not tainted 3.10.0-327.22.2.v7.x86_64 #1
    [ 2065.535312] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-UD5H, BIOS F15q 01/07/2013
    [ 2065.535313] ffff88041f283d88 0000000084031d48 ffff88041f283d40 ffffffff81636b24
    [ 2065.535316] ffff88041f283d78 ffffffff8107b200 0000000000000000 ffff880404540000
    [ 2065.535318] ffff8800369b0080 0000000000000001 0000000000000001 ffff88041f283de0
    [ 2065.535320] Call Trace:
    [ 2065.535322] <IRQ> [<ffffffff81636b24>] dump_stack+0x19/0x1b
    [ 2065.535330] [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
    [ 2065.535332] [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
    [ 2065.535337] [<ffffffff8154da20>] dev_watchdog+0x270/0x280
    [ 2065.535339] [<ffffffff8154d7b0>] ? dev_graft_qdisc+0x80/0x80
    [ 2065.535343] [<ffffffff8108b0a6>] call_timer_fn+0x36/0x110
    [ 2065.535345] [<ffffffff8154d7b0>] ? dev_graft_qdisc+0x80/0x80
    [ 2065.535348] [<ffffffff8108dd97>] run_timer_softirq+0x237/0x340
    [ 2065.535350] [<ffffffff81084b0f>] __do_softirq+0xef/0x280
    [ 2065.535353] [<ffffffff81648bdc>] call_softirq+0x1c/0x30
    [ 2065.535357] [<ffffffff81016fc5>] do_softirq+0x65/0xa0
    [ 2065.535359] [<ffffffff81084ea5>] irq_exit+0x115/0x120
    [ 2065.535361] [<ffffffff81649855>] smp_apic_timer_interrupt+0x45/0x60
    [ 2065.535364] [<ffffffff81647f1d>] apic_timer_interrupt+0x6d/0x80
    [ 2065.535365] <EOI> [<ffffffff816473eb>] ? sysret_audit+0x17/0x21
    [ 2065.535369] ---[ end trace 740b238075633ff4 ]---
  • Sunday, July 24 2016, 04:00 AM
    I'm getting errors with the stock one too...

    # dmesg
    ...
    [41100.685702] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
    [41102.466934] e1000e: eno1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
    [42970.846911] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
    [42972.632058] e1000e: eno1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
    ...


    # modinfo e1000e
    filename: /lib/modules/3.10.0-327.22.2.v7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
    version: 3.2.5-k
    license: GPL
    description: Intel(R) PRO/1000 Network Driver
    author: Intel Corporation, <linux.nics@intel.com>
    rhelversion: 7.2
    srcversion: 7097C005F85B5C9D374D3FB
    ...

    # lspci -k | grep Eth -A 3
    00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 04)
    Subsystem: Gigabyte Technology Co., Ltd Device e000
    Kernel driver in use: e1000e
    00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
    --
    07:00.0 Ethernet controller: Qualcomm Atheros AR8161 Gigabit Ethernet (rev 10)
    Subsystem: Gigabyte Technology Co., Ltd Device e000
    Kernel driver in use: alx
    08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
  • Saturday, July 23 2016, 05:16 PM
    Googling around, it seems to be quite a common issue. There is a potential solution in this thread. You can try it or google further.
  • Saturday, July 23 2016, 05:10 PM
    Are you saying that both stock and kmod drivers give the error? If so I'm not sure what we can do.

    You can try compiling Intel's source, but you'll have to do that yourself. You'll need to install the development environment, but you can skip installing the editor. From distant memory, the installation instructions are in the Intel source and they are very simple. The disadvantage of this approach is that you will need to recompile the driver every time the kernel gets updated.
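    Before building Intel's source, it's worth checking that the kernel build tree for the running kernel is installed. Out-of-tree module builds normally compile against /lib/modules/<kernel release>/build, which on EL7 is provided by the kernel-devel package matching uname -r. A small sketch of that check; check_build_prereqs is a hypothetical name, and the kernel-devel package name is an assumption for ClearOS's own kernel builds:

```shell
#!/bin/sh
# check_build_prereqs [KERNEL_RELEASE] [MODULES_ROOT]
# Hypothetical helper: succeed if the kernel build tree needed to compile an
# out-of-tree module exists for KERNEL_RELEASE (default: the running kernel);
# otherwise print what appears to be missing and fail.
# Assumption: the build tree is provided by a package named kernel-devel-<release>.
check_build_prereqs() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    if [ -d "$root/$release/build" ]; then
        echo "ok: $root/$release/build"
        return 0
    fi
    echo "missing: $root/$release/build (try installing kernel-devel-$release)"
    return 1
}
```

    If it reports ok, follow the README in Intel's tarball (from memory, make install from the src directory), then run dracut --force so the new module ends up in the initramfs.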