
Resolved
Where to begin? So many problems.

I maintain several systems running ClearOS. Until recently, it has served me well.

On one system, arpwatch took to reporting "<mac> sent bad hardware format" into /var/log/messages for every single VLAN-tagged packet. It got to the point where any significant network traffic would slow to a crawl while the system waited for the disk to catch up with the log file. I could not find any solution other than disabling arpwatch.

Now my home system.

I set out this morning to set up a PBX for a client. It isn't working, and I KNOW the problem is related to Vitelity: every time I set up a new IP endpoint with them, it doesn't work until support does something. Well, the person I got in support today is playing dumb, so I thought I'd prove the point. I have a working PBX here at my house, behind a ClearOS box. I figured I'd just move my ISP Ethernet cable to a different port (I have a 4-port card with all four ports set as external interfaces in MultiWAN) so that I'd get a different IP, add THAT IP to Vitelity's portal, and prove that my previously working PBX doesn't work when I change its IP...

... NO LUCK. "Firewall is in panic mode."

Why?

I used to use multiwan all the time, that's why I have that card in there, and I could run just fine from either ISP. What has changed? What did you update that broke my working multiwan install?

Why can't I leave something completely alone for a few months and count on it to work, like it always has, when I need it?

I don't have the patience to troubleshoot this, nor the time to go back and forth with a support forum. I have other jobs to do, and I need to be able to count on my existing infrastructure to work. Once I have put x number of hours into building a system that works, it is not logical to put y more hours into making it do the same thing /again/.


I tried rebooting the box after switching wan ports. My console is now scrolling "[ OK ] Started Session c<number> of user root." endlessly. We're up to number=6067 so far. What is this? I actually can't even use the console for troubleshooting because of this.


Luckily, putting the cable back in the port I was using before got me back online, but what if it didn't?


Unless someone replies with, "oh, I had that problem, here's what you do," the only resolution I can imagine here is to literally reinstall the whole thing. What a colossal waste of time.
Monday, May 13 2019, 12:19 AM
Responses (3)
  • Accepted Answer

    Monday, May 13 2019, 02:01 PM
    Thanks for explaining the issue.

    There was a problem at the time of the 7.6 Community updates, when the way Webconfig worked changed. This generated a massive number of writes to the events database, and that is what was causing system problems. I had noticed my old server's disk getting very noisy, but I had not attributed this to the updates (I updated mine in advance to do some testing), and it was just about keeping up. My play VM was not keeping up, but again I did not know what to attribute it to. It was only after the 7.6 updates, when one community member started a thread and another confirmed the problem, that I managed to work it out. A patch was released to stop the problem, but there was still far too much logging to /var/log/messages, so another patch was issued for that. It really helps if people who hit these problems post to the forums. I may not be able to answer everything, but I am not the only one here.

    There are still residual logging problems, but there always have been. If I sit in the IP settings screen, the audit log writes 8 MB of data in 15 minutes, which is not good news, especially for anyone with an SSD. This is a related issue and will be looked at.
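To put the 8 MB per 15 minutes figure in perspective, here is a back-of-the-envelope extrapolation. It assumes, purely for illustration, that the rate stays constant for as long as the screen is left open; the constant and function names are hypothetical.

```python
# Observed in the reply above: 8 MB written to the audit log in 15 minutes.
MB_OBSERVED = 8
MINUTES_OBSERVED = 15


def extrapolate_mb(minutes: float) -> float:
    """MB written over `minutes`, assuming the observed rate stays constant."""
    return MB_OBSERVED * minutes / MINUTES_OBSERVED


per_hour = extrapolate_mb(60)                        # 32 MB
per_day = extrapolate_mb(24 * 60)                    # 768 MB
per_year_gb = extrapolate_mb(365 * 24 * 60) / 1000   # about 280 GB

print(f"{per_hour:.0f} MB/hour, {per_day:.0f} MB/day, ~{per_year_gb:.0f} GB/year")
```

Left open around the clock, that would be on the order of 280 GB of writes a year from one screen alone.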

    Other nonsensical messages should also be looked at, which is why I'd like to sort out the arpwatch messages. Patching these bits of the programs is not too hard, as it does not involve PHP; it just takes a bit of time.
  • Accepted Answer

    Monday, May 13 2019, 01:20 PM
    The arpwatch problem only affects one system, and was resolved by disabling arpwatch. Another post on here mentioned that doing so would only affect the "Network Visualizer" app, and what do you know, that was the only system I had tried that app on. I uninstalled said app and disabled all of the arpwatch-whatever services.

    The symptom it resolved was that any time I tried to push a meaningful amount of traffic from a computer on a VLAN, the entire network would slow to a crawl. As near as I can tell (guess), it's because writing all of those messages to the log file was more I/O than the mechanical disk could keep up with.

    My theory as to why it was a problem may or may not be incorrect, but disabling arpwatch made the symptom go away.



    The problem I had yesterday turned out to be (sort of) my fault, which is why I marked my other post resolved. Against some semi-official suggestions, I run GNOME on my systems, a habit I started /after/ this system was first used in a MultiWAN environment. As it turns out, GNOME installs NetworkManager, and NM coaxes dhclient into putting its .lease and .routers files in a different location. libfirewall.lua attempts to get the default gateway for a DHCP connection from /var/lib/dhclient/<interface_name>.routers, and if that file doesn't exist (or is over a year old and represents a different ISP entirely...), things just don't work.

    When I officially installed GNOME, I didn't notice any problem. I was using MultiWAN at that point, but everything kept working because the .lease and .routers files already existed in the old location and no update was necessary. I have since dropped to just one provider, but it has stayed on the same interface, and apparently Spectrum's gateway IP hasn't changed. When I moved the cable to a different port, the firewall tried to use an interface that didn't have a .routers file, which made things angry.
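If you want to catch this condition before it bites, a minimal sketch like the one below could report whether each external interface has a .routers file in /var/lib/dhclient and whether it looks old. The directory and the <interface_name>.routers naming come from the post above; the function name and the 30-day threshold are hypothetical, file age is only a rough proxy for "represents a different ISP", and the script does not look at wherever NetworkManager puts its own lease files.

```python
import sys
import time
from pathlib import Path
from typing import Optional

# Location libfirewall.lua reads, per the post above.
DHCLIENT_DIR = Path("/var/lib/dhclient")
# Hypothetical threshold: anything older than this is flagged as suspect.
MAX_AGE_DAYS = 30.0


def check_routers_file(iface: str, base: Path = DHCLIENT_DIR,
                       max_age_days: float = MAX_AGE_DAYS,
                       now: Optional[float] = None) -> str:
    """Classify <base>/<iface>.routers as 'missing', 'stale', or 'ok'.

    Age is only a heuristic: an old file may still hold the right gateway
    (as when the ISP's gateway IP simply never changed).
    """
    path = base / f"{iface}.routers"
    if not path.is_file():
        return "missing"
    if now is None:
        now = time.time()
    age_days = (now - path.stat().st_mtime) / 86400
    return "stale" if age_days > max_age_days else "ok"


if __name__ == "__main__":
    # Pass the external (WAN) interface names on the command line.
    for iface in sys.argv[1:]:
        print(f"{iface}: {check_routers_file(iface)}")
```

A "missing" result for an interface you are about to move the ISP cable to is exactly the situation described above.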

    I've disabled NetworkManager. Firewall is no longer in panic mode. I suspect there are some problems remaining, as the service called "network" errors out instead, and when I do make multiwan changes it takes practically *forever* to switch. I'm at work right now, and will probably be quite busy for a while, but will be attempting to resolve the remaining bits when I have time.
  • Accepted Answer

    Monday, May 13 2019, 08:01 AM
    The arpwatch issue is an upstream issue and only happens if you are using a managed switch of some sort. I have never heard of it bringing a system down, but the messages can be suppressed in rsyslog. Create a file /etc/rsyslog.d/{anything_you_want}.conf and put something like this in it:
    # arpwatch messages
    if ($programname == 'arpwatch' and $msg contains 'sent bad hardware format') then stop
    Then restart the rsyslog daemon. You can fine-tune the message filter, perhaps by adding the 0x5 or whatever it is. Similarly, you can stop all the bogon messages, but a better way to do that is to add the "-N" switch to /etc/sysconfig/arpwatch. Restarting arpwatch is harder, though; I think you have to do it for each LAN interface with:
    systemctl restart arpwatch@{your_lan_interface}
    I will try to release a patch for the bogon bit sometime and may also do the bad hardware one. The only catch with the bad hardware bug is that if upstream ever releases an update (one exists, but Red Hat have never incorporated it), the rsyslog filter should be backed out.
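Before and after adding the filter, it can help to see how much of the log it would actually remove. The sketch below approximates the rsyslog condition above against an existing log file. It assumes the traditional "<timestamp> <host> <tag>[<pid>]: <message>" line layout and only approximates how rsyslog derives $programname; it is not rsyslog's own matching logic, and the function names are hypothetical.

```python
import re
import sys
from typing import Iterable, Optional, Tuple

# Assumed layout of each line: "<Mon DD HH:MM:SS> <host> <tag>[<pid>]: <msg>"
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} (?P<host>\S+) "
    r"(?P<program>[^\s:\[]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def parse(line: str) -> Optional[Tuple[str, str]]:
    """Return (program name, message) for one log line, or None if it doesn't parse."""
    m = SYSLOG_LINE.match(line.rstrip("\n"))
    return (m.group("program"), m.group("msg")) if m else None


def would_drop(line: str, program: str = "arpwatch",
               needle: str = "sent bad hardware format") -> bool:
    """Approximates: if ($programname == 'arpwatch' and $msg contains '...') then stop"""
    parsed = parse(line)
    return parsed is not None and parsed[0] == program and needle in parsed[1]


def count_dropped(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (lines the filter would drop, total lines seen)."""
    dropped = total = 0
    for line in lines:
        total += 1
        if would_drop(line):
            dropped += 1
    return dropped, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the log file to scan, e.g. /var/log/messages
    with open(sys.argv[1], errors="replace") as fh:
        dropped, total = count_dropped(fh)
    print(f"{dropped} of {total} lines would be dropped by the filter")
```

Lines that don't fit the assumed layout are simply counted as kept, so treat the result as an estimate.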

    Your panic mode is probably caused by you switching WAN interfaces but leaving what then becomes an invalid MultiWAN rule in place. The last MultiWAN update was in April 2018.

    What version of ClearOS do you have ("cat /etc/clearos-release")? If you're using 7.x, what versions of csplugin-events and system-base are you running? You want:
    [root@server ~]# rpm -q csplugin-events system-base
    csplugin-events-1.2-3.v7.x86_64
    system-base-7.6.5-1.v7.x86_64
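If you are checking several boxes, a small sketch like the one below could compare rpm -q output against the builds listed above, assuming those are the minimum versions you want. The comparison is a deliberate simplification (it compares runs of digits only) and is not RPM's real rpmvercmp algorithm; the function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Builds quoted in the reply above, treated here as minimums (an assumption).
WANTED = {
    "csplugin-events": "1.2-3.v7",
    "system-base": "7.6.5-1.v7",
}


def split_nvra(pkg: str) -> Tuple[str, str, str]:
    """Split 'name-version-release.arch' from rpm -q into (name, version, release)."""
    nvr, _arch = pkg.rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def version_key(s: str) -> Tuple[int, ...]:
    """Crude sort key built from the digit runs in s; NOT RPM's rpmvercmp."""
    return tuple(int(part) for part in re.findall(r"\d+", s))


def at_least(installed: str, wanted_vr: str) -> bool:
    """True if an rpm -q line is at or above the wanted 'version-release'."""
    _name, version, release = split_nvra(installed)
    want_version, want_release = wanted_vr.rsplit("-", 1)
    return (version_key(version), version_key(release)) >= (
        version_key(want_version), version_key(want_release))


def check(rpm_output: str) -> Dict[str, bool]:
    """Map each wanted package to whether its installed build meets the minimum."""
    result = {name: False for name in WANTED}  # not installed -> False
    for line in rpm_output.splitlines():
        line = line.strip()
        if not line or line.endswith("is not installed"):
            continue
        try:
            name, _version, _release = split_nvra(line)
        except ValueError:
            continue  # not a name-version-release.arch line
        if name in WANTED:
            result[name] = at_least(line, WANTED[name])
    return result


# The sample output from the reply above.
sample = """csplugin-events-1.2-3.v7.x86_64
system-base-7.6.5-1.v7.x86_64"""
print(check(sample))  # {'csplugin-events': True, 'system-base': True}
```

Feed it the output of rpm -q csplugin-events system-base from each server.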


    If it was a multiwan firewall issue, I'll mention it to the devs. I know they have been looking to do changes there recently but nothing has yet been released. They may be interested to hear of your issue.