ClearOS server is rebooting from time to time

Resolved

I'm running a ClearOS server, but it is unstable under ClearOS. A few weeks ago this machine ran Windows, and before that it was also a ClearOS server. For some reason it reboots from time to time. I have searched the logs but so far found nothing. I'm not sure which logs to search, so I checked the ones I thought were relevant. Where should I look for clues?

In Uncategorized

Friday, July 06 2018, 02:50 PM

Responses (24)

Marcel
Tuesday, July 17 2018, 06:26 PM

Nick Howitt wrote:

Hi Marcel,
I'm in the sunny South of France and I'm being blocked from my VPN.

Are you on vacation? Enjoy your vacation!

From memory, the Intel sources have good instructions in them. I used to build them directly from the tar.gz with rpmbuild rather than compile and install, and then install the resulting rpm. You'll need the normal development bits. Also check out SourceForge for drivers. Later versions are sometimes there and I think they are official. Remember you will need to recompile for each kernel update.

Yeah, I found a README file. It looks easy. Thanks for the heads-up; I'll check SourceForge.

Nick Howitt
Tuesday, July 17 2018, 06:14 PM

Hi Marcel,
I'm in the sunny South of France and I'm being blocked from my VPN. From memory, the Intel sources have good instructions in them. I used to build them directly from the tar.gz with rpmbuild rather than compile and install, and then install the resulting rpm. You'll need the normal development bits. Also check out SourceForge for drivers. Later versions are sometimes there and I think they are official. Remember you will need to recompile for each kernel update.

Marcel
Tuesday, July 17 2018, 05:22 AM

I think it's best to compile a new version of the driver from source. I've found the driver on the Intel site.

Intel igb driver

Going to try this in my virtual machine. Looks like a good challenge! I did this before but it's been a while... Any hints?

By the way, 5 days of uptime. So it's definitely the igb driver in combination with the I210 controller.

Marcel
Saturday, July 14 2018, 08:30 PM

Over 3 days of uptime! I think it was a driver issue with my two Intel® Ethernet Controller I210s. I have had no more reboots since I started using the 4-port Ethernet controller. I think it's an I350, but I'm not absolutely sure.

Marcel
Saturday, July 14 2018, 12:31 PM

Uptime 2 days and 17 hours. Tested with heavy network traffic and without.

Edit: another test with heavy network traffic in progress.

Marcel
Thursday, July 12 2018, 07:19 PM

24 hours up. Tested for 2-3 hours with heavy network traffic.

Marcel
Thursday, July 12 2018, 02:17 PM

The server now has 19 hours of uptime, so no crashes so far. There was only very light network traffic.

Next I'm going to test whether the server crashes with heavy traffic through the 4-port network card. It's a bit of a pity that the 4-port card uses the same igb driver, but you never know, because the hardware is different.

Marcel
Wednesday, July 11 2018, 07:19 PM

I remembered that I had a 4-port NIC lying in my hardware box, so I installed that card to see if it helps resolve the issue...

Edit: checked the CPU temperature and it is fine at 41 degrees.

Marcel
Wednesday, July 11 2018, 06:51 PM

Yeah, you're right about the igb driver. I also did a search but didn't find any known issues.

I looked through /var/log/messages a bit but have found nothing yet.

I started the ClearOS server this evening and everything was back to normal. Then I downloaded stuff on another server and boom, 2 more reboots, about an hour apart. It looks as if the reboots happen under heavy network traffic. Now I'm curious whether the server stays up until the next burst of heavy network traffic.

Nick Howitt
Tuesday, July 10 2018, 09:06 PM

That uses the igb driver (I know as I have one) and I am not aware of any issues with it.

Marcel
Tuesday, July 10 2018, 09:02 PM

Did a search on the iPad. The S1200RPL uses the Intel® Ethernet Controller I210.

Marcel
Tuesday, July 10 2018, 08:39 PM

I've completely shut down the server. It became almost unresponsive and stayed that way after a reboot: no internet and a very slow webgui, plus some 404 errors when going from page to page in the webgui.

I've monitored the CPU temperatures and they are okay at 40-45 degrees. There is no HDD, only one SSD.

I have an Intel motherboard, the S1200RPLV3. I've installed ClearOS many times on this hardware. A few weeks ago it was running Windows 10 and as far as I know there were no reboots.

So, time to go to bed. It's too late already and I have an early morning. Tomorrow evening I'll check as you suggested.

Nick Howitt
Tuesday, July 10 2018, 07:46 PM

Marcel van Leeuwen wrote:

These "started" and "starting" sessions of root look strange to me....
They are annoyingly normal. If you run the command:
systemd-analyze set-log-level notice
They will go away. It is a one-off command.

You can do a load of message filtering in rsyslogd. Create a file /etc/rsyslog.d/anything.conf and in it put things like:
# polkitd messages
if ($programname == 'polkitd' and $msg contains 'egistered Authentication Agent for unix-process:') then stop
# arpwatch flip flop messages
if ($programname == 'arpwatch' and $msg contains '0.0.0.0') then stop
This will drop other annoying messages. I also use lines to split out dhcp, openvpn, docker and firewall messages into separate files (but remember to add a logrotate function for each one). I also drop apcups and snort messages from the messages log as they appear elsewhere, and suppress some smartd and proftp messages. After you change anything here, restart rsyslog.

Nick Howitt
Tuesday, July 10 2018, 07:35 PM

Out of curiosity, what are your NICs?
lspci -k | grep Eth -A 3
I know the e1000e driver can hang, but I am not aware of it causing crashes. Is there anything in the messages, system or dmesg logs? Also, when you reboot, what are your temperatures (read them off the BIOS or lm_sensors)? In case of HDD issues, try installing smartmontools.

What is your processor? Is it by any chance a Ryzen?
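
As a rough sketch of what to look for in the `lspci -k` output, the shell helper below condenses it to one line per Ethernet device with the kernel driver bound to it. The function name `nic_drivers` is made up for illustration, and the sketch assumes the usual `lspci -k` layout, in which each device line is followed by an indented `Kernel driver in use:` line.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not a standard tool):
# reads `lspci -k` output on stdin and prints "<PCI address> <driver>"
# for every device whose description contains "Ethernet controller".
nic_drivers() {
    awk '
        # A line that does not start with whitespace begins a new device.
        /^[^ \t]/ {
            dev = ""
            if ($0 ~ /Ethernet controller/) dev = $1
        }
        # Indented detail line naming the driver bound to the current device.
        dev != "" && /Kernel driver in use:/ { print dev, $NF }
    '
}
```

Typical use would be `lspci -k | nic_drivers`, which prints the PCI address of each Ethernet device followed by its driver name (for example igb or e1000e).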

Marcel
Tuesday, July 10 2018, 07:23 PM

These "started" and "starting" sessions of root look strange to me....

Marcel
Tuesday, July 10 2018, 07:21 PM

...and another crash after 10 minutes uptime. Again during heavy network traffic.

Jul 10 21:00:01 voyager systemd: Started Session 12 of user root.
Jul 10 21:00:01 voyager systemd: Starting Session 12 of user root.
Jul 10 21:01:01 voyager systemd: Started Session 13 of user root.
Jul 10 21:01:01 voyager systemd: Starting Session 13 of user root.
Jul 10 21:01:01 voyager systemd: Started Session 14 of user root.
Jul 10 21:01:01 voyager systemd: Starting Session 14 of user root.
Jul 10 21:02:01 voyager systemd: Started Session 15 of user root.
Jul 10 21:02:01 voyager systemd: Starting Session 15 of user root.
Jul 10 21:03:58 voyager nmbd[1265]: [2018/07/10 21:03:58.182684,  0] ../source3/nmbd/nmbd_namequery.c:109(query_name_response)
Jul 10 21:03:58 voyager nmbd[1265]:  query_name_response: Multiple (2) responses received for a query on subnet 192.168.1.1 for name WORKGROUP<1d>.
Jul 10 21:03:58 voyager nmbd[1265]:  This response was from IP 192.168.1.20, reporting an IP address of 192.168.1.20.
Jul 10 21:04:09 voyager nslcd[1891]: [a2a8d4] <passwd="guest"> (re)loading /etc/nsswitch.conf
Jul 10 21:05:01 voyager systemd: Started Session 17 of user root.
Jul 10 21:05:01 voyager systemd: Starting Session 17 of user root.
Jul 10 21:05:01 voyager systemd: Started Session 19 of user root.
Jul 10 21:05:01 voyager systemd: Starting Session 19 of user root.
Jul 10 21:05:01 voyager systemd: Started Session 16 of user root.
Jul 10 21:05:01 voyager systemd: Starting Session 16 of user root.
Jul 10 21:05:01 voyager systemd: Started Session 18 of user root.
Jul 10 21:05:01 voyager systemd: Starting Session 18 of user root.
Jul 10 21:08:37 voyager systemd: Starting Cleanup of Temporary Directories...
Jul 10 21:08:37 voyager systemd: Started Cleanup of Temporary Directories.
Jul 10 21:09:15 voyager nmbd[1265]: [2018/07/10 21:09:15.508003,  0] ../source3/nmbd/nmbd_namequery.c:109(query_name_response)
Jul 10 21:09:15 voyager nmbd[1265]:  query_name_response: Multiple (2) responses received for a query on subnet 192.168.1.1 for name WORKGROUP<1d>.
Jul 10 21:09:15 voyager nmbd[1265]:  This response was from IP 192.168.1.20, reporting an IP address of 192.168.1.20.
Jul 10 21:10:01 voyager systemd: Started Session 20 of user root.
Jul 10 21:10:01 voyager systemd: Starting Session 20 of user root.
Jul 10 21:10:01 voyager systemd: Started Session 21 of user root.
Jul 10 21:10:01 voyager systemd: Starting Session 21 of user root.
Jul 10 21:10:01 voyager systemd: Started Session 22 of user root.
Jul 10 21:10:01 voyager systemd: Starting Session 22 of user root.
Jul 10 21:10:01 voyager systemd: Started Session 23 of user root.
Jul 10 21:10:01 voyager systemd: Starting Session 23 of user root.
Jul 10 21:10:01 voyager systemd: Started Session 24 of user root.
Jul 10 21:10:01 voyager systemd: Starting Session 24 of user root.
Jul 10 21:10:01 voyager systemd: Started Session 25 of user root.
Jul 10 21:10:01 voyager systemd: Starting Session 25 of user root.
Jul 10 21:15:47 voyager kernel: microcode: microcode updated early to revision 0x22, date = 2017-01-27
Jul 10 21:15:47 voyager kernel: Initializing cgroup subsys cpuset
Jul 10 21:15:47 voyager kernel: Initializing cgroup subsys cpu
Jul 10 21:15:47 voyager kernel: Initializing cgroup subsys cpuacct
Jul 10 21:15:47 voyager kernel: Linux version 3.10.0-693.17.1.v7.x86_64 (mockbuild@build64-1.clearsdn.local) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Sun Feb 4 11:15:12 MST 2018
Jul 10 21:15:47 voyager kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.v7.x86_64 root=/dev/mapper/clearos-root ro rd.lvm.lv=clearos/root rd.lvm.lv=clearos/swap rhgb quiet LANG=en_US.UTF-8
Jul 10 21:15:47 voyager kernel: e820: BIOS-provided physical RAM map:

Marcel
Tuesday, July 10 2018, 07:09 PM

I have repositioned my ClearOS server so it's next to my desk. After 2 days the server rebooted, during heavy network traffic. It happened at 20:52, and I was next to the server. Here is a snippet of my messages log:

Jul 10 20:45:01 voyager systemd: Starting Session 3435 of user root.
Jul 10 20:45:01 voyager systemd: Started Session 3436 of user root.
Jul 10 20:45:01 voyager systemd: Starting Session 3436 of user root.
Jul 10 20:45:01 voyager systemd: Removed slice User Slice of root.
Jul 10 20:45:01 voyager systemd: Stopping User Slice of root.
Jul 10 20:48:31 voyager nmbd[28064]: [2018/07/10 20:48:31.946351,  0] ../source3/nmbd/nmbd_namequery.c:109(query_name_response)
Jul 10 20:48:31 voyager nmbd[28064]:  query_name_response: Multiple (2) responses received for a query on subnet 192.168.1.1 for name WORKGROUP<1d>.
Jul 10 20:48:31 voyager nmbd[28064]:  This response was from IP 192.168.1.20, reporting an IP address of 192.168.1.20.
Jul 10 20:50:01 voyager systemd: Created slice User Slice of root.
Jul 10 20:50:01 voyager systemd: Starting User Slice of root.
Jul 10 20:50:01 voyager systemd: Started Session 3439 of user root.
Jul 10 20:50:01 voyager systemd: Starting Session 3439 of user root.
Jul 10 20:50:01 voyager systemd: Started Session 3438 of user root.
Jul 10 20:50:01 voyager systemd: Starting Session 3438 of user root.
Jul 10 20:50:01 voyager systemd: Started Session 3441 of user root.
Jul 10 20:50:01 voyager systemd: Starting Session 3441 of user root.
Jul 10 20:50:01 voyager systemd: Started Session 3440 of user root.
Jul 10 20:50:01 voyager systemd: Starting Session 3440 of user root.
Jul 10 20:50:01 voyager systemd: Started Session 3442 of user root.
Jul 10 20:50:01 voyager systemd: Starting Session 3442 of user root.
Jul 10 20:50:01 voyager systemd: Started Session 3443 of user root.
Jul 10 20:50:01 voyager systemd: Starting Session 3443 of user root.
Jul 10 20:50:01 voyager systemd: Removed slice User Slice of root.
Jul 10 20:50:01 voyager systemd: Stopping User Slice of root.
Jul 10 20:53:37 voyager kernel: microcode: microcode updated early to revision 0x22, date = 2017-01-27
Jul 10 20:53:37 voyager kernel: Initializing cgroup subsys cpuset
Jul 10 20:53:37 voyager kernel: Initializing cgroup subsys cpu
Jul 10 20:53:37 voyager kernel: Initializing cgroup subsys cpuacct
Jul 10 20:53:37 voyager kernel: Linux version 3.10.0-693.17.1.v7.x86_64 (mockbuild@build64-1.clearsdn.local) (gcc version 4.8.5 20150623 (Red Hat 4$
Jul 10 20:53:37 voyager kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.v7.x86_64 root=/dev/mapper/clearos-root ro rd.lvm.lv=clearos/root$
Jul 10 20:53:37 voyager kernel: e820: BIOS-provided physical RAM map:

Any idea?

Marcel
Sunday, July 08 2018, 07:22 PM

Thank you Dave for your input!

How do I recognize the rsyslog messages? Does the line start with "rsyslog"?

Edit: they are recognizable by "rsyslogd".

Dave Loper
Sunday, July 08 2018, 02:49 PM

Knowing the time at which the server crashes or reboots is key to solving any issue. I know of ClearOS servers that have been running for months or years without requiring a reboot, and that are rebooted only at the explicit direction of an admin. So, generally, there are no ClearOS-specific bugs that I'm aware of.

However, we do see from time to time issues with hardware interactions and the key to identifying these instances is knowing when the server stopped logging and what it had to say just before it went down. The key to finding the time of when it went down is actually looking to the time it starts. In the /var/log/messages log you will see messages about syslog and rsyslog. There are two main messages that the rsyslog daemon makes in /var/log/messages. It makes HUP messages when it does a log rotate. These HUP messages are unimportant for finding the reboot. But the other message it will make is when the server is started. So searching the rsyslog messages will tell you when the server started. You can then locate the messages immediately preceding these to determine what was going on before it failed.
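
The approach described above can be sketched as a small shell helper that finds each boot marker in the log and prints the lines that came just before it. This is only an illustration: the function name `last_lines_before_boot`, the default context of 10 lines and the default marker `kernel: Linux version` are assumptions, not something ClearOS ships. Pass a different fixed string as the third argument to key on another line, such as the rsyslogd start-up message.

```shell
#!/bin/sh
# Sketch: show what was logged just before each boot marker in a syslog file.
# The function name, default context and default marker are illustrative
# assumptions; adjust them for your own system.
#
# last_lines_before_boot LOGFILE [CONTEXT] [MARKER]
#   LOGFILE  path to the log, e.g. /var/log/messages
#   CONTEXT  number of preceding lines to print per boot (default 10)
#   MARKER   fixed string identifying a line written at boot
last_lines_before_boot() {
    logfile=$1
    context=${2:-10}
    marker=${3:-"kernel: Linux version"}

    awk -v n="$context" -v marker="$marker" '
        index($0, marker) {
            # Boot marker found: print the buffered lines that preceded it.
            print "=== boot detected at log line " NR " ==="
            start = (NR - n > 1) ? NR - n : 1
            for (i = start; i < NR; i++) print buf[i]
            print $0
        }
        {
            # Remember this line and drop the one that fell out of the window.
            buf[NR] = $0
            delete buf[NR - n]
        }
    ' "$logfile"
}
```

A typical call would be `last_lines_before_boot /var/log/messages 20`. A few early boot lines can precede the marker, so choose a context large enough to reach past them to the last messages written before the server went down.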

In cases where you get a kernel panic, there is not much to see in the log, but panics can generally be captured by setting up crashdump properly. We had a ticket come into support just weeks ago reporting random shutdowns, and we used the rsyslog method to identify what transpired at the shutdown. In this case, a log event indicating that an ACPI shutdown had been initiated by a button press revealed the culprit: a worker who was backing his chair into the power button from time to time.

Marcel
Sunday, July 08 2018, 01:51 PM

I did a re-install today because I had issues with reports. You can read about it in this thread:

https://www.clearos.com/clearfoundation/social/community/network-visualizer-stuck-in-loading-mode

Let's see what happens after this re-install. Maybe it's solved and the server won't reboot randomly anymore.

Marcel
Friday, July 06 2018, 07:21 PM

I heard that more people experienced crashes/reboots after Meltdown and Spectre fixes.

Nick Howitt
Friday, July 06 2018, 06:44 PM

Marcel van Leeuwen wrote:
What's the OS on your Desktop?

Win10 64-bit. My laptop has just done the same, so perhaps it is a Windows Update issue. The laptop is on version 1803 and the desktop on 1709. The laptop has crashed once, the desktop many times.

Marcel
Friday, July 06 2018, 04:55 PM

Good point Nick! Going to investigate and test.

What's the OS on your Desktop?

One thing crossed my mind and that is the Meltdown and Spectre fixes...

Nick Howitt
Friday, July 06 2018, 04:26 PM

I'd tend to keep an eye on the dmesg, secure and system logs, but you'll probably find nothing. You could install lm-sensors and watch your temperatures, or try a full memory test using memtest86+. When you've done that, you can come and fix my desktop, which has taken to spontaneously rebooting, last time with an uncaught exception message.