Troubleshooting a Crashing Server
ClearOS is a very stable operating system, and some users report uptimes measured in years. However, certain situations can bring ClearOS to its knees. This guide will help you track down the issue and hopefully resolve whatever ails your system.
Experience shows that the most common cause of a ClearOS server crash is hardware, so begin by considering the following questions:
Have you added any hardware recently?
Did you experience hardware difficulty when you initially installed ClearOS?
Did your system restart recently with a new kernel from an update?
Is your hard disk full or otherwise impaired?
A good place to start is to rule out some easy items.
Hard Drive Space
You can check the available space on your hard drives with the df command.
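Running 'df -h' prints the size, used space, and free space of every mounted filesystem in human-readable units. As a minimal sketch of how that output could be screened automatically, the helper below lists any filesystem at or above a usage threshold. The function name full_filesystems and the default of 90 percent are assumptions chosen for illustration, not part of ClearOS.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount point
# and usage of each filesystem at or above a threshold (default 90 percent).
# Assumes mount points contain no spaces.
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage: df -P | full_filesystems 90
```

The -P option asks df for the POSIX output format, which keeps each filesystem on a single line and makes the columns safe to parse.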
Hard Drive Health
The smartmontools package can tell you whether a hard drive is itself reporting errors. This works on systems with directly connected hard drives. If the drives sit behind a RAID card, consult the RAID card's own reporting tools instead. To get started, install smartmontools and then check the drives listed in the 'df' output from above.
If df lists only LVM volumes, use pvs to find the underlying drives. If pvs or df lists md devices, use 'cat /proc/mdstat' to see which drives make up each array.
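To illustrate the md case, the sketch below turns the contents of /proc/mdstat into a simple list of arrays and their member partitions. The function name md_members is hypothetical, and the parsing assumes the usual mdstat layout in which members appear as entries such as sda1[0].

```shell
# Hypothetical helper: read /proc/mdstat content on stdin and print each
# md array followed by the /dev path of every member partition.
md_members() {
    awk '/^md/ {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {      # members look like sda1[0] or sdb1[1](F)
                dev = $i
                sub(/\[.*/, "", dev)      # drop the [n] slot number and any flags
                print $1, "/dev/" dev
            }
        }
    }'
}

# Usage: md_members < /proc/mdstat
```

Each line of output pairs an array with one member, so the physical drives to check are easy to read off.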
yum -y install smartmontools
smartctl -A /dev/sda
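The attribute table printed by 'smartctl -A' can be long. As a rough sketch, the helper below reads that output and prints a few attributes that commonly point to failing sectors when their raw value is above zero. The function name smart_warnings and the choice of attributes are assumptions made for illustration. Attribute names vary by drive vendor, and the parsing assumes the ten-column ATA attribute table.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the name
# and raw value of selected sector-related attributes that are above zero.
smart_warnings() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
         $10 + 0 > 0 { print $2, $10 }'
}

# Usage: smartctl -A /dev/sda | smart_warnings
```

For the drive's own overall verdict, 'smartctl -H /dev/sda' reports the SMART health status.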
When hardware misbehaves, the kernel will often complain about it. These messages are recorded in the /var/log/messages file. You can read them with the log viewer app, by tailing the log file, or by opening the log file in a text editor.
tail -f /var/log/messages
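Because /var/log/messages mixes kernel output with everything else, it can help to filter it. The sketch below keeps only kernel lines that contain a few common trouble keywords. The function name kernel_errors and the keyword list are assumptions for illustration, and the filter assumes the traditional syslog line format in which kernel lines carry a 'kernel:' tag.

```shell
# Hypothetical helper: read syslog lines on stdin and print only kernel
# messages that mention a likely problem (case-insensitive keyword match).
kernel_errors() {
    grep ' kernel: ' | grep -Ei 'error|fail|panic|oops|call trace'
}

# Usage: kernel_errors < /var/log/messages
```

The dmesg command prints the kernel's message buffer for the current boot, which is useful when the log file is very large.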
If a kernel panic has occurred, Linux will often display the panic message on the server's video output. Try attaching a monitor to see any messages related to the panic.
Applications can also cause problems. Check the /var/log/messages file for any warnings or errors associated with applications.
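One way to see which application is the noisiest is to count warning and error lines per program. The following is a minimal sketch under the assumption that the log uses the traditional syslog layout, with the program name (and optional process ID) in the fifth field. The function name errors_by_program is hypothetical.

```shell
# Hypothetical helper: read syslog lines on stdin, keep those mentioning a
# warning or error, and print a count per program, highest count first.
errors_by_program() {
    grep -Ei 'warn|error' |
        awk '{
            tag = $5
            sub(/(\[[0-9]+\])?:$/, "", tag)   # strip "[pid]:" or ":" from the tag
            count[tag]++
        }
        END { for (t in count) print count[t], t }' |
        sort -rn
}

# Usage: errors_by_program < /var/log/messages
```

A program that dominates this list shortly before a crash is a reasonable place to look next.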