Hello,
I'm one of your happy ClearOS Enterprise customers. I've had ClearOS 6.6 running on our systems for the last 7 years, and I've now found symptoms that one of my disk drives is about to die.
Now it always tries to check the filesystems after the server reboots, and most of the time this takes hours.
Today it happened again: the check got stuck at 30% and I had to hard-reboot.
Any ideas how to solve this? I don't know how to get into the console to run several mdadm tasks.
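Editor's note, not part of the original post: before doing any mdadm work, the array and member-disk state can be checked from a root shell or rescue console. A minimal sketch, assuming a two-disk RAID 1 exposed through /proc/mdstat (the device name /dev/md0 is an assumption):

```shell
# A healthy two-disk mirror shows "[UU]" in /proc/mdstat; a degraded one
# shows an underscore for the missing member, e.g. "[_U]".
# On the live system you would run (device name is an assumption):
#   cat /proc/mdstat
#   mdadm --detail /dev/md0
is_degraded() {
  # Succeeds if any array line in the given mdstat file is missing a member.
  grep -Eq '\[[U_]*_[U_]*\]' "$1"
}

# Hypothetical mdstat fragment for a mirror that has lost one disk:
printf 'md0 : active raid1 sdb1[1]\n      104320 blocks [2/1] [_U]\n' > /tmp/mdstat.sample
is_degraded /tmp/mdstat.sample && echo "array degraded"
```

If the array is degraded, `mdadm --detail` names which member dropped out.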
Responses (12)
-
Accepted Answer
Hi,
Here's the result of the SMART disk test. I believe there are a lot of problems.
Any idea what to do next?
Thanks for any help on this.
root@sysresccd /root % smartctl --all /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-3.18.25-std471-amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.12
Device Model: ST3250318AS
Serial Number: 5VY2V6RG
LU WWN Device Id: 5 000c50 021722b96
Firmware Version: CC38
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Sat Feb 20 19:30:11 2016 UTC
==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213891en
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 44) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 51026 486461351
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
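An editor's aside, not part of the thread: the failing LBA reported in the self-test log above (LBA_of_first_error = 486461351) can be mapped to a partition with a little arithmetic against the partition start sectors from the `fdisk -l` output posted later in this thread. A sketch, assuming those boundaries:

```shell
# LBA_of_first_error from the SMART self-test log above:
LBA=486461351
# /dev/sda3 boundaries, taken from the fdisk -l output in this thread:
SDA3_START=479990070
SDA3_END=488375999

if [ "$LBA" -ge "$SDA3_START" ] && [ "$LBA" -le "$SDA3_END" ]; then
  # The bad sector falls inside sda3 (the 4 GiB partition), at this
  # sector offset relative to the start of the partition:
  echo "inside sda3, relative sector $(( LBA - SDA3_START ))"
fi
# prints: inside sda3, relative sector 6471281
```

This is consistent with Tony's observation below that sda3 is the problem partition.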
-
Accepted Answer
Hi Tony,
I'm working on your suggestions below; I will post any updates on this in the next 40 minutes.
Best regards, Khairun
root@sysresccd /root % smartctl /dev/sda -t long
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-3.18.25-std471-amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 44 minutes for test to complete.
Test will complete after Sat Feb 20 19:16:50 2016
Use smartctl -X to abort test.
[quote]Tony Ellis wrote:
2. Run the SMART long tests on both drives. Start with /dev/sda as it looks like partition sda3 is a problem. If /dev/sda is clean, try adding it back into the array with mdadm. If there are any problems I would use ddrescue from a SystemRescue disk like the one in my previous link to copy to a new drive of the same (preferably) or larger size. If there are bad sectors on /dev/sda, ddrescue will try very hard to read with multiple re-tries. If unsuccessful it will write zeros in that position on the new drive, then continue, whereas dd would give up... Then replace the old sda disk with the newly created one.[/quote]
Accepted Answer
Hi All,
Just an update: I have used ddrescue to copy the disk onto the new, clean drive. I did not see any errors, but I'm not sure how I can verify that the data is intact.
The new drive is identified as
Disk /dev/loop0: 337.6 MiB, 353955840 bytes, 691320 sectors
or sdc.
I will run the SMART test on the drives first to see if they are clean. Will keep you posted on this.
Best regards, Khairun
root@sysresccd /root % fdisk -l
Disk /dev/loop0: 337.6 MiB, 353955840 bytes, 691320 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sda: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000de450
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 63 208844 208782 102M fd Linux raid autodetect
/dev/sda2 208845 479990069 479781225 228.8G fd Linux raid autodetect
/dev/sda3 479990070 488375999 8385930 4G fd Linux raid autodetect
Disk /dev/sdb: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00025dc4
Device Boot Start End Sectors Size Id Type
/dev/sdb1 * 63 208844 208782 102M fd Linux raid autodetect
/dev/sdb2 208845 479990069 479781225 228.8G fd Linux raid autodetect
/dev/sdb3 479990070 488375999 8385930 4G fd Linux raid autodetect
Disk /dev/sdc: 298.1 GiB, 320072933376 bytes, 625142448 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000de450
Device Boot Start End Sectors Size Id Type
/dev/sdc1 * 63 208844 208782 102M fd Linux raid autodetect
/dev/sdc2 208845 479990069 479781225 228.8G fd Linux raid autodetect
/dev/sdc3 479990070 488375999 8385930 4G fd Linux raid autodetect
Disk /dev/md127: 4 GiB, 4293525504 bytes, 8385792 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/md126: 101.9 MiB, 106823680 bytes, 208640 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/md125: 228.8 GiB, 245647867904 bytes, 479780992 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/md1: 101.9 MiB, 106823680 bytes, 208640 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
-
Accepted Answer
This would be my approach... Do some research on the web and evaluate what you find against my suggestions. You could well find better... You will also need to learn how to use the tools if you are not familiar with them, but don't practice on this broken system.
1. Since it looks like you can access all three partitions, I would use "dd", or better "ddrescue" (see below), to copy them elsewhere. If one is swap, that can be skipped. This means you would then have all your recoverable data safe, together with a copy of your configuration files and, hopefully, everything else.
2. Run the SMART long tests on both drives. Start with /dev/sda as it looks like partition sda3 is a problem. If /dev/sda is clean, try adding it back into the array with mdadm. If there are any problems I would use ddrescue from a SystemRescue disk like the one in my previous link to copy to a new drive of the same (preferably) or larger size. If there are bad sectors on /dev/sda, ddrescue will try very hard to read with multiple re-tries. If unsuccessful it will write zeros in that position on the new drive, then continue, whereas dd would give up... Then replace the old sda disk with the newly created one.
I am not sure the ClearOS rescue mode has all the tools that you need; I don't use it for this purpose, but rather a special disk designed for this type of rescue...
And if anybody else has better ideas or disagrees with the above, then respond. I work out what to do in these situations as I go along, as it is impossible to predict a series of absolute steps. It's a matter of gathering facts, making the next move, gathering some more, etc.
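The ddrescue workflow described above can be sketched as follows (an editor's addition; the device names and mapfile name are assumptions, and the copy commands themselves are shown only as comments because they overwrite the destination):

```shell
# Typical GNU ddrescue two-pass copy onto a replacement drive
# (the second argument is OVERWRITTEN -- never swap infile and outfile):
#   ddrescue -f -n /dev/sda /dev/sdc rescue.map      # fast pass, skip bad areas
#   ddrescue -f -d -r3 /dev/sda /dev/sdc rescue.map  # retry the bad areas 3 times
# The mapfile records each region's state: '+' rescued, '-' still bad.
# The bytes that remained unreadable can be totalled from it:
bad_bytes() {
  total=0
  while read -r pos size status; do
    [ "$status" = "-" ] && total=$(( total + size ))
  done < "$1"
  echo "$total"
}

# Hypothetical mapfile fragment:
cat > /tmp/rescue.map.sample <<'EOF'
0x00000000 0x00100000 +
0x00100000 0x00000200 -
0x00100200 0x0FF00000 +
EOF
bad_bytes /tmp/rescue.map.sample   # prints: 512
```

A non-zero total means those regions were zero-filled on the copy, which is worth knowing before running fsck on it.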
Accepted Answer
Hi,
I have managed to boot using the ClearOS Community disk and access the rescue mode for the system.
I have mounted the system at /mnt/sysimage and have gathered several more pieces of information on my situation.
I think I can see all the configuration just fine in this mode, except for the Cyrus IMAP data, which is very important in my case.
Is there anything else that I can do after this?
-
Accepted Answer
Sounds like the file-system is corrupted. Assuming md0 is Software RAID 1, metadata version 0.90, and you are not using LVM: with something in that condition, I would be using a Linux system-rescue bootable CD/USB stick such as https://www.system-rescue-cd.org/ to look at each drive separately. A RAID 1 version 0.90 disk is compatible with a normal Linux single disk, so I would look for my data on each disk in turn, copy it elsewhere, and start again from scratch, depending of course on what I discovered. An alternative would be to install the two disks in another Linux machine with two spare disk connectors. At the first sign of trouble you should have taken a full backup of the system, if you didn't already have one, and started replacing the faulty disk(s). Unfortunately, if the filesystem is corrupted on one disk, then, RAID 1 being a mirror, the other is likely the same. RAID is NOT a backup and shouldn't be used as an excuse for not taking one. If you were using LVM, then I cannot help you; the last time I used that was with IBM's OS/2.
Accepted Answer
Hi,
I managed to boot up now, but I'm having a different problem.
It said "md0: raid array is not clean", so it is trying to autodetect the RAID array.
When that finished, it tried to resume from /dev/md0, but it failed to read the EXT3-fs superblock, giving "error while trying to mount /dev/root as ext3: invalid argument".
Then an error from setuproot ("no fstab.sys"), and finally a kernel panic.
Please help!
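One avenue for the superblock error above (an editor's sketch, not advice given in the thread): e2fsck can be pointed at a backup superblock. The probe commands are shown as comments because they should only be run from a rescue shell against the real device:

```shell
# List where mke2fs WOULD place superblocks, without formatting anything (-n):
#   mke2fs -n /dev/md0
# Then retry fsck against one of the backup copies it lists, e.g.:
#   e2fsck -b 32768 /dev/md0
# With the common layout of 32768 blocks per group, sparse backup superblocks
# sit at the start of block groups 1, 3, 5, 7, 9, ... so the first few
# candidate block numbers are:
BLOCKS_PER_GROUP=32768
for g in 1 3 5 7 9; do
  echo "backup superblock candidate: block $(( g * BLOCKS_PER_GROUP ))"
done
```

If the filesystem was made with a 1 KiB block size rather than 4 KiB, the first backup is at block 8193 instead; `mke2fs -n` reports the real locations either way.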