
Khairun
Offline
Resolved
1 vote
Hello,

I'm one of your happy ClearOS Enterprise customers and I have had ClearOS 6.6 running on our systems for the last 7 years. Now I have found symptoms that one of my disk drives is about to die.
The server always tries to check the filesystems after a reboot, and most of the time this takes hours.
Today I had this again; it got stuck at 30% and I had to hard reboot.
Any ideas how to solve this? I don't know how to get into the console to do several mdadm tasks.
Friday, February 19 2016, 02:08 AM
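For anyone landing here with the same question, the usual first diagnostic steps from a rescue console look something like this. This is only a sketch: the device names (/dev/md0, /dev/sda) are examples, not taken from this system, and the function prints the commands rather than running them, so it is safe anywhere; pipe its output to "sh" as root on the real machine to execute them.

```shell
# Sketch: first diagnostic commands from a rescue console.
# Device names (/dev/md0, /dev/sda) are examples, not from this system.
# The function only prints the commands, so it is safe to run anywhere.
raid_health_checks() {
    disk=${1:-/dev/sda}
    printf '%s\n' \
        "cat /proc/mdstat" \
        "mdadm --detail /dev/md0" \
        "smartctl -a $disk"
}
raid_health_checks /dev/sda
```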
Responses (12)
  • Accepted Answer

    Sunday, February 21 2016, 01:54 AM - #Permalink
    Resolved
    0 votes
Run the SMART test on /dev/sdb - what condition is that disk in?
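If it helps, starting a long self-test and reading back the results looks like this (a sketch: the commands are printed rather than executed, so nothing runs by accident; execute them as root on the real machine).

```shell
# Sketch: start a long SMART self-test on a disk, then read the results.
# Commands are printed, not executed; run them as root on the real machine.
smart_test_cmds() {
    disk=$1
    printf '%s\n' \
        "smartctl -t long $disk" \
        "smartctl -a $disk"
}
smart_test_cmds /dev/sdb
```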
  • Accepted Answer

Khairun
    Offline
    Saturday, February 20 2016, 11:34 AM - #Permalink
    Resolved
    0 votes
    Hi,

Here's the result of the SMART test on the disk. I believe there are a lot of problems.
Any idea what to do next?

    Thanks for any help on this.

    root@sysresccd /root % smartctl --all /dev/sda  
    smartctl 6.4 2015-06-04 r4109 [x86_64-linux-3.18.25-std471-amd64] (local build)
    Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family: Seagate Barracuda 7200.12
    Device Model: ST3250318AS
    Serial Number: 5VY2V6RG
    LU WWN Device Id: 5 000c50 021722b96
    Firmware Version: CC38
    User Capacity: 250,059,350,016 bytes [250 GB]
    Sector Size: 512 bytes logical/physical
    Rotation Rate: 7200 rpm
    Device is: In smartctl database [for details use: -P show]
    ATA Version is: ATA8-ACS T13/1699-D revision 4
    SATA Version is: SATA 2.6, 3.0 Gb/s
    Local Time is: Sat Feb 20 19:30:11 2016 UTC

    ==> WARNING: A firmware update for this drive may be available,
    see the following Seagate web pages:
    http://knowledge.seagate.com/articles/en_US/FAQ/207931en
    http://knowledge.seagate.com/articles/en_US/FAQ/213891en

    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status: (0x82) Offline data collection activity
    was completed without error.
    Auto Offline Data Collection: Enabled.
    Self-test execution status: ( 121) The previous self-test completed having
    the read element of the test failed.
    Total time to complete Offline
    data collection: ( 600) seconds.
    Offline data collection
    capabilities: (0x7b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new
    command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
    SMART capabilities: (0x0003) Saves SMART data before entering
    power-saving mode.
    Supports SMART auto save timer.
    Error logging capability: (0x01) Error logging supported.
    General Purpose Logging supported.
    Short self-test routine
    recommended polling time: ( 1) minutes.
    Extended self-test routine
    recommended polling time: ( 44) minutes.
    Conveyance self-test routine
    recommended polling time: ( 2) minutes.
    SCT capabilities: (0x103f) SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.


    SMART Self-test log structure revision number 1
    Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
    # 1 Extended offline Completed: read failure 90% 51026 486461351

    SMART Selective self-test log data structure revision number 1
    SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
    Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
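The failing LBA in the self-test log (486461351) can be placed against the partition table posted elsewhere in this thread: sda3 spans sectors 479990070 to 488375999, so the read failure lands inside sda3. A quick arithmetic check:

```shell
# Map the failing LBA from the SMART self-test log onto the partition
# table (start/end sectors taken from the fdisk -l output in this thread).
lba=486461351
sda3_start=479990070
sda3_end=488375999
part=unknown
if [ "$lba" -ge "$sda3_start" ] && [ "$lba" -le "$sda3_end" ]; then
    part=sda3
fi
echo "LBA $lba falls in /dev/$part"
```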
  • Accepted Answer

Khairun
    Offline
    Saturday, February 20 2016, 10:38 AM - #Permalink
    Resolved
    0 votes
    Hi Tony,

I'm working on your suggestions below; I will post an update on this in the next 40 minutes.

    Best regards, Khairun

    root@sysresccd /root % smartctl /dev/sda -t long
    smartctl 6.4 2015-06-04 r4109 [x86_64-linux-3.18.25-std471-amd64] (local build)
    Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
    Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
    Testing has begun.
    Please wait 44 minutes for test to complete.
    Test will complete after Sat Feb 20 19:16:50 2016

    Use smartctl -X to abort test.


    [quote]Tony Ellis wrote:
2. Run the SMART long tests on both drives. Start with /dev/sda as it looks like partition sda3 is a problem. If /dev/sda is clean, try adding it back into the array with mdadm. If there are any problems I would use ddrescue from a SystemRescue like the one in my previous link to copy to a new drive of the same (preferably) or larger size. If there are bad sectors on /dev/sda, ddrescue will try very hard to read with multiple re-tries. If unsuccessful it will write zeros in that position on the new drive, then continue, whereas dd would give up... Then replace the old sda disk with the newly created one.[/quote]
  • Accepted Answer

Khairun
    Offline
    Saturday, February 20 2016, 10:22 AM - #Permalink
    Resolved
    0 votes
    Hi All,

Just an update: I have done a ddrescue of the disk onto the new clean drive. I did not see any errors, but I'm not sure how I can verify that the data is intact.
The new drive is identified as
Disk /dev/loop0: 337.6 MiB, 353955840 bytes, 691320 sectors
or sdc.

I will run the SMART test on the drives first to see if they are clean. Will keep you posted on this.

    Best regards, Khairun


    root@sysresccd /root % fdisk -l
    Disk /dev/loop0: 337.6 MiB, 353955840 bytes, 691320 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes


    Disk /dev/sda: 232.9 GiB, 250059350016 bytes, 488397168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x000de450

    Device Boot Start End Sectors Size Id Type
    /dev/sda1 * 63 208844 208782 102M fd Linux raid autodetect
    /dev/sda2 208845 479990069 479781225 228.8G fd Linux raid autodetect
    /dev/sda3 479990070 488375999 8385930 4G fd Linux raid autodetect


    Disk /dev/sdb: 232.9 GiB, 250059350016 bytes, 488397168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x00025dc4

    Device Boot Start End Sectors Size Id Type
    /dev/sdb1 * 63 208844 208782 102M fd Linux raid autodetect
    /dev/sdb2 208845 479990069 479781225 228.8G fd Linux raid autodetect
    /dev/sdb3 479990070 488375999 8385930 4G fd Linux raid autodetect


    Disk /dev/sdc: 298.1 GiB, 320072933376 bytes, 625142448 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x000de450

    Device Boot Start End Sectors Size Id Type
    /dev/sdc1 * 63 208844 208782 102M fd Linux raid autodetect
    /dev/sdc2 208845 479990069 479781225 228.8G fd Linux raid autodetect
    /dev/sdc3 479990070 488375999 8385930 4G fd Linux raid autodetect


    Disk /dev/md127: 4 GiB, 4293525504 bytes, 8385792 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes


    Disk /dev/md126: 101.9 MiB, 106823680 bytes, 208640 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes


    Disk /dev/md125: 228.8 GiB, 245647867904 bytes, 479780992 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes


    Disk /dev/md1: 101.9 MiB, 106823680 bytes, 208640 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
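One way to check that the copy is intact (a suggestion on my part, not something confirmed in the thread): since each partition on the copy has the same sector count as the original per the fdisk output above, checksumming the corresponding partitions (unmounted, read-only) should give identical results if the copy is exact. The sketch below demonstrates the comparison on temporary files; on the real machine the operands would be the matching partitions, e.g. /dev/sda2 and /dev/sdc2.

```shell
# Sketch: verify a rescued copy by comparing checksums.
# Demonstrated on temporary files; on the real machine the operands would
# be the matching partitions (e.g. /dev/sda2 and /dev/sdc2), unmounted.
src=$(mktemp); dst=$(mktemp)
printf 'example payload' > "$src"
cp "$src" "$dst"                         # stand-in for the ddrescue copy
sum_src=$(sha256sum "$src" | cut -d' ' -f1)
sum_dst=$(sha256sum "$dst" | cut -d' ' -f1)
[ "$sum_src" = "$sum_dst" ] && echo "checksums match"
rm -f "$src" "$dst"
```

Note that comparing the whole disks would not work here, since /dev/sdc is larger than /dev/sda; compare partition to partition. Running `e2fsck -fn` on the copied partitions is another read-only sanity check.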
  • Accepted Answer

Khairun
    Offline
    Friday, February 19 2016, 01:09 PM - #Permalink
    Resolved
    0 votes
Hi Nick,

Thank you for your suggestion. As a precautionary step, I have started the ddrescue method based on Tony's suggestion, to try to copy all my current configuration and, if I'm lucky, the Cyrus IMAP data.
I guess it will take long hours.

    Cross my fingers.
  • Accepted Answer

    Friday, February 19 2016, 12:36 PM - #Permalink
    Resolved
    0 votes
I can't add much, but I know bad superblocks on ext3 can sometimes be recovered, as ext3 keeps backup superblocks around the partition. Google "bad superblock ext3" and you'll find lots of references. It is probably a similar procedure with ext2 and ext4, but I can't guarantee it.
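For reference, the usual ext3 procedure looks like this (a sketch: the partition name is an example, and the commands are printed rather than executed, since e2fsck modifies the filesystem; `mke2fs -n` only reports where the backup superblocks would go, provided you give it the same block size the filesystem was made with).

```shell
# Sketch: recover an ext3 filesystem with a bad primary superblock.
# mke2fs -n only *lists* the backup superblock locations (it writes
# nothing); e2fsck -b then repairs using one of those backups.
# 32768 is a common backup location for 4k-block filesystems.
superblock_recovery_cmds() {
    part=$1
    printf '%s\n' \
        "mke2fs -n $part" \
        "e2fsck -b 32768 $part"
}
superblock_recovery_cmds /dev/sda2
```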
  • Accepted Answer

    Friday, February 19 2016, 10:55 AM - #Permalink
    Resolved
1 vote
This would be my approach... Do some research on the web and evaluate what you find against my suggestions. You could well find better... You will also need to learn how to use the tools if you are not familiar with them, but don't practice on this broken system.

    1. Since it looks like you can access all three partitions I would use "dd", or better "ddrescue" (see below) to copy them elsewhere. If one is swap then that should be skipped. This means you now have all your recoverable data safe together with a copy of your configuration files and everything else hopefully.

2. Run the SMART long tests on both drives. Start with /dev/sda as it looks like partition sda3 is a problem. If /dev/sda is clean, try adding it back into the array with mdadm. If there are any problems I would use ddrescue from a SystemRescue like the one in my previous link to copy to a new drive of the same (preferably) or larger size. If there are bad sectors on /dev/sda, ddrescue will try very hard to read with multiple re-tries. If unsuccessful it will write zeros in that position on the new drive, then continue, whereas dd would give up... Then replace the old sda disk with the newly created one.
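A typical two-pass GNU ddrescue invocation, as a sketch only (the device names and map-file path are examples, and the commands are printed rather than executed, because ddrescue overwrites the destination):

```shell
# Sketch: two-pass GNU ddrescue copy of a dying disk to a replacement.
# Pass 1 (-n) skips the slow scraping phase and grabs the easy sectors
# first; pass 2 (-r3) retries the bad areas up to three times. The map
# file records progress so the copy can be interrupted and resumed.
ddrescue_plan() {
    src=$1; dst=$2; map=$3
    printf '%s\n' \
        "ddrescue -f -n $src $dst $map" \
        "ddrescue -f -r3 $src $dst $map"
}
ddrescue_plan /dev/sda /dev/sdc /root/rescue.map
```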

    I am not sure the ClearOS rescue mode has all the tools that you need, I don't use it for this purpose, but a special disk designed for this type of rescue...

And if anybody else has better ideas or disagrees with the above, then respond. I work out what to do in these situations as I go along, as it is impossible to predict a series of absolute steps. It's a matter of gathering facts, making the next move, gathering some more, etc.
  • Accepted Answer

Khairun
    Offline
    Friday, February 19 2016, 10:16 AM - #Permalink
    Resolved
    0 votes
    Hi,

I have managed to boot using the ClearOS Community disk and access the rescue mode for the system.
I have mounted the system at /mnt/sysimage and have gathered some more info on my situation.
I think I can see all the configuration just fine in this mode, except for the Cyrus IMAP data, which is very important in my case.
Is there anything else that I can do after this?
  • Accepted Answer

Khairun
    Offline
    Friday, February 19 2016, 09:37 AM - #Permalink
    Resolved
    0 votes
Hi Tony, glad to hear your feedback on this; it has been a long time since I visited this forum. A lot of things to catch up on and learn again.
I will try your suggestions and will return to you with feedback.
  • Accepted Answer

Khairun
    Offline
    Friday, February 19 2016, 09:22 AM - #Permalink
    Resolved
1 vote
    Hi,

Strange thing: when I did
cat /proc/mdstat
there are no drives shown?
  • Accepted Answer

    Friday, February 19 2016, 08:55 AM - #Permalink
    Resolved
1 vote
Sounds like the file-system is corrupted. Assuming md0 is a software RAID 1, metadata version 0.90, and not using LVM: with something in that condition, I would be using a system-rescue Linux bootable CD/USB stick such as https://www.system-rescue-cd.org/ to look at each drive separately. A RAID 1 version 0.90 member is compatible with a normal Linux single disk. I would be looking for my data on each disk in turn to copy elsewhere, and then start again from scratch, depending of course on what I discovered. An alternative would be to install the two disks in another Linux machine with two spare disk connectors.

At the first sign of trouble you should have taken a full backup of the system, if you didn't have one, and started replacing the faulty disk(s). Unfortunately, if the filesystem is corrupted on one disk, then RAID 1 being a mirror, the other is likely the same. RAID is NOT a backup and shouldn't be used as an excuse for not taking one.

If you were using LVM, then I cannot help you. The last time I used that was with IBM's OS/2.
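To illustrate the "look at each drive separately" step (a sketch: the device and mount-point names are examples, and the commands are printed rather than executed): because v0.90 metadata lives at the end of the partition, a RAID 1 member can be mounted read-only on its own, or the array can be assembled degraded from just one disk.

```shell
# Sketch: examine one half of a v0.90 RAID 1 mirror from a rescue system.
# Commands are printed, not executed; device names are examples.
examine_mirror_cmds() {
    member=$1
    printf '%s\n' \
        "mdadm --examine $member" \
        "mount -o ro $member /mnt/data" \
        "mdadm --assemble --run /dev/md0 $member"
}
examine_mirror_cmds /dev/sda2
```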
  • Accepted Answer

Khairun
    Offline
    Friday, February 19 2016, 08:03 AM - #Permalink
    Resolved
    0 votes
    Hi,

I managed to boot up now, but I'm having a different problem.
It says "md0: raid array is not clean", so it is trying to auto-detect the RAID array.
When that finishes, it tries to resume from /dev/md0, but it fails to read the superblock (EXT3-fs) and gives an error while trying to mount /dev/root as ext3: invalid argument.
Then an error from setuproot, no fstab.sys, and finally a kernel panic.

    Please help! :(