
Leon
Resolved
0 votes
Hi All

I have configured my server with 8x 2TB drives as Raid 6, as per the video posted here (title as per this post).
At the time I asked the question: if the raid is created using /dev/sdx device names, will it not break when I add or remove non-raid drives, since Linux does not preserve the /dev/sdx ordering? ("Seen this movie before...")
Now it is broken... I replaced one of my non-raid drives to copy data to the Raid, and now it boots to recovery mode.
I have no idea where to start looking; any help will be appreciated.

The other thing I noted is that my 2nd NIC is also "dead".

Z
Thursday, September 14 2017, 08:30 AM
Responses (34)
  • Accepted Answer

    Thursday, September 21 2017, 04:27 PM - #Permalink
    Resolved
    0 votes
    Just thought about doing some research on SATA cards before going to bed (it's currently 2:20 am) :(
    Came across this... https://www.jethrocarr.com/2013/11/24/adventures-in-io-hell/ - a bit different to your problems, but some of the comments at the bottom are telling...

    Seems like the Marvell SATA chipsets/drivers might not be the best if this is anything to go by... I have a Silicon Image 3114 (and still use it in a backup server that's only powered up once a week for backup updates), so your PCI card is probably OK (provided the manufacturer was careful laying out the PC board traces to minimise cross-talk - mine is a different brand to yours). Just make sure the SATA cable is good quality with a nice tight fit - mine has no notches for the clips... However, based on the URL above there are suspicions about PCIe Marvell-based cards - perhaps change yours for a decent one with no Marvell chipset! Never used Marvell myself, so no first-hand experience there.

    My SI 3114 card has 1 drive attached; the motherboard has 4 SATA ports with 4 drives. The 5 drives are combined in a software raid 5 array. The OS resides on 2x IDE drives that are mirrored in Raid 1. All my drives support TLER (ERC), set to a 7-second timeout.
  • Accepted Answer

    Thursday, September 21 2017, 02:21 PM - #Permalink
    Resolved
    0 votes
    Hi Leon, once a UUID is assigned to each array during raid creation it doesn't matter what the /dev/sdx order is, provided you specify the array by UUID in mdadm.conf... mdadm scans all drives looking for the UUIDs and uses them to ascertain which drive is which.
    Did you create the script to change the disk time-out and place it in, for example, /etc/rc.d/rc.local so it runs when booting? Those drives of yours are not suitable for use in Raid 5/6 without doing that.
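    Something along these lines is the usual approach (a sketch only - it assumes the data drives are /dev/sda to /dev/sdl and that smartctl is installed; adjust the device list to your own layout before using it):

    #!/bin/sh
    # Try to set SCT ERC (TLER) to 7 seconds on every data drive.
    # Drives that do not support ERC get the kernel command timeout
    # raised to 180 seconds instead, so md does not kick them during a
    # long internal error-recovery attempt.
    for disk in /dev/sd[a-l]; do
        dev=$(basename "$disk")
        if smartctl -l scterc,70,70 "$disk" > /dev/null 2>&1; then
            echo "$disk: SCT ERC set to 7 seconds"
        else
            echo 180 > "/sys/block/$dev/device/timeout"
            echo "$disk: no ERC support, kernel timeout raised to 180s"
        fi
    done

    Run it once by hand and check the output before relying on it at boot.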

    No idea how you are setting the drives up - I never watch a "How-To" on YouTube or anything similar - I can read far faster than any narrator can speak and thus learn a lot more in the same period of time... and you can always refer to any written part instantly. A lot better than having to replay something to make sure you heard and understood a certain passage correctly... Several years ago I downloaded and used the Red Hat Administration Manuals and went from there... studied the complete set, beginning to end, while going to/from work on the train.

    If you continue to have problems, then it might be time to look at the hardware... I would be inclined to ditch the two budget PCI/PCIe controllers and get a decent 8-port one with current Linux support, e.g. a modern LSI, assuming one of your 8x or 16x PCIe slots is vacant... Are you using good quality SATA cables with clips? The old original ones with no clips are notorious for creating intermittent connections, as are cheap controllers that don't have the little notch for the clip to latch onto. I wouldn't be surprised if your two add-on controllers fall into that category. No clips means you are relying on friction - you might get away with it - but the number of drives you have provides more opportunity for vibration and connector movement...
  • Accepted Answer

    Leon
    Thursday, September 21 2017, 11:33 AM - #Permalink
    Resolved
    0 votes
    Hi Tony

    Thank you for all the help, but I am at the point of "so my data is gone... deal with it..."
    I am prepared to start from scratch. About 4TB will be lost; I think I can get most of it back over time from other sources, it might just take some time.

    I still have the issue where, on every reboot, the raid members - /dev/sd[abcefgijkl] - are not the same as on the previous boot.
    This is what seems to be the issue - we got sidetracked trying to recover the data after the reboot, and I had loaded data onto the drive.

    Any idea how I resolve my original issue?

    I shall start from scratch, as I did originally, following "ClearOS Setting up Storage Volumes with Linux RAID".
    Once it is done I shall create /etc/mdadm.conf:
    mdadm --detail --scan >> /etc/mdadm.conf

    Add the UUID to fstab, put some data on it, verify and reboot.
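    For the record, the fstab side of that would be something like this (a sketch only - the UUID comes from blkid on the assembled array, not from any member disk, and the mount point is just an example):

    # filesystem UUID of the assembled array
    blkid /dev/md0

    # /etc/fstab entry, using that UUID rather than a /dev/sdX name
    UUID=<uuid-reported-by-blkid>  /store/data1  ext4  defaults  0 0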
  • Accepted Answer

    Wednesday, September 20 2017, 11:38 PM - #Permalink
    Resolved
    0 votes
    Create is dangerous in that you need to specify the disks, in the create command, in the same order they had ended up in the raid when it broke. Since you did a grow they may not be in strict alphabetical order any more - do not write anything to the drive - mount it read-only until you have verified that the data in large files is OK. Since you have 10 drives the number of possible orderings is enormous. See the section "File system check" at https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID - you really should have done this with overlays; I pointed to this procedure before. For all you know the correct order may be /dev/sdc /dev/sdf /dev/sdd... etc. On the other hand you may have been very lucky... but what happened to the necessary "--assume-clean"?

    Comment out the entry in fstab (I assume you put it back) and do not add it back until you are sure the array will always assemble on a boot. In the meantime do a manual mount if and when the array is assembled, then reboot. Can you stop /dev/md127 and /dev/md0, and will it now assemble with a --detail --scan?
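    A minimal sketch of that sequence (verify every step against your own setup first - this assumes the array should be /dev/md0, that /mnt exists, and mounts read-only only):

    mdadm -S /dev/md127          # stop the stale, half-assembled array
    mdadm -S /dev/md0            # stop it under its intended name too, if present
    mdadm --detail --scan        # what does mdadm report now?
    mdadm --assemble --scan      # explicit assemble from the superblocks/mdadm.conf
    mount -o ro /dev/md0 /mnt    # read-only mount, then spot-check some large files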

    See, amongst many others:
    https://serverfault.com/questions/538904/mdadm-raid5-recover-double-disk-failure-with-a-twist-drive-order
    https://serverfault.com/questions/347606/recover-raid-5-data-after-created-new-array-instead-of-re-using
  • Accepted Answer

    Leon
    Wednesday, September 20 2017, 08:00 PM - #Permalink
    Resolved
    0 votes
    Hi Tony

    I created an mdadm.conf file; this is the content:
    ARRAY /dev/md/0 metadata=1.2 name=Zodiac.lan:0 UUID=785c89de:1d355c46:9b116951:21d6c4b3

    And it booted into emergency mode again.

    Any ideas?
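    Edit: one thing I still need to rule out (just my guess, not something anyone has confirmed here): on a CentOS 7 based system the initramfs carries its own copy of mdadm.conf, so a change to /etc/mdadm.conf may only take effect at boot after the initramfs is rebuilt, and the boot journal should say why it dropped to emergency mode:

    dracut -f            # rebuild the initramfs so it picks up the new /etc/mdadm.conf
    journalctl -xb       # after the next boot, look for the failing mount/raid unit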
  • Accepted Answer

    Leon
    Wednesday, September 20 2017, 02:46 PM - #Permalink
    Resolved
    0 votes
    Hi Tony

    I did:
    mdadm --create /dev/md0 --level=6 --chunk=64 --raid-devices=10 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdi1 /dev/sdl1 /dev/sdj1 /dev/sdk1

    I spent hours reading and trying to do a reassemble with zero success, so I took the dive.

    The only thing I need to check is why I only see 453GB of 456GB free and not the whole array; I shall do some Google hunting...
    Edit: I think it is because it is not mounting md0, so what I am seeing is the same size as the boot disk.
    For some reason, blkid does not show me /dev/md0.

    I presumed that the first UUID of the disks that are part of the array is the array's UUID, but mounting that UUID I get "/dev/sdc1 is already mounted".
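    Edit 2: notes to myself on telling the UUIDs apart (as I understand it - happy to be corrected): mdadm --examine on a member partition shows the md Array UUID, while blkid on the assembled /dev/md0 shows the filesystem UUID, which is the one fstab wants; blkid will only list /dev/md0 once the array is assembled and has a filesystem on it.

    cat /proc/mdstat                       # is md0 actually assembled and active?
    mdadm --detail /dev/md0 | grep UUID    # md array UUID (what mdadm.conf wants)
    blkid /dev/md0                         # filesystem UUID (what /etc/fstab wants)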
  • Accepted Answer

    Tuesday, September 19 2017, 10:40 AM - #Permalink
    Resolved
    0 votes
    Fingers crossed - :)

    What speed is it rebuilding?

    What did you do to get the array started?
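    (If you want to watch it, or speed it up: the progress and current speed are in /proc/mdstat, and the md sync speed floor can be raised while it rebuilds - general md behaviour, adjust to taste:)

    cat /proc/mdstat                                  # shows percent complete and current speed
    cat /proc/sys/dev/raid/speed_limit_min            # current floor, usually 1000 KB/s
    echo 50000 > /proc/sys/dev/raid/speed_limit_min   # optionally raise it while rebuilding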
  • Accepted Answer

    Leon
    Tuesday, September 19 2017, 10:29 AM - #Permalink
    Resolved
    0 votes
    Hi Tony

    Yes, not ideal, but I had an old P4 3GHz with 4GB of RAM and 2x SATA4000 SI 3114 SATA I 1.5 Gb/s cards as my previous server.
    It worked fine for what I needed it to do.

    The server is busy rebuilding... it should be done in 38 hours, so let's see.
    I'll either have my data back or nothing...
  • Accepted Answer

    Monday, September 18 2017, 11:45 PM - #Permalink
    Resolved
    0 votes
    Thanks Leon - a minor quibble: the SATA4000 is a PCI card using the Silicon Image 3114 controller - it is NOT PCIe.
    see http://www.sunix.com.tw/product/sata4000.html
    The SATA2600 is PCIe http://www.sunix.com/product/sata2600.html - so you have 1x PCI and 1x PCIe

    So we have quite a mixture here :)
    Drives are SATA III 6Gb/s (Compatible with SATA I and SATA II)
    Intel Motherboard Controllers SATA II 3Gb/s
    SATA2600 Marvell 91XX SATA III 6Gb/s (transfer will be limited as it has only a 1x PCIe connector)
    SATA4000 SI 3114 SATA I 1.5 Gb/s (limited even more, since the motherboard slot is 33 MHz PCI)

    Should work - but the PCI card is creating a bottleneck :(
  • Accepted Answer

    Leon
    Monday, September 18 2017, 07:43 PM - #Permalink
    Resolved
    0 votes
    Hi Tony

    One card is a SUNIX SATA2600 (2x SATA) and the other is a SUNIX SATA4000 (4x SATA).
  • Accepted Answer

    Monday, September 18 2017, 01:13 AM - #Permalink
    Resolved
    0 votes
    Leon - a question about the hardware
    [quote]
    The board has 6x SATA ports and I have 2 PCIe cards to give me an additional 6 SATA ports.
    [/quote]
    Make and Model of the "2 PCIe cards" please...
  • Accepted Answer

    Sunday, September 17 2017, 09:52 AM - #Permalink
    Resolved
    0 votes
    Thanks Leon... OK - those drives are not suitable for use in a Raid 5 or 6. See the section "Timeout Mismatch" at https://raid.wiki.kernel.org/index.php/Timeout_Mismatch - you really need to get that script working first, before anything else - you cannot afford a drive to be kicked out at this stage...

    Then use smartctl to check that no raid drive has a Current Pending Sector count warning or other serious error - this is important if you end up using only 8 of your ten drives to recover the array - you don't want a drive kicked for any reason. Run the long test on every drive and check the output. You can run the test on all drives concurrently. Lots of help on this on the web, e.g. https://www.linuxtechi.com/smartctl-monitoring-analysis-tool-hard-drive/
    An example showing Current Pending Sector:- https://community.wd.com/t/help-current-pending-sector-count-warning/3436/3
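    Something like this will do all ten in one go (a sketch - substitute your actual raid member drives for the /dev/sd[c-l] range):

    for d in /dev/sd[c-l]; do
        smartctl -t long "$d"        # start the long self-test; it runs inside the drive
    done
    # several hours later, once the tests have completed:
    for d in /dev/sd[c-l]; do
        echo "== $d =="
        smartctl -a "$d" | egrep -i 'pending|realloc|uncorrect|self-test'
    done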

    Then...
    See https://raid.wiki.kernel.org/index.php/Assemble_Run
    Are the event counts for the 'good' drives the same or very very close?
    I believe you have a Raid 6 - so you should be able to recover if 8 of the 10 drives are OK - then add the other 2 later.
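    A quick way to compare the event counts across the members, assuming partition 1 on each (substitute your own drive letters):

    mdadm --examine /dev/sd[c-l]1 | egrep '/dev/sd|Events'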

    Otherwise, it might mean working through https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID - you might also want to contact the experts on the raid mailing list...

    Do we assume you created the raid and then used grow without making a viable backup first? If so, that is highly dangerous. You don't use raid as a backup - it is there to guard against one kind of hardware failure: a drive failure. There are lots of failure modes that raid doesn't guard against, such as file corruption (software problem, power drop, etc.), human error (deleting files by mistake), viruses and other malware, and so on. With a backup, the quickest way would be to create the raid again and restore from the backup, having a procedure in place to ensure your backups are complete and usable...
  • Accepted Answer

    Leon
    Sunday, September 17 2017, 07:58 AM - #Permalink
    Resolved
    0 votes
    Hi Tony

    Let's start with the disk
    [root@Zodiac ~]# smartctl -l scterc /dev/sdc
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-514.26.2.v7.x86_64] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

    SCT Commands not supported


    Here is the rest of the information - I used /dev/md0 when I created the array.

    [root@Zodiac ~]# cat /proc/mdstat
    Personalities :
    md127 : inactive sdg1[3](S) sde1[1](S) sdf1[2](S) sdj1[5](S) sdl1[7](S) sdh1[4](S) sdc1[0](S) sdk1[6](S)
    15627059200 blocks super 1.2

    [root@Zodiac ~]# mdadm -S /dev/md127
    mdadm: stopped /dev/md127

    [root@Zodiac ~]# mdadm -S /dev/md0
    mdadm: error opening /dev/md0: No such file or directory

    [root@Zodiac ~]# mdadm --detail --scan
    INACTIVE-ARRAY /dev/md127 metadata=1.2 name=localhost.localdomain:0 UUID=243e5d11:e1049ad5:a4a2ce43:304fdb4f

    [root@Zodiac ~]# mdadm -vv --assemble --force /dev/md0 /dev/sd[cdefghijkl]1
    mdadm: looking for devices for /dev/md0
    mdadm: /dev/sdc1 is busy - skipping
    mdadm: no recogniseable superblock on /dev/sdd1
    mdadm: /dev/sdd1 has no superblock - assembly aborted
    [root@Zodiac ~]#


    I also attached the output of mdadm --examine /dev/sd*:

    [root@Zodiac ~]# mdadm --examine /dev/sd*
    /dev/sda:
    MBR Magic : aa55
    Partition[0] : 2097152 sectors at 2048 (type 83)
    Partition[1] : 974673920 sectors at 2099200 (type 8e)
    mdadm: No md superblock detected on /dev/sda1.
    mdadm: No md superblock detected on /dev/sda2.
    mdadm: No md superblock detected on /dev/sdb.
    /dev/sdc:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    /dev/sdc1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x45
    Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
    Name : localhost.localdomain:0
    Creation Time : Mon Sep 11 21:40:47 2017
    Raid Level : raid6
    Raid Devices : 10

    Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
    Data Offset : 262144 sectors
    New Offset : 258048 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 9e0c7861:16ceac28:516359d9:59cd656f

    Internal Bitmap : 8 sectors from superblock
    Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
    Delta Devices : 2 (8->10)

    Update Time : Fri Sep 15 21:22:14 2017
    Bad Block Log : 512 entries available at offset 72 sectors
    Checksum : cb80fcde - correct
    Events : 30173

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdd:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    mdadm: No md superblock detected on /dev/sdd1.
    /dev/sde:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    /dev/sde1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x45
    Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
    Name : localhost.localdomain:0
    Creation Time : Mon Sep 11 21:40:47 2017
    Raid Level : raid6
    Raid Devices : 10

    Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
    Data Offset : 262144 sectors
    New Offset : 258048 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 3cfbd19f:052f19e4:ef9c0132:b3537526

    Internal Bitmap : 8 sectors from superblock
    Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
    Delta Devices : 2 (8->10)

    Update Time : Fri Sep 15 21:22:14 2017
    Bad Block Log : 512 entries available at offset 72 sectors
    Checksum : d4ff0c64 - correct
    Events : 30173

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdf:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    /dev/sdf1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x45
    Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
    Name : localhost.localdomain:0
    Creation Time : Mon Sep 11 21:40:47 2017
    Raid Level : raid6
    Raid Devices : 10

    Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
    Data Offset : 262144 sectors
    New Offset : 258048 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : eebd70e4:2ff03aa7:3e22b382:cbdc2f1a

    Internal Bitmap : 8 sectors from superblock
    Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
    Delta Devices : 2 (8->10)

    Update Time : Fri Sep 15 21:22:14 2017
    Bad Block Log : 512 entries available at offset 72 sectors
    Checksum : 212b9ea9 - correct
    Events : 30173

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdg:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    /dev/sdg1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x45
    Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
    Name : localhost.localdomain:0
    Creation Time : Mon Sep 11 21:40:47 2017
    Raid Level : raid6
    Raid Devices : 10

    Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
    Data Offset : 262144 sectors
    New Offset : 258048 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 2ec94af6:9a7ad26a:e76988a9:11d2b8b0

    Internal Bitmap : 8 sectors from superblock
    Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
    Delta Devices : 2 (8->10)

    Update Time : Fri Sep 15 21:22:14 2017
    Bad Block Log : 512 entries available at offset 72 sectors
    Checksum : b3fb7144 - correct
    Events : 30173

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdh:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    /dev/sdh1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x45
    Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
    Name : localhost.localdomain:0
    Creation Time : Mon Sep 11 21:40:47 2017
    Raid Level : raid6
    Raid Devices : 10

    Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
    Data Offset : 262144 sectors
    New Offset : 258048 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 00162b82:1c3488bf:f1299cf6:bc353dae

    Internal Bitmap : 8 sectors from superblock
    Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
    Delta Devices : 2 (8->10)

    Update Time : Fri Sep 15 21:22:14 2017
    Bad Block Log : 512 entries available at offset 72 sectors
    Checksum : df299b4e - correct
    Events : 30173

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 4
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdi:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    mdadm: No md superblock detected on /dev/sdi1.
    /dev/sdj:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    /dev/sdj1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x45
    Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
    Name : localhost.localdomain:0
    Creation Time : Mon Sep 11 21:40:47 2017
    Raid Level : raid6
    Raid Devices : 10

    Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
    Data Offset : 262144 sectors
    New Offset : 258048 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 87366ae7:d3fcd71e:f4d5b50d:3a493d1f

    Internal Bitmap : 8 sectors from superblock
    Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
    Delta Devices : 2 (8->10)

    Update Time : Fri Sep 15 21:22:14 2017
    Bad Block Log : 512 entries available at offset 72 sectors
    Checksum : 2bd2440d - correct
    Events : 30173

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 5
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdk:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    /dev/sdk1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x45
    Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
    Name : localhost.localdomain:0
    Creation Time : Mon Sep 11 21:40:47 2017
    Raid Level : raid6
    Raid Devices : 10

    Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
    Data Offset : 262144 sectors
    New Offset : 258048 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : b39843dc:4bb2d0de:c0753679:a52bed1d

    Internal Bitmap : 8 sectors from superblock
    Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
    Delta Devices : 2 (8->10)

    Update Time : Fri Sep 15 21:22:14 2017
    Bad Block Log : 512 entries available at offset 72 sectors
    Checksum : 4ad4ddea - correct
    Events : 30173

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 6
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdl:
    MBR Magic : aa55
    Partition[0] : 3907029167 sectors at 1 (type ee)
    /dev/sdl1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x45
    Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
    Name : localhost.localdomain:0
    Creation Time : Mon Sep 11 21:40:47 2017
    Raid Level : raid6
    Raid Devices : 10

    Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
    Data Offset : 262144 sectors
    New Offset : 258048 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 3053a5ba:77ac33c9:0ae49712:24186eaf

    Internal Bitmap : 8 sectors from superblock
    Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
    Delta Devices : 2 (8->10)

    Update Time : Fri Sep 15 21:22:14 2017
    Bad Block Log : 512 entries available at offset 72 sectors
    Checksum : 3e7bed5d - correct
    Events : 30173

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 7
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    [root@Zodiac ~]#
  • Accepted Answer

    Sunday, September 17 2017, 01:42 AM - #Permalink
    Resolved
    0 votes
    As for the raid - did you stop /dev/md127 and /dev/md1 (assuming /dev/md1 is your raid device) before doing the "mdadm --detail --scan"?

    If not, then try again stopping both arrays first...

    # cat /proc/mdstat # show us output
    # mdadm -S /dev/md127 # show us output
    # mdadm -S /dev/md1 # show us output (assuming /dev/md1 is your array)
    # mdadm --detail --scan # show us output

    If the --scan fails after using both 'stop' commands, try

    # mdadm -vv --assemble --force /dev/md1 /dev/sd[abcd...]1

    that's two 'v's, not a 'w' - show us the output - where "abcd..." are the ten correct drive letters for your raid array, assuming you are using partition 1 for raid, which appears to be the case from your output... thanks
    Please do ***NOT*** use the 'create' command yet - that is dangerous and a ***last*** resort only

    Here's the output from a simulated failure

    cat /proc/mdstat
    Personalities :
    md127 : inactive sdd[2](S) sdc[4](S)
    3906767024 blocks super 1.2

    mdadm -S /dev/md127
    mdadm: stopped /dev/md127

    mdadm -S /dev/md1
    mdadm: error opening /dev/md1: No such file or directory

    mdadm --detail --scan
    mdadm: /dev/md1 has been started with 2 drives.
    Personalities : [raid0] [raid1]
    md1 : active raid1 sdc1[0] sdd1[1]
    1953382464 blocks super 1.2 [2/2] [UU]
    bitmap: 0/15 pages [0KB], 65536KB chunk

    Much good information at https://raid.wiki.kernel.org/index.php/Linux_Raid

    Tony... http://www.sraellis.tk/
  • Accepted Answer

    Sunday, September 17 2017, 12:29 AM - #Permalink
    Resolved
    0 votes
    OK - let's deal with the disks first - and I am concerned... This is from the Seagate documentation - it gives no TLER (ERC) specification...

    Barracuda XT drives—The performance leader in the family, with maximum capacity, cache and SATA performance for the ultimate in desktop computing

    Application Desktop RAID

    Cannot find a strict definition - but "desktop raid" often means Raid 0 and Raid 1 ***ONLY***. Why? Because such drives often don't support TLER (ERC), and that's a big drawback when they are used in Raid 5 and Raid 6 - search the web for all the gory details. Basically, when an error occurs the drive should time out before the software does; that lets the raid initiate error recovery. If the software times out first (which is what happens with 'desktop' drives), the raid thinks the disk is 'broken' and kicks it out of the array. With raid you want the disk to time out quickly, as this prevents 'hangs' for the users, and the raid can reconstruct the data from the other drives and then re-write the correct data to the 'failing' one. In a 'desktop' environment there is only one copy of the data, on that one drive - so the drive will try desperately, for ages if necessary, to recover the data, and the user will experience a 'hang' while this takes place...

    To test for TLER (ERC) see below - please give the results from your drives. If they do not support TLER (ERC) - then we need to change the timeouts within the Linux software disk tables...

    [root@danda ~]# smartctl -l scterc /dev/sdc
    smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-696.v6.x86_64] (local build)
    Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

    SCT Error Recovery Control:
    Read: 70 (7.0 seconds)
    Write: 70 (7.0 seconds)

    [root@karien ~]# smartctl -l scterc /dev/sdc
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-514.26.2.v7.x86_64] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

    SCT Error Recovery Control command not supported
  • Accepted Answer

    Leon
    Saturday, September 16 2017, 08:50 PM - #Permalink
    Resolved
    0 votes
    So I get the following:
    [@Zodiac ~]# cat /proc/mdstat
    Personalities :
    md127 : inactive sdg1[7](S) sdf1[6](S) sdd1[4](S) sdc1[3](S) sdb1[2](S) sda1[1](S)
    11720294400 blocks super 1.2

    unused devices: <none>

    and
    [@Zodiac ~]# mdadm --detail -scan
    INACTIVE-ARRAY /dev/md127 metadata=1.2 name=localhost.localdomain:0 UUID=243e5d11:e1049ad5:a4a2ce43:304fdb4f


    [root@Zodiac ~]# lsblk
    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    sda 8:0 0 1.8T 0 disk
    └─sda1 8:1 0 1.8T 0 part
    sdb 8:16 0 1.8T 0 disk
    └─sdb1 8:17 0 1.8T 0 part
    sdc 8:32 0 1.8T 0 disk
    └─sdc1 8:33 0 1.8T 0 part
    sdd 8:48 0 1.8T 0 disk
    └─sdd1 8:49 0 1.8T 0 part
    sde 8:64 0 1.8T 0 disk
    └─sde1 8:65 0 1.8T 0 part
    sdf 8:80 0 1.8T 0 disk
    └─sdf1 8:81 0 1.8T 0 part
    sdg 8:96 0 1.8T 0 disk
    └─sdg1 8:97 0 1.8T 0 part
    sdh 8:112 0 465.8G 0 disk
    ├─sdh1 8:113 0 1G 0 part /boot
    └─sdh2 8:114 0 464.8G 0 part
    ├─clearos-root 253:0 0 456.9G 0 lvm /
    └─clearos-swap 253:1 0 7.9G 0 lvm [SWAP]
    sdi 8:128 0 465.8G 0 disk /var/flexshare/shares/torrents
    sdj 8:144 0 1.8T 0 disk
    └─sdj1 8:145 0 1.8T 0 part
    sdk 8:160 0 1.8T 0 disk
    └─sdk1 8:161 0 1.8T 0 part
    sdl 8:176 0 1.8T 0 disk
    └─sdl1 8:177 0 1.8T 0 part


    As you can see, the OS drive is now sdh, and I do not see any disks showing up as raid members here.

    Is there any way to rebuild the raid?
  • Accepted Answer

    Leon
    Saturday, September 16 2017, 07:57 PM - #Permalink
    Resolved
    0 votes
    Hi Tony / Nick

    W.r.t. the network: the 2nd NIC is only "missing" when the system goes into emergency mode; I am presuming that the OS does not get far enough to load the driver?
    Here is the info on the Network ports:
    00:19.0 Ethernet controller [0200]: Intel Corporation 82578DM Gigabit Network Connection [8086:10ef] (rev 05)
    Subsystem: Intel Corporation Device [8086:34ec]
    Kernel driver in use: e1000e
    Kernel modules: e1000e
    --
    02:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]
    Subsystem: Intel Corporation Device [8086:34ec]
    Kernel driver in use: e1000e
    Kernel modules: e1000e


    My system is an Intel S3420GP with a standard Xeon 3450 and 16GB of RAM.
    The board has 6x SATA ports and I have 2 PCIe cards to give me an additional 6 SATA ports.
    No hardware Raid is enabled.

    The disks are all from a QNAP NAS whose motherboard failed:
    8x Seagate Barracuda XT 2TB, plus 2 more of the same that I purchased to make the 10.

    The power supply is an 850W Gigabyte PSU.



    Here is my fstab file - thank you for pointing out that I need to "#" the raid entry in fstab to boot normally.
    #
    # /etc/fstab
    # Created by anaconda on Thu Sep 14 21:04:41 2017
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    /dev/mapper/clearos-root / xfs defaults 0 0
    UUID=0c365059-ad79-431a-8486-2a83ac85eb67 /boot xfs defaults 0 0
    #UUID="3e796bd4-d628-43ad-a4ba-f6c6c7f93710" /store/data1 ext4 defaults 0 0
    /dev/mapper/clearos-swap swap swap defaults 0 0

    # Mount Drives by Label
    LABEL=Torrents /mnt/Torrents ext3 defaults,noatime 0 0
    /mnt/Torrents /var/flexshare/shares/torrents none defaults,bind 0 0
    #/store/data1/Movies /var/flexshare/shares/movies none defaults,bind 0 0
    #/store/data1/Series /var/flexshare/shares/series none defaults,bind 0 0


    I still get the feeling that on reboot the drives - /dev/sd[a-l] - do not stay in sequence.
    I have seen the boot drive be /dev/sdc and then, after I add or remove a non-raid drive, become /dev/sdg.
    To me this means the raid gets broken - the members are no longer /dev/sdc to /dev/sdk - and that is why it is failing and going into "Emergency" mode?
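    If it helps the diagnosis: the persistent names under /dev/disk/ should show which physical drive is which regardless of the sdX letters, e.g.

    ls -l /dev/disk/by-id/      # stable per-drive names (model + serial) -> current sdX letter
    ls -l /dev/disk/by-uuid/    # filesystem UUIDs -> current device node
    blkid                       # shows which partitions carry a linux_raid_member signature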
  • Accepted Answer

    Saturday, September 16 2017, 08:46 AM - #Permalink
    Resolved
    0 votes
    Tony Ellis wrote:

    The other thing i noted is that my 2nd NIC is also "dead"

    Was this resolved? - otherwise could be looking at all sorts of other problems - BIOS, missing interrupts etc
    Missing drivers?
    lspci -knn | grep Eth -A 3
  • Accepted Answer

    Saturday, September 16 2017, 07:56 AM - #Permalink
    Resolved
    0 votes
    Good point Nick - since the OP was putting the system in a rack one would hope this is a decent server that can support 10 drives properly :o

    Examining the output of the command below should confirm whether all drives are accessible...
    # fdisk -l /dev/sd[a-j]

    Leon - tell us what the system hardware is like please... motherboard, raid/disk controllers and PS especially.

    One worrying thing is the comment...

    The other thing i noted is that my 2nd NIC is also "dead"

    Was this resolved? - otherwise could be looking at all sorts of other problems - BIOS, missing interrupts etc
  • Accepted Answer

    Saturday, September 16 2017, 07:31 AM - #Permalink
    Resolved
    0 votes
    Another sideways thought. With 10 RAID drives, is your PSU strong enough to reliably spin them all up together?
  • Accepted Answer

    Saturday, September 16 2017, 01:14 AM - #Permalink
    Resolved
    1 votes
    One more thing... hope these are proper NAS or Raid certified drives - not consumer - otherwise you will be continually in a world of pain with that many drives...

    Can you post the make and model of the disk drives you are using please...
  • Accepted Answer

    Saturday, September 16 2017, 01:11 AM - #Permalink
    Resolved
    0 votes
    No, that is not normal :o It sounds as if the raid array is not being assembled on boot.

    I assume here that the raid is only for data and no system directories reside on it...
    if so - the following should get you going... (assuming an assemble failure)

    When in emergency mode type your root password to get in.
    Edit /etc/fstab and put a comment ( # ) in front of the line for the raid filesystem
    Reboot - the system should come up without the filesystem on the raid

    # cat /proc/mdstat
    (please save a copy of this to review later..
    e.g. cat /proc/mdstat > /root/mdstat_on_boot.txt)

    suspect it will have an entry re. md127
    if so
    # mdadm -S /dev/md127

    for good measure in case it is stuck
    # mdadm -S /dev/mdx
    where "x" is your array number

    ensure you have a valid mdadm.conf - here is the complete file for one of mine as an example

    MAILADDR root
    AUTO +imsm +1.x -all
    ARRAY /dev/md10 metadata=1.2 name=karien.sraellis.com:10 UUID=41cd67d5:98593c51:a02a59db:aaa91e8e

    then

    # mdadm --assemble --scan
    you should get a message that your array has started

    # cat /proc/mdstat
    looks good?

    remove the comment you added to the /etc/fstab file and mount
    # mount -a

    all being well the filesystem should mount - then the requirement is to determine why the raid array is not being assembled on a reboot...

    There are of course other failure modes - I just picked the most likely...
    How about posting your fstab and mdadm.conf files together with the output from "cat /proc/mdstat"
    Also "mdadm --detail /dev/mdx" where "x" is your array number when you get it going...

    Also, before rebooting - comment out the raid filesystem line in /etc/fstab until you have resolved the problem. This may save the "Emergency mode" after a reboot. You can tell from /proc/mdstat after reboots whether you have solved the problem; then you can leave the line active in fstab. Another solution is to leave it commented out until the problem is fixed, and in the meantime mount the filesystem from the command line after you have manually got the raid going correctly,
    e.g. mount -o noatime /dev/md1 /data - substitute your own values...


    Edit - fixed the odd typo - **verify** my commands before using...
  • Accepted Answer

    Leon
    Friday, September 15 2017, 07:45 PM - #Permalink
    Resolved
    0 votes
    So it is official.... I HATE RAID.....

    Did a shutdown to move the server back to its rack..... Emergency mode, again... What The Fudge???
    Is this normal?
  • Accepted Answer

    Leon
    Friday, September 15 2017, 06:13 PM - #Permalink
    Resolved
    0 votes
    So I could not get ClearOS to boot normally; no matter what I did, it went to the recovery console.
    So I did a clean install again with only the OS drive connected, and set ClearOS up to the point where the network is configured and you have to reboot.

    I then connected the 11 other drives and rebooted.
    While going through the updates and installing the apps I checked with PuTTY, and the raid was rebuilding itself - 1200-odd minutes later the Raid 6 was fully online.

    I am now growing the Raid 6 to 10x 2TB drives; that is going to take some time...
    Once that is done I shall see how to release the 5% (about 1TB of space), and I should have about 14TB of Raid 6 storage.
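    My rough plan for those two steps, in case anyone spots a problem with it (this assumes the array is /dev/md0 with an ext4 filesystem, and that the 5% is the ext4 reserved-blocks figure - both are my assumptions):

    mdadm --grow /dev/md0 --raid-devices=10    # reshape from 8 to 10 members (slow)
    cat /proc/mdstat                           # watch the reshape progress
    resize2fs /dev/md0                         # grow the filesystem into the new space
    tune2fs -m 0 /dev/md0                      # release the 5% reserved-for-root blocks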

    As this is my first dive into software raid, I have come to the conclusion that mixing raid and non-raid drives on the same system is NOT for me.
  • Accepted Answer

    Leon
    Thursday, September 14 2017, 05:25 PM - #Permalink
    Resolved
    0 votes
    Hi Tony

    Thank you for the pointer; let me do the Google thing and I shall post here the moment I have some news.
  • Accepted Answer

    Thursday, September 14 2017, 05:23 PM - #Permalink
    Resolved
    0 votes
    You create it... search the web and look at the 'man' page for mdadm...

    EDIT: Sorry for being curt - but it is 3:25 am in Aus and I must get to bed :-(
  • Accepted Answer

    Leon
    Thursday, September 14 2017, 05:20 PM - #Permalink
    Resolved
    0 votes
    Hi Tony

    Where can I find mdadm.conf?
  • Accepted Answer

    Thursday, September 14 2017, 05:20 PM - #Permalink
    Resolved
    0 votes
    We crossed

    Is there a way to use the raid drives' UUID for mdadm, as the UUID never changes

    See my append just before yours :-)
  • Accepted Answer

    Leon
    Thursday, September 14 2017, 05:16 PM - #Permalink
    Resolved
    0 votes
    Hi Nick

    What I mean is that if I run cat /proc/mdstat, it shows md127 : active raid6 sda[1] sdb[2] sdd[5]... you get the picture.
    Now when I remove or add drives, sda or sdd might not be part of the raid any more, as those names may have been assigned to other, non-raid drives.

    Is there a way to use the raid drives' UUID for mdadm, as the UUID never changes?

    In fstab I have a UUID, but this is not for an individual drive; it is the UUID that mdadm gave me when I made the raid array.
  • Accepted Answer

    Thursday, September 14 2017, 05:09 PM - #Permalink
    Resolved
    0 votes
    By specifying the UUID in mdadm.conf, I would have thought a changing /dev/sdx would not be a problem - i.e. an mdadm.conf similar to the ones below; the disks are scanned for the UUIDs to start the arrays, regardless of the /dev/sdx values... The first 4 arrays are raid 1, the last raid 5 - an example from one of my machines:


    ARRAY /dev/md0 metadata=0.90 UUID=f0511672:1c4c78a6:0d13d201:d8f86b49
    ARRAY /dev/md2 metadata=0.90 UUID=9f28af19:dd735a97:f0090154:079827c8
    ARRAY /dev/md3 metadata=0.90 UUID=5682de5a:98ebd29a:df9ea576:ef6ee481
    ARRAY /dev/md4 metadata=0.90 UUID=de20142e:4f6ad390:17d8605d:b2c36998
    ARRAY /dev/md10 metadata=1.2 name=danda.sraellis.com:10 UUID=5006cf00:449f8311:2bbdec12:b86c4ef1

    an alternative file that also avoids /dev/sdx entries...

    ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f0511672:1c4c78a6:0d13d201:d8f86b49
    ARRAY /dev/md2 level=raid1 num-devices=2 UUID=9f28af19:dd735a97:f0090154:079827c8
    ARRAY /dev/md3 level=raid1 num-devices=2 UUID=5682de5a:98ebd29a:df9ea576:ef6ee481
    ARRAY /dev/md4 level=raid1 num-devices=2 UUID=de20142e:4f6ad390:17d8605d:b2c36998
    ARRAY /dev/md10 level=raid5 num-devices=3 UUID=5006cf00:449f8311:2bbdec12:b86c4ef1

    # mdadm --detail --scan
    shows what could be in your mdadm.conf
    # mdadm --detail /dev/mdx
    will provide the UUID for a single array and the current /dev/sdx assignments amongst other information...

    /dev/md10:
    Version : 1.2
    Creation Time : Tue May 5 11:30:08 2015
    Raid Level : raid5
    Array Size : 1845515264 (1760.02 GiB 1889.81 GB)
    Used Dev Size : 922757632 (880.01 GiB 944.90 GB)
    Raid Devices : 3
    Total Devices : 3
    Persistence : Superblock is persistent

    Intent Bitmap : Internal

    Update Time : Fri Sep 15 02:41:29 2017
    State : clean
    Active Devices : 3
    Working Devices : 3
    Failed Devices : 0
    Spare Devices : 0

    Layout : left-symmetric
    Chunk Size : 512K

    Name : danda.sraellis.com:10 (local to host danda.sraellis.com)
    UUID : 5006cf00:449f8311:2bbdec12:b86c4ef1
    Events : 15590

    Number Major Minor RaidDevice State
    3 8 54 0 active sync /dev/sdd6
    1 8 38 1 active sync /dev/sdc6
    2 8 24 2 active sync /dev/sdb8

    Can you post your fstab and mdadm.conf files...

    EDIT: - just saw this update

    I suspect that not everyone has a raid system with non-raid drives.

    Not everyone, but at least there are some systems here with a mixture of software raid and non-raid disks :-)
  • Accepted Answer

    Thursday, September 14 2017, 05:07 PM - #Permalink
    Resolved
    0 votes
    I'm not sure what you're aiming for. The safest way to mount drives in /etc/fstab is by UUID and it is easy to change the entries round. If you do that the best thing to do is reboot for it to take effect. "mount -a " may pick up the changes, but I don't know. If it does not and you don't want to reboot you'll need to unmount the drives first before giving the "mount -a" command.

    I'm afraid I don't understand this bit:
    Is there a way to edit the raid info to mount by UUID instead of /dev/sdx?
  • Accepted Answer

    Leon
    Thursday, September 14 2017, 04:35 PM - #Permalink
    Resolved
    0 votes
    Hi Nick

    I checked, and yes, I do not have any non-raid drives mapped in fstab.

    What I did see is exactly what I expected the issue to be:
    When I change - remove or add - non-raid drives, ClearOS does NOT assign the same /dev/sdx IDs to the drives.
    I suspect that this is the issue.

    I did query this before, but was told that you mount the raid by its UUID.
    I suspect that not everyone has a raid system with non-raid drives.

    Is there a way to edit the raid info to mount by UUID instead of /dev/sdx?
    That way it does not matter what Linux does.
  • Accepted Answer

    Leon
    Thursday, September 14 2017, 09:44 AM - #Permalink
    Resolved
    0 votes
    Hi Nick

    Yes, I did that before I shut the server down.
    I shall check again when I get home from work later today.
  • Accepted Answer

    Thursday, September 14 2017, 08:53 AM - #Permalink
    Resolved
    0 votes
    Can you look at /etc/fstab and comment out the entry which pointed to the drive you removed. In recovery mode you can use the nano editor, so something like:
    nano /etc/fstab
    or
    cd /etc
    nano fstab
    If you try to exit nano, it will prompt you to save the file.