Hi All
I have configured my server with 8x 2TB drives as Raid 6 as per the video posted here - Title as per this post - Title
At the time I asked the question: if the raid is created using /dev/sdx names, will it not break when I add or remove non-raid drives, since Linux does not preserve the /dev/sdx order? "Seen this movie before..."
Now it is broken... I replaced one of my non-raid drives to copy data to the Raid, and now it boots to recovery mode.
I have no idea where to start looking; any help will be appreciated.
The other thing I noted is that my 2nd NIC is also "dead"
Z
Responses (34)
-
Accepted Answer
Just thought about doing some research on SATA cards before going to bed (it's currently 2.20 am)...
Came across this... https://www.jethrocarr.com/2013/11/24/adventures-in-io-hell/ - a bit different from your problems, but some of the comments at the bottom are telling...
Seems like the Marvell SATA chipsets/drivers might not be the best if this is anything to go by... I have a Silicon Image 3114 (and still use it in a backup server that's only powered up once a week for backup updates), so your PCI card is probably OK (provided the manufacturer was careful laying out the PC board traces to minimise cross-talk - mine is a different brand to yours). Just make sure the SATA cable is good quality with a nice tight fit; mine has no notches for the clips... However, based on the URL above there are suspicions about PCIe Marvell-based cards - perhaps change yours for a decent one with no Marvell chipset! Never used Marvell myself, so no first-hand experience there.
My SI 3114 card has 1 drive attached, the motherboard has 4 SATA ports with 4 drives. The 5 drives are combined in a software raid 5 array. The OS resides on 2x IDE drives that are mirrored in Raid 1. All my drives support TLER (ERC) set to 7 seconds timeout. -
Accepted Answer
Hi Leon, Once a UUID is assigned to each array during raid creation, it doesn't matter what the /dev/sdx order is if you specify by UUID in mdadm.conf... mdadm scans all drives looking for the UUIDs and uses that to ascertain which drive is which.
Did you create the script to change the disk time-out and place it in, for example, /etc/rc.d/rc.local so it runs when booting? Those drives of yours are not suitable for use in Raid 5/6 without doing that.
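A minimal /etc/mdadm.conf along those lines might look like this (a sketch only - the ARRAY line should come from your own `mdadm --detail --scan` output; the UUID shown is the one from the array in this thread):

```shell
# /etc/mdadm.conf - identify the array by UUID, never by /dev/sdx
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 metadata=1.2 UUID=243e5d11:e1049ad5:a4a2ce43:304fdb4f
```

With this in place the /dev/sdx letters can shuffle freely; mdadm finds the members by scanning their superblocks for that UUID.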
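For reference, a sketch of what such a boot script could look like - this is an assumption modelled on the kernel raid wiki's approach, not the exact script from the earlier thread, and the drive list must be adjusted to your own system:

```shell
#!/bin/sh
# Hypothetical /etc/rc.d/rc.local fragment: for each raid member, try to
# enable SCT ERC at 7 seconds; if the drive does not support it, raise the
# kernel's SCSI timeout instead so md never times out before the drive does.
for disk in sdc sdd sde sdf sdg sdh sdi sdj sdk sdl; do
  if smartctl -q errorsonly -l scterc,70,70 /dev/$disk; then
    echo "$disk: SCT ERC set to 7.0s"
  else
    echo 180 > /sys/block/$disk/device/timeout
    echo "$disk: no ERC support, kernel timeout raised to 180s"
  fi
done
```

The values must be re-applied on every boot, which is why it lives in rc.local rather than being a one-off command.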
No idea how you are setting the drives up - I never watch a "How-To" on YouTube or anything similar - I can read far faster than any narrator can speak and thus learn a lot more in the same period of time... and you can always refer to any written part instantly. A lot better than having to replay something to make sure you heard and understood a certain passage correctly... Several years ago I downloaded and used the Red Hat Administration Manuals and went from there... studied the complete set, beginning to end, while travelling to/from work on the train.
If you continue to have problems - then it might be time to look at the hardware... would be inclined to ditch the two budget PCI/PCIe controllers and get a decent 8-port one with current Linux support, eg a modern LSI, assuming one of your 8x or 16x PCIe slots is vacant... are you using good quality SATA cables with clips? The old original ones with no clips are notorious for creating intermittent connections as are cheap controllers that don't have the little notch for the clip to latch onto. Wouldn't be surprised if your two add-on controllers fall into that category. No clips means you are relying on friction - you might get away with it - but the number of drives you have provides more opportunity for vibrations and connector movement... -
Accepted Answer
Hi Tony
Thank you for all the help, but I am at the point of: so my data is gone... deal with it...
I am prepared to start from scratch; about 4TB will be lost. I think I can get most of it back over time from other sources, it might just take some time.
I still have the issue where on every reboot the raid drive names - /dev/sd[abcefgijkl] - are not the same from one boot to the next.
This is what seems to be the issue - we got side-tracked trying to recover the data after the reboot and I loaded data on the drive.
Any idea how I resolve my original issue?
I shall start from scratch, as I did from here: ClearOS Setting up Storage Volumes with Linux RAID
Once it is done I shall create /etc/mdadm.conf - mdadm --detail --scan >> /etc/mdadm.conf
Add the UUID to fstab, put some data on it, verify and reboot -
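That plan, spelled out as a command sequence (a sketch under the assumption that the new array is /dev/md0 and will carry an ext4 filesystem - adjust to taste):

```shell
mdadm --detail --scan >> /etc/mdadm.conf   # persist the ARRAY line with its UUID
mkfs.ext4 /dev/md0                         # new filesystem on the array
blkid /dev/md0                             # note the FILESYSTEM UUID for fstab
# the /etc/fstab entry then uses that filesystem UUID, e.g.:
#   UUID=<uuid-from-blkid>  /store/data  ext4  defaults  0 0
mount -a                                   # confirm it mounts, then reboot to verify
```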
Accepted Answer
Create is dangerous in that you need to specify the disks, in the create command, in the same order they occupied in the raid when it broke. Since you did a grow they may not be in strict alphabetical order any more - do not write anything to the drive - mount it read-only until you have verified that the data in large files is OK. Since you have 10 drives the number of possible combinations is enormous. See the section "File system check" at https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID - you really should have done this with overlays - I pointed to this procedure before. For all you know the correct order may be /dev/sdc /dev/sdf /dev/sdd... etc. On the other hand you may have been very lucky... but what happened to the necessary "--assume-clean"?
Comment out the entry in fstab (I assume you put it back) and do not add it back until you are sure the array will always assemble on a boot. In the meantime do a manual mount if and when the array is assembled, then reboot. Can you stop /dev/md127 and /dev/md0, and will it now assemble with a --detail --scan?
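For anyone following along, the overlay technique referred to works roughly like this (a sketch based on the kernel raid wiki page linked above; the device range and overlay size are placeholders): every trial assemble or create is done against copy-on-write devices, so the real disks are never written.

```shell
# create a dm snapshot over each member; all writes land in the sparse file
for d in /dev/sd[c-l]1; do
  b=$(basename "$d")
  truncate -s 4G "/tmp/overlay-$b"              # sparse COW store
  loop=$(losetup -f --show "/tmp/overlay-$b")
  size=$(blockdev --getsz "$d")
  echo "0 $size snapshot $d $loop P 8" | dmsetup create "$b"
done
# now experiment with /dev/mapper/sd?1 instead of the real partitions
```

If a trial ordering turns out to be wrong, the overlays are simply torn down and recreated, and the on-disk data is untouched.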
see amongst many others
https://serverfault.com/questions/538904/mdadm-raid5-recover-double-disk-failure-with-a-twist-drive-order
https://serverfault.com/questions/347606/recover-raid-5-data-after-created-new-array-instead-of-re-using -
Accepted Answer
Hi Tony
I did: mdadm --create /dev/md0 --level=6 --chunk=64 --raid-devices=10 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdi1 /dev/sdl1 /dev/sdj1 /dev/sdk1
I spent hours reading and trying to do a reassemble with zero success, so I took the dive.
Only thing I need to check is why I only see 453GB of 456GB free and not the whole array; shall do some Google hunting...
Edit: I think it is because it is not mounting md0, so it is the same size as the boot disk.
For some reason, blkid does not show me /dev/md0.
I presumed that the first UUID of all the disks that are part of the array is the UUID, but mounting that UUID I get "/dev/sdc1 is already mounted" -
Accepted Answer
Thanks Leon - minor quibble: the SATA4000 is a PCI card using the Silicon Image 3114 controller - it is NOT PCIe
see http://www.sunix.com.tw/product/sata4000.html
The SATA2600 is PCIe http://www.sunix.com/product/sata2600.html - so you have 1x PCI and 1x PCIe
So we have quite a mixture here
Drives are SATA III 6Gb/s (Compatible with SATA I and SATA II)
Intel Motherboard Controllers SATA II 3Gb/s
SATA2600 Marvell 91XX SATAIII 6Gb/s (transfer will be limited as has only 1x PCIe connector)
SATA4000 SI 3114 SATA I 1.5 Gb/s (limited even more since motherboard slot is 33 MHz PCI)
Should work - but the PCI card is creating a bottleneck -
Accepted Answer
Thanks Leon... OK - those drives are not suitable for use in a Raid 5 or 6. See the section "Timeout Mismatch" at https://raid.wiki.kernel.org/index.php/Timeout_Mismatch - You really need to get that script working first before anything else - you cannot afford a drive to be kicked out at this stage...
Then use smartctl to check that every raid drive does not have a Current Pending Sector count warning or other serious error - this is important if you end up using only 8 of your ten drives to recover the array - you don't want a drive kicked for any reason. Run the long test on every drive and check the output. You can run the test on all drives concurrently. Lots of help on this on the web eg https://www.linuxtechi.com/smartctl-monitoring-analysis-tool-hard-drive/
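A sketch of running those checks across all members (the device range is assumed from the lsblk output later in this thread):

```shell
for d in /dev/sd[c-l]; do
  smartctl -t long "$d"            # self-tests run inside each drive, concurrently
done
# several hours later, check each drive:
smartctl -A /dev/sdc | grep -i 'Current_Pending_Sector'
smartctl -l selftest /dev/sdc      # confirm the long test completed without error
```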
An example showing Current Pending Sector:- https://community.wd.com/t/help-current-pending-sector-count-warning/3436/3
Then...
See https://raid.wiki.kernel.org/index.php/Assemble_Run
Are the event counts for the 'good' drives the same or very very close?
Believe you have a Raid 6 - so you should be able to recover if 8 of the 10 drives are OK - then add the other 2 later.
Otherwise, it might mean working through https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID - you might also want to contact the experts on the raid mailing list...
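The event counts asked about above come from `mdadm --examine /dev/sd[c-l]1 | grep Events`. Here is a tiny bash helper (hypothetical, purely to illustrate the check) that reports the spread between the counts - a spread of only a handful of events usually means a forced assemble will work:

```shell
events_spread() {
  # usage: events_spread 30173 30173 30171 ...
  # prints the difference between the largest and smallest count
  local min=$1 max=$1 e
  for e in "$@"; do
    (( e < min )) && min=$e
    (( e > max )) && max=$e
  done
  echo $(( max - min ))
}
events_spread 30173 30173 30171   # small spread: forced assemble likely safe
```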
Do we assume you created the raid and then used grow without making a viable backup first? If so, that is highly dangerous. You don't use raid as a backup - it is to guard against one kind of hardware failure: a drive failure. There are lots of failure modes that raid doesn't guard against, such as file corruption (software problem, power drop etc), human error (deleting files by mistake), viruses and other malware, etc. With a backup the quickest way would be to create the raid again and restore from the backup, having a procedure in place to ensure your backups are complete and usable... -
Accepted Answer
Hi Tony
Let's start with the disk
[root@Zodiac ~]# smartctl -l scterc /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-514.26.2.v7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Commands not supported
Here is the rest of the information - I used /dev/md0 when I created the array
[root@Zodiac ~]# cat /proc/mdstat
Personalities :
md127 : inactive sdg1[3](S) sde1[1](S) sdf1[2](S) sdj1[5](S) sdl1[7](S) sdh1[4](S) sdc1[0](S) sdk1[6](S)
15627059200 blocks super 1.2
[root@Zodiac ~]# mdadm -S /dev/md127
mdadm: stopped /dev/md127
[root@Zodiac ~]# mdadm -S /dev/md0
mdadm: error opening /dev/md0: No such file or directory
[root@Zodiac ~]# mdadm --detail --scan
INACTIVE-ARRAY /dev/md127 metadata=1.2 name=localhost.localdomain:0 UUID=243e5d11:e1049ad5:a4a2ce43:304fdb4f
[root@Zodiac ~]# mdadm -vv --assemble --force /dev/md0 /dev/sd[cdefghijkl]1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc1 is busy - skipping
mdadm: no recogniseable superblock on /dev/sdd1
mdadm: /dev/sdd1 has no superblock - assembly aborted
[root@Zodiac ~]#
I also attached the output of - mdadm --examine /dev/sd*
[root@Zodiac ~]# mdadm --examine /dev/sd*
/dev/sda:
MBR Magic : aa55
Partition[0] : 2097152 sectors at 2048 (type 83)
Partition[1] : 974673920 sectors at 2099200 (type 8e)
mdadm: No md superblock detected on /dev/sda1.
mdadm: No md superblock detected on /dev/sda2.
mdadm: No md superblock detected on /dev/sdb.
/dev/sdc:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x45
Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
Name : localhost.localdomain:0
Creation Time : Mon Sep 11 21:40:47 2017
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
Data Offset : 262144 sectors
New Offset : 258048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 9e0c7861:16ceac28:516359d9:59cd656f
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
Delta Devices : 2 (8->10)
Update Time : Fri Sep 15 21:22:14 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : cb80fcde - correct
Events : 30173
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
mdadm: No md superblock detected on /dev/sdd1.
/dev/sde:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x45
Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
Name : localhost.localdomain:0
Creation Time : Mon Sep 11 21:40:47 2017
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
Data Offset : 262144 sectors
New Offset : 258048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3cfbd19f:052f19e4:ef9c0132:b3537526
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
Delta Devices : 2 (8->10)
Update Time : Fri Sep 15 21:22:14 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : d4ff0c64 - correct
Events : 30173
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x45
Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
Name : localhost.localdomain:0
Creation Time : Mon Sep 11 21:40:47 2017
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
Data Offset : 262144 sectors
New Offset : 258048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : eebd70e4:2ff03aa7:3e22b382:cbdc2f1a
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
Delta Devices : 2 (8->10)
Update Time : Fri Sep 15 21:22:14 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 212b9ea9 - correct
Events : 30173
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x45
Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
Name : localhost.localdomain:0
Creation Time : Mon Sep 11 21:40:47 2017
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
Data Offset : 262144 sectors
New Offset : 258048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 2ec94af6:9a7ad26a:e76988a9:11d2b8b0
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
Delta Devices : 2 (8->10)
Update Time : Fri Sep 15 21:22:14 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : b3fb7144 - correct
Events : 30173
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x45
Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
Name : localhost.localdomain:0
Creation Time : Mon Sep 11 21:40:47 2017
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
Data Offset : 262144 sectors
New Offset : 258048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 00162b82:1c3488bf:f1299cf6:bc353dae
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
Delta Devices : 2 (8->10)
Update Time : Fri Sep 15 21:22:14 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : df299b4e - correct
Events : 30173
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdi:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
mdadm: No md superblock detected on /dev/sdi1.
/dev/sdj:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x45
Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
Name : localhost.localdomain:0
Creation Time : Mon Sep 11 21:40:47 2017
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
Data Offset : 262144 sectors
New Offset : 258048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 87366ae7:d3fcd71e:f4d5b50d:3a493d1f
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
Delta Devices : 2 (8->10)
Update Time : Fri Sep 15 21:22:14 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 2bd2440d - correct
Events : 30173
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdk:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
/dev/sdk1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x45
Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
Name : localhost.localdomain:0
Creation Time : Mon Sep 11 21:40:47 2017
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
Data Offset : 262144 sectors
New Offset : 258048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : b39843dc:4bb2d0de:c0753679:a52bed1d
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
Delta Devices : 2 (8->10)
Update Time : Fri Sep 15 21:22:14 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 4ad4ddea - correct
Events : 30173
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
/dev/sdl1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x45
Array UUID : 243e5d11:e1049ad5:a4a2ce43:304fdb4f
Name : localhost.localdomain:0
Creation Time : Mon Sep 11 21:40:47 2017
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 15627059200 (14903.13 GiB 16002.11 GB)
Data Offset : 262144 sectors
New Offset : 258048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3053a5ba:77ac33c9:0ae49712:24186eaf
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 515829760 (491.93 GiB 528.21 GB)
Delta Devices : 2 (8->10)
Update Time : Fri Sep 15 21:22:14 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 3e7bed5d - correct
Events : 30173
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
[root@Zodiac ~]#
-
Accepted Answer
As for the raid - did you stop /dev/md127 and /dev/md1 (assuming /dev/md1 is your raid device) before doing the "mdadm --detail --scan"?
If not, then try again stopping both arrays first...
# cat /proc/mdstat # show us output
# mdadm -S /dev/md127 # show us output
# mdadm -S /dev/md1 # show us output (assuming /dev/md1 is your array)
# mdadm --detail --scan # show us output
if the ...scan fails, after using both 'stop raid commands', try
# mdadm -vv --assemble --force /dev/md1 /dev/sd[abcd...]1
that's two 'v's, not a 'w' - show us the output - where "abcd..." are all ten correct drive letters for your raid array, assuming you are using partition 1 for raid, which appears to be the case from your output... thanks
Please do ***NOT*** use the 'create' command yet - that is dangerous and a ***last*** resort only
Here's the output from a simulated failure
cat /proc/mdstat
Personalities :
md127 : inactive sdd[2](S) sdc[4](S)
3906767024 blocks super 1.2
mdadm -S /dev/md127
mdadm: stopped /dev/md127
mdadm -S /dev/md1
mdadm: error opening /dev/md1: No such file or directory
mdadm --detail --scan
mdadm: /dev/md1 has been started with 2 drives.
Personalities : [raid0] [raid1]
md1 : active raid1 sdc1[0] sdd1[1]
1953382464 blocks super 1.2 [2/2] [UU]
bitmap: 0/15 pages [0KB], 65536KB chunk
Much good information at https://raid.wiki.kernel.org/index.php/Linux_Raid
Tony... http://www.sraellis.tk/ -
Accepted Answer
OK - let's deal with the disks first - and I am concerned... This is from the Seagate documentation - it does not give a TLER (ERC) specification...
Barracuda XT drives—The performance leader in the family, with maximum capacity, cache and SATA performance for the ultimate in desktop computing
Application Desktop RAID
Cannot find a strict definition - but "desktop raid" often means Raid 0 and Raid 1 ***ONLY*** - why? Because often such drives don't support TLER (ERC) - and that's a big drawback when used in Raid 5 and Raid 6 - search the web for all the gory details. Basically, on an error occurring, the drive should time out first, before the software timeout - this lets the raid initiate error recovery. If the software times out first (which is what happens with 'desktop' drives) the raid thinks the disk is 'broken' and kicks it out of the array. With raid you want the disk to time out fast, as this prevents 'hangs' for the users, and the raid can recover the data using data from the other drives to re-construct, then re-write the correct data to the 'failing' one. In 'desktop' environments there is only one copy of the data and it is on only that one drive - so the drive will try desperately, for ages if necessary, to recover the data - the user will experience a 'hang' while this takes place...
To test for TLER (ERC) see below - please give the results from your drives. If they do not support TLER (ERC) - then we need to change the timeouts within the Linux software disk tables...
[root@danda ~]# smartctl -l scterc /dev/sdc
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-696.v6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
[root@karien ~]# smartctl -l scterc /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-514.26.2.v7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Error Recovery Control command not supported
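Where a drive does report ERC support, the 7-second value shown in the first example is set with values in deciseconds; the setting does not survive a power cycle, which is why it belongs in a boot script. A one-line sketch:

```shell
smartctl -l scterc,70,70 /dev/sdc   # 70 deciseconds = 7.0 s read/write recovery limit
```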
-
Accepted Answer
So I get the following:
[@Zodiac ~]# cat /proc/mdstat
Personalities :
md127 : inactive sdg1[7](S) sdf1[6](S) sdd1[4](S) sdc1[3](S) sdb1[2](S) sda1[1](S)
11720294400 blocks super 1.2
unused devices: <none>
and
[@Zodiac ~]# mdadm --detail --scan
INACTIVE-ARRAY /dev/md127 metadata=1.2 name=localhost.localdomain:0 UUID=243e5d11:e1049ad5:a4a2ce43:304fdb4f
[root@Zodiac ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
└─sda1 8:1 0 1.8T 0 part
sdb 8:16 0 1.8T 0 disk
└─sdb1 8:17 0 1.8T 0 part
sdc 8:32 0 1.8T 0 disk
└─sdc1 8:33 0 1.8T 0 part
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part
sdf 8:80 0 1.8T 0 disk
└─sdf1 8:81 0 1.8T 0 part
sdg 8:96 0 1.8T 0 disk
└─sdg1 8:97 0 1.8T 0 part
sdh 8:112 0 465.8G 0 disk
├─sdh1 8:113 0 1G 0 part /boot
└─sdh2 8:114 0 464.8G 0 part
├─clearos-root 253:0 0 456.9G 0 lvm /
└─clearos-swap 253:1 0 7.9G 0 lvm [SWAP]
sdi 8:128 0 465.8G 0 disk /var/flexshare/shares/torrents
sdj 8:144 0 1.8T 0 disk
└─sdj1 8:145 0 1.8T 0 part
sdk 8:160 0 1.8T 0 disk
└─sdk1 8:161 0 1.8T 0 part
sdl 8:176 0 1.8T 0 disk
└─sdl1 8:177 0 1.8T 0 part
As you can see, the OS drive is now sdh, and I do not see any disk showing up as raid here.
Any way to rebuild the raid?? -
Accepted Answer
Hi Tony / Nick
w.r.t. the network, the 2nd NIC is only "missing" when the system goes into emergency mode; I am presuming that the OS does not get far enough to load the driver???
Here is the info on the Network ports:
00:19.0 Ethernet controller [0200]: Intel Corporation 82578DM Gigabit Network Connection [8086:10ef] (rev 05)
Subsystem: Intel Corporation Device [8086:34ec]
Kernel driver in use: e1000e
Kernel modules: e1000e
--
02:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]
Subsystem: Intel Corporation Device [8086:34ec]
Kernel driver in use: e1000e
Kernel modules: e1000e
My system is an Intel S3420GP with a standard Xeon 3450 and 16GB of RAM.
The board has 6x SATA ports, and I have 2 PCIe cards to give me an additional 6 SATA ports.
No hardware Raid is enabled.
The disks are all from a Qnap NAS whose motherboard failed:
8x Seagate Barracuda XT 2TB, plus 2 more of the same that I purchased to make the 10x
Power supply is an 850W Gigabyte PSU
Here is my fstab file - thank you for pointing out that I need to "#" the Raid map in fstab to boot normally.
#
# /etc/fstab
# Created by anaconda on Thu Sep 14 21:04:41 2017
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/clearos-root / xfs defaults 0 0
UUID=0c365059-ad79-431a-8486-2a83ac85eb67 /boot xfs defaults 0 0
#UUID="3e796bd4-d628-43ad-a4ba-f6c6c7f93710" /store/data1 ext4 defaults 0 0
/dev/mapper/clearos-swap swap swap defaults 0 0
# Mount Drives by Label
LABEL=Torrents /mnt/Torrents ext3 defaults,noatime 0 0
/mnt/Torrents /var/flexshare/shares/torrents none defaults,bind 0 0
#/store/data1/Movies /var/flexshare/shares/movies none defaults,bind 0 0
#/store/data1/Series /var/flexshare/shares/series none defaults,bind 0 0
I still get this feeling that on reboot the drives - /dev/sd[a-l] - do not stay in sequence.
I have seen that the boot drive is /dev/sdc, and after I add or remove a non-raid drive it will be /dev/sdg.
To me this means that the raid gets broken - they are not /dev/sdc to /dev/sdk - and that is why it is failing and going into "Emergency" mode? -
Accepted Answer
Good point Nick - since the OP was putting the system in a rack, one would hope this is a decent server that can support 10 drives properly.
Examining the output of the command below should confirm whether all drives are accessible...
# fdisk -l /dev/sd[a-j]
Leon - tell us what the system hardware is like please... motherboard, raid/disk controllers and PS especially.
One worrying thing is the comment...
The other thing i noted is that my 2nd NIC is also "dead"
Was this resolved? Otherwise we could be looking at all sorts of other problems - BIOS, missing interrupts etc -
Accepted Answer
No, that is not normal. Sounds as if the raid array is not being assembled on boot.
I assume here that the raid is only for data and no system directories reside on it...
if so - the following should get you going... (assuming an assemble failure)
When in emergency mode type your root password to get in.
Edit /etc/fstab and put a comment ( # ) in front of the line for the raid filesystem
Reboot - the system should come up without the filesystem on the raid
# cat /proc/mdstat
(please save a copy of this to review later..
e.g. cat /proc/mdstat > /root/mdstat_on_boot.txt)
suspect it will have an entry re. md127
if so
# mdadm -S /dev/md127
for good measure in case it is stuck
# mdadm -S /dev/mdx
where "x" is your array number
ensure you have a valid mdadm.conf - here is the complete file for one of mine as a example
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md10 metadata=1.2 name=karien.sraellis.com:10 UUID=41cd67d5:98593c51:a02a59db:aaa91e8e
then
# mdadm --assemble --scan
you should get a message that your array has started
# cat /proc/mdstat
looks good?
remove the comment you added to the /etc/fstab file and mount
# mount -a
all being well the filesystem should mount - then the requirement is to determine why the raid array is not being assembled on a reboot...
There are of course other failure methods - just picked the most likely...
How about posting your fstab and mdadm.conf files together with the output from "cat /proc/mdstat"
Also "mdadm --detail /dev/mdx" where "x" is your array number when you get it going...
Also, before rebooting, comment out the raid filesystem line in /etc/fstab until you have resolved the problem. This may save the "Emergency mode" after a reboot. You can tell from /proc/mdstat after reboots whether you have solved the problem; then you can leave the line active in fstab. Another solution is to leave it commented out until the problem is fixed, and in the meantime mount the filesystem from the command line after you have manually got the raid going correctly..
e.g. mount -o noatime /dev/md1 /data - substitute your own values...
Edit - fixed the odd typo - **verify** my commands before using... -
Accepted Answer
So I could not get ClearOS to boot normally; no matter what I did, it went to the recovery console.
So I did a clean install again with only the OS drive connected, and set up ClearOS to the point where the network is configured and you have to re-boot.
I then connected the 11 other drives and re-booted.
While going through the updates and installing the apps I checked with PuTTY, and the Raid was re-building itself - 1,200-odd minutes later - Raid6 fully online.
I am now growing the Raid6 to 10x 2TB drives; that is going to take some time...
Once that is done I shall see how to release the 5%, about 1TB of space, and I should have about 14TB of Raid6 storage.
As this is my 1st dive into software Raid, I have come to the conclusion that Raid and non-Raid drives on the same system is NOT for me. -
Accepted Answer
Hi Nick
What I mean is that if I run cat /proc/mdstat, it shows md127 : active raid6 sda[1] sdb[2] sdd[5]..... you get the picture.
Now when I remove or add drives, sda or sdd might not be part of the raid anymore, as those letters have now been assigned to the other, non-raid drives.
Is there a way to use the raid drives' UUIDs for mdadm, as the UUID never changes?
In fstab I have a UUID, but this is not for an individual drive - it is the UUID that mdadm gave me when I made the raid array -
Accepted Answer
By specifying the UUID in mdadm.conf, I would have thought a changing /dev/sdx would not be a problem - i.e. an mdadm.conf similar to the one below - the disks are scanned for the UUID to start the array, regardless of the /dev/sdx values... The first 4 arrays are raid 1, the last raid 5; example from one of my machines:
ARRAY /dev/md0 metadata=0.90 UUID=f0511672:1c4c78a6:0d13d201:d8f86b49
ARRAY /dev/md2 metadata=0.90 UUID=9f28af19:dd735a97:f0090154:079827c8
ARRAY /dev/md3 metadata=0.90 UUID=5682de5a:98ebd29a:df9ea576:ef6ee481
ARRAY /dev/md4 metadata=0.90 UUID=de20142e:4f6ad390:17d8605d:b2c36998
ARRAY /dev/md10 metadata=1.2 name=danda.sraellis.com:10 UUID=5006cf00:449f8311:2bbdec12:b86c4ef1
an alternative file that also avoids /dev/sdx entries...
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f0511672:1c4c78a6:0d13d201:d8f86b49
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=9f28af19:dd735a97:f0090154:079827c8
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=5682de5a:98ebd29a:df9ea576:ef6ee481
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=de20142e:4f6ad390:17d8605d:b2c36998
ARRAY /dev/md10 level=raid5 num-devices=3 UUID=5006cf00:449f8311:2bbdec12:b86c4ef1
# mdadm --detail --scan
shows what could be in your mdadm.conf
# mdadm --detail /dev/mdx
will provide the UUID for a single array and the current /dev/sdx assignments amongst other information...
/dev/md10:
Version : 1.2
Creation Time : Tue May 5 11:30:08 2015
Raid Level : raid5
Array Size : 1845515264 (1760.02 GiB 1889.81 GB)
Used Dev Size : 922757632 (880.01 GiB 944.90 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Sep 15 02:41:29 2017
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : danda.sraellis.com:10 (local to host danda.sraellis.com)
UUID : 5006cf00:449f8311:2bbdec12:b86c4ef1
Events : 15590
Number Major Minor RaidDevice State
3 8 54 0 active sync /dev/sdd6
1 8 38 1 active sync /dev/sdc6
2 8 24 2 active sync /dev/sdb8
Can you post your fstab and mdadm.conf files...
EDIT: - just saw this update
I suspect that not everyone has a raid system with non-raid drives.
Not everyone, but at least there are some systems here with a mixture of software raid and non-raid disks :-) -
Accepted Answer
I'm not sure what you're aiming for. The safest way to mount drives in /etc/fstab is by UUID, and it is easy to change the entries round. If you do that, the best thing is to reboot for it to take effect. "mount -a" may pick up the changes, but I don't know. If it does not and you don't want to reboot, you'll need to unmount the drives first before giving the "mount -a" command.
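Concretely (a sketch reusing the filesystem UUID already present in the fstab posted earlier): the array is assembled by its md UUID via mdadm.conf, while fstab mounts the filesystem that lives on /dev/md0 by that filesystem's own UUID - neither step mentions /dev/sdx at all.

```shell
blkid /dev/md0    # prints the UUID of the filesystem sitting on the array
# fstab then refers to that UUID, e.g.:
#   UUID=3e796bd4-d628-43ad-a4ba-f6c6c7f93710  /store/data1  ext4  defaults  0 0
```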
I'm afraid I don't understand this bit:
Is their a way to exit the raid info to mount by UUID instead of /dev/sdx?
-
Accepted Answer
Hi Nick
I checked, and yes, I do not have any non-raid drives mapped in fstab.
What I did see is exactly what I expected the issue to be:
When I change - remove or add - non-raid drives, ClearOS does NOT assign the same /dev/sdx id to the drives.
I suspect that this is the issue.
I did query this before, but was told that you mount the Raid by its UUID.
I suspect that not everyone has a raid system with non-raid drives.
Is there a way to edit the raid info to mount by UUID instead of /dev/sdx?
That way it does not matter what Linux does. -