Friends,

I'll try to keep it short. I tried to separate a RAID 1 array into two separate drives using a procedure I found online. Basically: unmounted the array, stopped the array, deleted the RAID superblocks, changed the partition type to Linux (83), and deleted the array.

I found several articles telling me how to do this, but it left me with two drives that won't mount - they are unrecognized and throw errors. Apparently there is more to the procedure than what I read about. Anyway, I tried to restore a copy from a backup using dd, and by some means I still can't fathom, erased the backup. I can confirm the dd direction was definitely correct.

I now have one intact but erased ext4 drive, and one of the two non-working former RAID drives. A data recovery company quoted me the $500-$2500 range, and I'm a home user...that won't fly. Clearly, in hindsight, I should have just removed a drive and let it run degraded. But I got in over my head as I often do; I'm not a professional.

Can anyone suggest how I can get the RAID drive readable again?

I'd like to also try "unerasing" the apparently erased drive, but when I installed photorec on COS7, the server stopped talking on the external interfaces, and I don't know how to recover it! I'm really in a bind.

All help kindly appreciated,
Drew VS
Thursday, October 18 2018, 04:56 PM
Responses (26)
  • Accepted Answer

    Thursday, October 18 2018, 05:38 PM - #Permalink
    Check out TestDisk. It is available on plenty of Live CDs. Best practice says take the drives off-line, and really you should work on a copy made with something like dd in case your first recovery attempt fails. Then you can always have another go.
  • Accepted Answer

    Friday, October 19 2018, 01:42 AM - #Permalink
    Man - that was the worst advice ever...

    The only simple way I can think of is to create another raid 1 array from the drive with data...
    Then :-
    1) Get yourself another drive (we will call it "work") of the same or larger capacity as the "one of the two non working former RAID drives" (we will call it "original").
    2) Use "dd" to make a copy of the "original" to the "work" drive (make sure to get that the correct way round!)
    3) Put "original" away in a safe place - do **NOT** write to it
    4) We can now 'play' with work, and if unsuccessful - go make a copy again from "original"
    5) Create a raid 1 array on "work" - with only **one** drive...
    It will look something like this (example only - you will have to use YOUR parameters):
    # mdadm --create /dev/mdX -l raid1 -f -n 1 /dev/sdXY
    or create a two-drive raid 1, adding the "missing" keyword for the non-existent drive (see the sketch after this list)
    6) Use mdadm to see if it looks reasonable
    # mdadm --examine /dev/sdXY
    # mdadm --detail /dev/mdX
    Change the partition type back to "fd"
    7) Try starting the array and look at # cat /proc/mdstat
    8) If it appears OK then try to mount it ***READ-ONLY*** initially and see what you have..
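
    A minimal sketch of that two-drive "missing" variant (device and array names are placeholders - substitute your own, plus any --metadata option needed to match the original array):

    # mdadm --create /dev/mdX --level=1 --raid-devices=2 /dev/sdXY missing

    The "missing" keyword makes mdadm create the array degraded with only one real member, so no second disk is required or touched.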

    You can also use the same technique of copying to another drive with dd, and then try Nick's suggestion (TestDisk) on the copy...

    NEVER NEVER EVER in this situation do anything to WRITE to the only drive you think still holds the data you hope to restore...
  • Accepted Answer

    Friday, October 19 2018, 01:49 AM - #Permalink

    erased ext4 drive

    Did you do anything more to this drive than what you detailed in your first paragraph?
  • Accepted Answer

    Friday, October 19 2018, 09:14 PM - #Permalink
    Thanks Guys,

    Tony - I am trying to do exactly as you say. I ran a dd overnight (yes, always paranoid about s/d order!) and it was not done in 12 hours - a 2 TB drive. That seems too long? With no progress indicator in dd, it's hard to tell. Worse yet, since the server will no longer boot (the data drive fails its check), it comes up in read-only mode and I can't edit fstab. How the heck do you get around that? (please be gentle with the newbie). This is my old v6.9 server.

    So, if I can get dd to work (should be easy right?) then I had planned to proceed just as you say. But I am very happy to have my thinking confirmed! Roger on the never-ever. I have one drive untouched. I have some experience here, but not on the linux side.

    Any idea how long dd should take for two 2 TB SATA drives? Decent 7200 RPM drives, but one is on a SATA 1.5G interface. Can I get a "dd with progress" update somewhere for COS?

    Nick - I burned a whole pile of various rescue CDs today and one of them is TestDisk. The tools are good, but I don't have the skills to know what to do with them. Can I boot from a USB-connected CD drive with these?

    Another odd problem. I installed photorec on my new 7.5 server to also help deal with this, and while it still boots, the external interfaces (GUI and telnet) are dead. Why would this be, and how the heck do I get to the command line to uninstall it? So now...neither of my servers functions quite right.

    I seem to be digging myself deeper due to lack of experience. Learning, but &^%$ing up.

    Thanks,
    Drew
  • Accepted Answer

    Friday, October 19 2018, 09:21 PM - #Permalink
    Tony,

    Regarding the ext4 backup drive. I had not erased it, I just found it empty after I tried to dd from it to one of the other drives. I can confirm that I DID NOT get the s/d order wrong. However, I am suspicious that perhaps I left the source drive mounted during the dd? I know that's a no-no, could that have done it?

    After this happened, no, I did NOTHING else to that drive.

    Thanks to you gents for all your responses. I feel that there is still a solution here but I am a bit gridlocked with contributing problems I don't quite know how to deal with. If I can get DD to work, and the servers up, the RAID rebuild seems like it has a good shot.

    Drew VS
  • Accepted Answer

    Friday, October 19 2018, 09:36 PM - #Permalink
    google "monitor dd". There are a couple of solutions. I think it is a slow process.

    I've used TestDisk before but years ago. I think their web site has a huge amount of info on it.
  • Accepted Answer

    Friday, October 19 2018, 09:46 PM - #Permalink
    Boot from a live Linux CD or USB drive, create a directory in the 'live' file-system and mount your ClearOS root drive. You should then be able to edit /etc/fstab. Alternatively, you can boot from the ClearOS install media, enter rescue mode, mount the ClearOS file-system, and then edit that file...
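
    Something like this from the live environment (the root device is a placeholder - adjust to suit; an LVM root needs the volume group activated first):

    # mkdir /mnt/clearos
    # mount /dev/sdaX /mnt/clearos
    # vi /mnt/clearos/etc/fstab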

    A slow Intel Atom CPU with a SATA 1.5G interface drive should be able to sustain about 50 MB/sec. So there's a little maths exercise for you for an approximate maximum time for the dd copy...

    There are a few things you can do to help with dd, such as running two copies at once, one doing the reading and the other the writing. Also an increase in the block size "bs" from the default 512 bytes, and the use of flags to avoid a memory copy...

    This is an example of what is used here...

    # dd if=/dev/sdx bs=4M iflag=direct | dd of=/dev/sdy bs=4M oflag=direct
  • Accepted Answer

    Friday, October 19 2018, 10:23 PM - #Permalink

    left the source drive mounted during the dd

    Have no explanation - cannot explain how dd would write to the source drive... ###

    Always boot from rescue media here to use dd copy and other rescue operations - so "mounted drives" etc are never an issue... also allows fsck to be run...

    Should probably have added a "fsck" in *READ-ONLY* mode to the list of checks when you think you have restored the array - *IMPORTANT* do it before mounting...
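
    e.g. something like this (a read-only check that makes no changes):

    # e2fsck -n /dev/mdX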

    ### Actually thought of a couple just now before the save...
    Did you use fdisk, smartctl or something similar to verify which disk is which while the system was booted, before the dd copy? (See the example at the end of this post.)
    1) With sata drives - the device names move when a drive is missing..
    so say you had three drives - one with clearos (sda) and two in a raid array (sdb and sdc).
    If sdb is removed - then sdc will become sdb, and there will be no sdc (with no other drive added).
    However, depending on which sata connector you used to add the introduced drive (assuming you have 4 or more sata ports) - there are two possible sequences, assuming sda is not touched...
    e.g.
    a) sda (clearos system on SATA0) - sdb (was array sdc on SATA2) - sdc (new clean drive on SATA3)
    b) sda (clearos system on SATA0) - sdb (new clean drive on SATA1) - sdc (was array sdc on SATA2)
    requires thinking about :)

    2) This is an odd one and unlikely - have a machine here where, if you check the BIOS, the drives are in one order, but when booted in ClearOS the drive order has changed...

    Edit: The change in drive order occurs because there are two different types of sata controllers - the BIOS puts them in one order - linux sees them in the reverse order...
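
    A quick way to tie device names to physical drives by serial number before any dd copy (assuming smartmontools is installed - adjust the device name):

    # lsblk -o NAME,SIZE,MODEL,SERIAL
    # smartctl -i /dev/sdb | grep -i serial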
  • Accepted Answer

    Friday, October 19 2018, 11:00 PM - #Permalink
    Thanks Nick, I will nose around. I've seen the monitor-dd solutions and they all seem to have issues. I'd rather get the updated version, or one of the dd replacements, like dcfldd.

    Tony, I also had those worries and was careful to check the drives after I added them. I could tell by the info which was which. You don't think mounting did it?

    Based on your info, overnight is marginal. I'll try a rescue disk.

    Suddenly, the COS 7.5 system is working again. My life is confusing!

    Thanks,
    Drew
  • Accepted Answer

    Saturday, October 20 2018, 07:56 AM - #Permalink
    I used the "kill -USR1 $(pgrep ^dd)" method but I used the process ID instead of "$(pgrep ^dd)". I can't remember how I got the process but it was not difficult - perhaps "ps aux | grep dd". t looks like ClearOS 7 has pgrep so that would be easier. Using Tony's 55MB/s you get 10h. It is not a quick process. When backing up a machine, even a Gb LAN only manages 100-120MB/s and that is if you can saturate your disk which you often can't with small files.

    Anyway, if it is working great, but get a backup quickly.
  • Accepted Answer

    Saturday, October 20 2018, 11:19 AM - #Permalink
    Nick - I only mean the (bare) COS 7.5 machine is up again. No data solution yet.

    I found that COS 7.5 is using the newer dd that has status=progress. So that issue is solved. I ran a dd overnight and now I see the issue. The total copy time for the 2 TB drive is....wait for it....66 hours! Yikes... I did not realize the CPU was such a bottleneck.

    Tony,
    I reran dd using your method with large blocks and the direct flag, but with status=progress added. Wow! It went from 8 MB/s to 120 MB/s. Now we have something workable, under 5 hours! Much closer to the SATA interface throughput limit, about a Gb/s. I'm very happy to be able to try something this weekend.
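
    For anyone following along, the combined command ended up looking something like this (sketch only - substitute your own devices):

    # dd if=/dev/sdx bs=4M iflag=direct | dd of=/dev/sdy bs=4M oflag=direct status=progress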

    As always, you guys are life savers. I seem to go through this about every 5 years for an upgrade. I need to email you both a beer! It's nice to see two of the same familiar faces from 5 years ago. Just missing Tim Burgess...

    Thanks,
    Drew
  • Accepted Answer

    Saturday, October 20 2018, 12:41 PM - #Permalink
    Drew - happy to be of some assistance - let's hope the data recovery is successful..

    Also thanks to Nick - and you Drew for your testing - was unaware of the dd enhancement "status=progress" and will add that option during any future use of dd...
  • Accepted Answer

    Saturday, October 20 2018, 06:26 PM - #Permalink
    Gents,

    The backup is complete. I installed (not mounted) the drive and confirmed that it is /dev/sdb. It has one partition, sdb1. I ran this command:

    # mdadm --create /dev/md0 -l raid1 -v -f -n 1 /dev/sdb1

    mdadm gave some information, asked to proceed, and then completed without comment. I then commanded:

    # mount /dev/md0 /home

    This gave a pile of errors:
    ext4_check_descriptors: Block bitmap for group 0 overlaps superblock
    ext4_check_descriptors: Inode bitmap for group 0 overlaps superblock
    ext4_check_descriptors: Inode table for group 0 overlaps superblock
    Checksum for group 0 failed (638!=0)
    Group descriptors corrupted!
    mount: wrong fs type, bad option, bad superblock on /dev/md0,
    missing codepage or helper program, or other error

    Does any of this make any specific sense? Any ideas?

    Thanks,
    Drew VS
  • Accepted Answer

    Saturday, October 20 2018, 08:30 PM - #Permalink
    Gents,

    Seemingly simple things continue to vex me. I am trying to get the COS6 system running (the one that won't boot due to the bad file system mounted as /home in fstab) and I need to edit fstab. I booted a USB stick with Knoppix. The system drive is LVM, which I'm not used to. So, using lvdisplay I found:

    VG name is vg_oak
    LV name is lv_root

    So, I ran:
    # sudo mount /dev/vg_oak/lv_root /mnt

    But the system (Knoppix 7.1) responds:
    Special device /dev/vg_oak/lv_root does not exist

    What am I doing wrong?

    Thanks,
    Drew
  • Accepted Answer

    Saturday, October 20 2018, 08:36 PM - #Permalink
    There are so many articles with errors out there! Found I had to run vgchange -ay. OK, the mounting part is all OK now, thanks.
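
    For the record, the sequence that worked was roughly:

    # vgchange -ay vg_oak
    # mount /dev/vg_oak/lv_root /mnt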

    Back to data recovery.
  • Accepted Answer

    Sunday, October 21 2018, 08:48 PM - #Permalink
    Friends,

    I've got both systems booting and running, etc, but I am having no luck at all with data recovery. The RAID recreate techniques always end up failing on mount, noting that the file system is wrong or has a bad superblock. I have a number of boot rescue tools, but none is sufficiently automatic to help me. Online articles seem to suggest that when moving from RAID to non-RAID, the offset moves and I have to move the partition to match so the system can find the superblocks. All the examples given assume you know those offsets before the superblock was zeroed. Of course, I don't have that.

    This is especially frustrating, since I know I only deleted the RAID superblocks, and thus the data must still be there, right? The online help often seems to be wrong and untested.

    Of course, I am backing up the original failing drive and working on copies. But I have no other avenues to explore except a ***OUCH*** commercial retrieval service. Does anyone have any alternatives? Of course, I am willing to pay anyone who can truly help.

    Many, many thanks,
    Drew
  • Accepted Answer

    Sunday, October 21 2018, 09:09 PM - #Permalink
    Did TestDisk give you anything? It looks like they have a reasonably active forum.
  • Accepted Answer

    Sunday, October 21 2018, 09:51 PM - #Permalink
    Nick,

    The problem is more along the lines of understanding anything it tells me. It's not an automatic recovery tool, so what do I do with it? It can find partitions and suggest superblocks, but I have no idea what actionable steps any of that implies. I could not find any procedure that was clear for restoring an erased RAID superblock.

    I suppose I can ask the same questions on their forum, but a review of what is there didn't seem to help me. I'm not clear on the nature of what I am trying to do...I have a bad file system, but it is related to an offset in the partition? All these tools assume more file system knowledge than I have.

    Thanks,
    Drew
  • Accepted Answer

    Sunday, October 21 2018, 10:10 PM - #Permalink
    Drew,

    Did you try an e2fsck? If that fails (while NOT mounted), try it at a number of possible offsets... e2fsck itself cannot take an offset, so set up a loop device at each candidate offset and check that instead - where y is the partition number (probably 1) and the 2048-sector offset is only an example:

    # losetup -o $((2048*512)) /dev/loop0 /dev/sdxy
    # e2fsck -n /dev/loop0

    An alternative would be to write a small script with the offset as a variable, gradually increasing it in a loop while testing with e2fsck, checking the return code, and stopping on success - see the sketch below. You would then need to modify the partition table to match...
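
    A rough sketch of such a loop (assumptions: the working copy of the old RAID member is /dev/sdc1 and /dev/loop7 is free - adjust both; run it from a rescue environment with nothing mounted):

    #!/bin/bash
    DEV=/dev/sdc1
    # Try a read-only e2fsck at increasing sector offsets until a filesystem is found.
    for sectors in 0 2048 4096 8192 16384 32768 65536 131072 262144; do
        echo "=== trying offset ${sectors} sectors ==="
        losetup -o $((sectors * 512)) /dev/loop7 "$DEV" || exit 1
        e2fsck -n /dev/loop7
        rc=$?
        losetup -d /dev/loop7
        # e2fsck exit codes: 0 = clean, 4 = found but has errors, 8 = no valid superblock here
        if [ "$rc" -lt 8 ]; then
            echo "Filesystem found at offset ${sectors} sectors"
            break
        fi
    done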

    yes, deleting the raid super block was the worst move...

    The raid experts (i.e. those who develop and maintain mdadm) hang out on the linux-raid mailing list http://vger.kernel.org/vger-lists.html#linux-raid
    They also have a wiki if you have not found it... https://raid.wiki.kernel.org/index.php/Linux_Raid
  • Accepted Answer

    Sunday, October 21 2018, 11:57 PM - #Permalink
    Yep, did the e2fsck alternate-superblock thing early on - no dice. If you Google around, you'll find tons of articles telling you to erase that superblock. Nice, eh?

    Thanks Tony,
    Drew
  • Accepted Answer

    Monday, October 22 2018, 02:11 AM - #Permalink
    Hmm... Just did a web-search re. erasing the raid super-block and the few I sampled were all about erasing the raid super-block because it was interfering with the ability to create a new raid - they were not wanting to preserve data but to start from scratch with a clean, fresh, brand new array, and for that goal the information is correct... you obviously had a different search and managed to find very bad advice.

    As you indicated - much on the web re. recovering a software raid array is poor to downright disastrous - some because it is way out of date, others just plain wrong. The only one I personally would use, with caution, is the wiki referenced in an earlier reply - that is put together by those participating in the official linux-raid group and is more sound...
  • Accepted Answer

    Tuesday, October 23 2018, 02:45 PM - #Permalink
    Hi Drew - was just enhancing a little script used here to record raid array data, and while checking the output a thought occurred to me...

    Are you using the same version of mdadm to create the one drive raid as was used to initially create the original two drive array? I think maybe you are using a live distribution...

    The reason for asking is that at some time in the past some of the defaults, such as the metadata superblock version (0.90, 1.0, 1.1 and 1.2), changed for some raid types, and I am not sure if they are the same for ClearOS 6.x, ClearOS 7.x or any live distribution you might be using in your attempts to recover the data. You need to match the superblock version with what was used originally to create the array, using the "--metadata=" option if the default in what you are now using does not match the original value (example below). I suspect you used the default? - i.e. did not specify it, used whatever ClearOS provided?
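
    For example (device/array names are placeholders, and whether 0.90 is the right value depends entirely on what originally created your array):

    # mdadm --examine /dev/sdXY | grep -i version
    # mdadm --create /dev/mdX --level=1 --raid-devices=2 --metadata=0.90 /dev/sdXY missing

    The first command shows what metadata version (if any) is still recorded on a member; the second forces a specific version when re-creating a degraded single-member array.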

    Sorry, but cannot tell you what the ClearOS defaults are since raid arrays are created manually here so 1) the exact command to create them can be recorded - and 2) sometimes use the whole disk without any partitions i.e. /dev/sdb rather than /dev/sdb1 (yes raid supports that)...

    Don't have a running ClearOS 6.x at this moment to check either as to what the defaults are for it...
  • Accepted Answer

    Tuesday, October 23 2018, 03:55 PM - #Permalink
    Tony,

    Agree on your comments, including your *very* polite implication that I may have followed advice for the wrong goal. I do not disagree.

    On the mdadm version, I was vaguely aware of that issue, and so I used the live distributions only to get the system bootable and after that I used the COS 6.9 mdadm itself to do any recovery trials. However, I am not sure that I did not initially create the array even before COS 6. It may have been created in COS 5. Probably not earlier than that as I believe I was using RAID 5 under ClarkConnect.

    So, problems as you describe may be possible, but I don't know what metadata would have been used at creation in order to specify it.

    In any case, I did find a reasonably priced recovery service ($300 US) with good reviews, so I have sent a drive in. I still have a copy of it and can make another if there are other things to try. But it may be best for me not to try total "shots in the dark" without some understanding of where I am going. I will know in future to record a LOT more information about details of the array! I had no idea of all the variables and changes. (As well as major changes in my backup strategies and recovery procedures....duh!)

    Many kind thanks,
    Drew VS
  • Accepted Answer

    Friday, November 02 2018, 10:24 AM - #Permalink
    For all the kind people helping, Tony and Nick particularly...I am just waiting to hear from the recovery service. Should be today. I'll pass along my feedback on them. Price is good and they have been responsive. Now I just need the results.

    Cheers,
    Drew
  • Accepted Answer

    Monday, November 19 2018, 05:18 PM - #Permalink
    Friends,

    My drive was returned on schedule and the data recovery was about 98-99%. I was amazed. I can highly recommend the company "$300 Data Recovery".

    Drew VS
  • Accepted Answer

    Monday, November 19 2018, 10:21 PM - #Permalink
    Great news - you must be highly relieved :D