Nick Howitt wrote:
For LE, it is in the LE documentation - https://documentation.clearos.com/content:en_us:7_ug_lets_encrypt#replace_the_self-signed_certificate_for_webconfig. The setting is not backed up, as you say. I wonder if it can be backed up safely, but the restore program would need extra functionality to restart the webconfig.
For e-mail, if you use cyrus-imapd, all mail is under /var/spool/imap and /var/lib/imap. The raw mails are under /var/spool/imap, and there is a database and other stuff under /var/lib/imap. These can be copied/rsync'd/tar'd across, but that will put everyone back to the state as of the last mails in the copy. If your e-mails have moved on since then, you are in a bit of a pickle. You can possibly copy out any new e-mails for the user as a backup, then copy in all the old e-mails under /var/spool/imap, but they will all appear as unread and you won't see which ones you have replied to. The new ones you've backed up can probably then be copied in as well, but make sure the file names don't duplicate. Be careful manipulating the files, as the naming is odd: they all end with a "." and things don't always go as expected. I think if you edit them you initially lose the trailing ".". Make sure you also copy in the cyrus.* files in each folder or you will have to run "reconstruct" on the mailbox.
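The "make sure the files don't duplicate" check can be done before anything is copied. The following is only a rough sketch, not a tested recovery procedure. It assumes the message files are named as a number followed by the trailing "." described above, and the function names are made up for illustration. It only reports which backed-up messages are safe to copy and which names already exist in the live mailbox folder; it does not copy or modify anything.

```python
import re
from pathlib import Path

# Assumption: message files are named "<number>." (digits plus a trailing dot).
# Other files such as cyrus.index do not match and are ignored here.
MESSAGE_NAME = re.compile(r"^\d+\.$")


def message_files(folder):
    """Return the names in `folder` that look like message files."""
    return {
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and MESSAGE_NAME.match(p.name)
    }


def plan_copy(backup_folder, mailbox_folder):
    """Split backed-up messages into (safe_to_copy, name_clashes).

    Both lists are sorted numerically so "10." comes after "2.".
    """
    backed_up = message_files(backup_folder)
    existing = message_files(mailbox_folder)

    def by_number(name):
        return int(name[:-1])  # drop the trailing dot

    safe = sorted(backed_up - existing, key=by_number)
    clashes = sorted(backed_up & existing, key=by_number)
    return safe, clashes
```

Anything in the second list would need renaming to an unused number before it is copied in, and the mailbox would presumably still need "reconstruct" run afterwards.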
I wanted to close this adventure off for now. I changed the self-signed cert as per the link provided. No more messages! Thanks for the link! I'm not sure why I didn't find it myself!
I gave up on the cyrus email extract and told the individual in question that the email was lost. It's a bit of a cop-out but I ran out of patience.
As always, thanks for all your help, Nick. Much appreciated!!
Nick Howitt wrote:Thanks Nick.
I have successfully moved a disk between servers. It is how I did my 7.x installation, but note that all your network interfaces may (will) change so check the files you will need to review in the Config Backup docs. The other thing is the two servers will have to boot the same way, either UEFI or BIOS.
Note, if doing a disk recovery, best practice says always do it on a clone of the disk and never on the disk itself.
I have reinstalled on the new drive and restored all the settings via "Configuration Backup and Restore". It was mostly OK. There were issues with the Let's Encrypt certificates, flexshare, fail2ban, mounting our nas and virtual websites. I think those are fixed now.
I have two outstanding issues.
The reinstall seems to have buggered up the certificate for the webconsole. It is now a localhost.localdomain certificate. It should be using my server Let's Encrypt certificate. I can't figure out where to change this. It doesn't appear to be in the HowTo unless I missed it.
Second is not unexpected but frustrating ... Even after I told everyone to download their email, one family member didn't. So the second question is: what would you use as search terms to figure out how to rescue the leftover mail that is on the old server disk?
Nick Howitt wrote:
There should be no reason that I can think of that getting the chroot line wrong would stop you being able to boot again from USB/DVD.
The initramfs message could be interesting. There are plenty of references for generating a new one. Google "generate initramfs centos 7". This looks interesting.
I don't think /boot is in the LVM. I think it is a native partition in its own right. I am not sure you can copy a /boot from one drive to another as it has the drive partition references in it, but I am not sure.
Otherwise, especially if you have a significant user set up or OpenVPN certificates deployed, I'd go for option 2 over 3, but note that after you do a system restore, I'd delete and recreate websites and flexshares as a restore on its own does not generate the folders or the bind mounts in /etc/fstab. If you can still mount your old disk you may be able to recover those and the rest of your data. They should be fast to copy over if you have both disks mounted in the same machine. From /etc/fstab, don't copy over the whole file or you will really mess things up, just copy the bind mount settings.
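To pull just the bind mount settings out of the old disk's fstab without copying the whole file, something like the sketch below may help. It is only an illustration: the function name and the sample paths are made up, and it assumes the bind mounts are ordinary fstab entries with "bind" (or "rbind") in the options field.

```python
def bind_mount_lines(fstab_text):
    """Return the fstab entries whose mount options include bind or rbind."""
    selected = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # An fstab entry has at least: device, mount point, type, options.
        if len(fields) < 4:
            continue
        options = fields[3].split(",")
        if "bind" in options or "rbind" in options:
            selected.append(stripped)
    return selected


# Hypothetical sample; real paths will differ.
SAMPLE = """\
# /etc/fstab
/dev/mapper/clearos-root  /                           xfs   defaults       0 0
/example/flexshare/docs   /var/flexshare/shares/docs  none  bind,rw        0 0
/example/web/site1        /var/www/virtual/site1      none  defaults,bind  0 0
"""

for entry in bind_mount_lines(SAMPLE):
    print(entry)
```

With the old disk mounted, the text would come from its copy of the file (for example read from a hypothetical /mnt/olddisk/etc/fstab), and the printed lines can be reviewed and appended to the new /etc/fstab by hand.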
Thank you Nick.
You are correct. The /boot is an ext2/3/4 partition and is not in the LVM.
The second partition on the hard drive is the LVM volume. It appears that the cloned /boot has defective sectors or corrupted file(s). I suspect that is why it won't boot up properly.
I've read the info about initramfs and will give that a try but I think I need to try a two pronged approach.
I still have my old server box with similar hardware but one generation older. The new server is built the same, just with a newer CPU and more RAM. Everything else is essentially the same.
Would the following work?
Could I install the current COS version onto my old server, do the restore of settings and, if everything runs OK, just swap the hard drive to the new server? This way I wouldn't touch the existing server hard drive again until I do the swap-out. I think this reduces the risk that sectors on the failing hard drive give out during start-up before I've got a rebuilt server.
Is that advisable, or is the automatic hardware identification in COS going to have a problem with the switch?
Thanks again for all the advice!
Patrick de Brabander wrote:
Besides the tips from Nick and the other methods, you can also use this tool.
I've done this many times with my COS disk to make a complete clone.
Use an external hard drive docking station like this (for example)
It clones HDD without a PC.
Thank you Patrick. That is a cool piece of gear. I'll be having a look on the weekend. That would certainly save some effort.
Nick Howitt wrote:
For disk repair, if TestDisk won't do it, I think you have to go professional.
Thank you Nick and Patrick. I might try Clonezilla again. I don't have a Windows PC so unfortunately I can't try MiniToolShadowMaker.
I used TestDisk on the drive and the partitions look to be OK. After the "analysis", TestDisk looks to have added the * (boot) flag to the right partition. After running TestDisk, I swapped the new drive into the server but it still won't boot.
I suspect there is a missing link to the images or file or grub is buggered. When doing the ClearOS Recovery from the installer there was a part about "chroot" that I think I should have added something on the cmd line but it wasn't clear in the instructions and then I couldn't get the server to boot from the DVD or USB again. I've decided that Dell BIOS is a PITA.
Grub shows the list of kernels but when you select any of them you drop to dracut with a message about initramfs...
(If I knew more, I suspect this would be relatively easy to figure out but I'm a bit of a novice.)
This leaves me with a couple of questions since I can't figure this out.
1) LVM is supposed to be able to do live migration of data. I've started to read the Red Hat documentation, but it is a whole book just for LVM.
I think I should be able to create a snapshot of the boot directory etc that works on the failing drive, and copy/migrate that to the new drive.
Does this make sense or am I looking for lvm to do something it isn't set up to do?
2) I've done the backups of all the data and settings. I've got daily backups from "Configuration Backup and Restore" for years. The /home is on a separate drive. All email is POP so everyone has their email locally. The drive that is starting to have problems contains only the COS system (and mail & websites).
If I do a fresh/new install on the new drive, can I boot to the webconfig and restore all the settings using "Configuration Backup and Restore"? Would this be less futzing around and get up and running again relatively quickly? I expect I'll have to put back a copy of the websites but that shouldn't be an issue. I'm not sure about the mail server but with the settings files and no mail stored on the server that shouldn't be too big a chore?
3) Should I be rebuilding the complete server from scratch and manually set up everything?
Thanks again for your help and suggestions.
Over the past week, logwatch has shown a kernel error. When I look back in the log I see it started a few weeks back. One day an error, then nothing for weeks. In the past week I had 2 errors, so it is time to replace the drive.
After many attempts, I can't get the replaced hard drive to boot. I believe it is missing a link but not sure how to diagnose. Any suggestions appreciated.
Here is a summary of my attempted fixes ...
Here is an example of the kernel errors.
I installed smartmontools and ran it. The system disk is getting some errors. Decision time to replace the hard drive.
Normally I'd do a fresh install but have added a bunch of customizations that I didn't document. So I thought it would be easier to just clone the drive???
Sure enough there were posts in the forum recommending Clonezilla.
The Clonezilla run failed the first time. I figured that must be due to the media errors that the kernel error mentioned. So I tried it again with the --recover and fsck options. The second time was successful.
Then I replaced the drive in the server and rebooted. :-(
I ended up seeing the grub menu and selecting the image but then dropping to dracut and getting "Entering emergency mode. Exit the shell to continue. Type "journalctl" to view system logs. You might want to save "/run/initramfs/rdsosreport.txt" to a usb stick after mounting them and attach it to a bug report."
Completely embarrassed as I couldn't figure out how to get the USB stick mounted to copy the file to add here. No /mnt and no ability to add a /mnt.
Rebooted using the ClearOS installer to try the recovery. The recovery got through to the grub menu, and when I selected recovery it dropped to dracut. The same happened when I selected any of the other kernel images.
I tried to find the volumes to make sure they are there:
I ran xfs_repair on the LVM volumes, but clearos-root appears to be missing the "superblock"?
Put old drive back. Boots up and back working but not fixed. :-( Help.
As of Thurs. Mar 11, I've been getting the following message from the sa-update job for SpamAssassin
What I've read in posts from around 2014 is that the fix is to comment out this mirror as it is no longer updated?
If this is the case, then why did this error show up 4 days ago? Was there an update which enabled reading from the dead mirror?
Thanks in advance for the clarification.
Nick Howitt wrote:
Are the failures for valid users in the pam messages? If so the short answer is don't worry about them. You will probably get them every time a user logs on. For the long answer see https://www.clearos.com/clearfoundation/social/community/pam-unix-authentication-failure and similar threads. I use the file in this post in a different thread. The issue is that the authentication mechanism tries against unix accounts first and reports an error if it fails and then it tries against ldap accounts. All cyrus-imap users are ldap users.
Thanks Nick. That is interesting and seems logical.
I've read through the threads and need to do some background reading. I'll try this on the weekend and see if I can implement the fix.
Out of curiosity, did anyone ever fix the issue in the bug report?
Nick Howitt wrote:
So it uses the name imapd for port 993 but cyrus-master for the IMAP and POPS processes. Strange programming! IMAP is listening on localhost only. I've no idea why that would be but let's assume it is correct.
As you are not listening externally on POP/IMAP, I'd assume your logwatch report is grouping POP and POPS together and reporting them as POP. Ditto IMAP and IMAPS, but only you can crosscheck that. A quick grep of failures in the maillog for one day may prove that.
Sorry I've had no time to look at this for a week.
The number of failures is still in the hundreds.
But the grep of failures in maillog shows only 21 for the day on imaps. All failures are captured in fail2ban and the IP addresses are banned.
I'm scratching my head and can't figure out why logwatch is still showing 717 failures over 5 users on the same day.
If the failures aren't in maillog, then where could logwatch be picking up the failures?
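To cross-check the logwatch total against what is actually in a log, the grep can be turned into a per-day, per-service count. This is only a sketch under assumptions: it expects the traditional syslog line layout ("Mon DD HH:MM:SS host program[pid]: message"), and the "badlogin" keyword is a guess at how failed logins are marked, so it may need changing to match the real lines. The function name is made up.

```python
import re
from collections import Counter

# Assumed layout: "Mar 11 10:00:01 host imaps[1234]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\S+\s+\S+\s+"
    r"(?P<prog>[^\s\[:]+)(?:\[\d+\])?:\s"
)


def count_failures(lines, keyword="badlogin"):
    """Count lines containing `keyword`, grouped by (date, program name)."""
    counts = Counter()
    for line in lines:
        if keyword not in line:
            continue
        m = SYSLOG_LINE.match(line)
        if m:
            date = f"{m['month']} {int(m['day'])}"
            counts[(date, m["prog"])] += 1
    return counts


# Usage sketch (the path is an assumption):
# with open("/var/log/maillog") as f:
#     for (date, prog), n in sorted(count_failures(f).items()):
#         print(date, prog, n)
```

Running the same count over other log files with a different keyword would show whether the larger number is coming from somewhere other than maillog.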