Running a system off of one hard drive is just asking for trouble. Hard drives are one of the most likely components to fail in a system. If your system is running off of a single drive and that drive fails, the results could be devastating. Even with a good, current backup, if a drive dies, your system will be down until you can figure out how to reload the OS and restore your data. Software RAID is a good solution for ensuring that your system remains available when a drive decides to fail. I’ve personally had more success dealing with failures with software RAID than I have with any hardware RAID products.
Ideally, the software RAID array will be configured when a system is installed. There are, however, a number of situations where you might have to migrate a running system to a RAID array. Doing so is relatively risky, but certainly doable. In my specific situation, I have a client with a new server at ServerBeach. The server came with two identical drives, but for some reason ServerBeach won’t install the OS to a software RAID array (probably because they use some pre-built images). The server has the OS installed to the first drive, and the second drive is completely blank.
There are a few howtos for getting this working, from which I took bits and pieces to make this work on CentOS 5.
First, an overview of the process:
1- Boot your running system, install mdadm, and copy the partition table to the blank drive
2- Create your RAID devices using just the blank drive (the array will run in degraded mode because the main drive is still in use by the running OS)
3- Copy your working filesystem to the mirrored drive
4- Configure GRUB to boot from the mirrored drive
5- Reboot onto the mirrored drive and ensure everything works
6- Add the initial drive to the RAID array to bring it fully online
7- Configure GRUB on both drives so that if one fails, the other will boot
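Steps 1, 2 and 6 above can be sketched roughly as follows. This is only an illustration under assumptions that are not from the guides: the installed OS is on /dev/sda, the blank drive is /dev/sdb, partition 1 is /boot and partition 2 is /. The helper names (run, prepare_degraded, add_original) are made up for this sketch, and by default it only prints the commands rather than running them.

```shell
# Sketch only. Device names and partition layout are assumptions; check
# them against your own system before running anything for real.
SRC=${SRC:-/dev/sda}        # drive currently running the OS (assumed)
DST=${DST:-/dev/sdb}        # blank drive (assumed)
DRY_RUN=${DRY_RUN:-1}       # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

# Steps 1-2: clone the partition table, then create each mirror with the
# keyword "missing" standing in for the drive that still runs the OS.
# (The partition type on the new drive may also need changing to fd,
# Linux raid autodetect, which is not shown here.)
prepare_degraded() {
    run "sfdisk -d $SRC | sfdisk $DST"
    run "mdadm --create /dev/md0 --level=1 --raid-devices=2 missing ${DST}1"
    run "mdadm --create /dev/md1 --level=1 --raid-devices=2 missing ${DST}2"
}

# Step 6: once booted from the array, add the original drive's partitions
# so the mirrors resync. Progress can be watched in /proc/mdstat.
add_original() {
    run "mdadm --add /dev/md0 ${SRC}1"
    run "mdadm --add /dev/md1 ${SRC}2"
}
```

Calling prepare_degraded with the defaults just prints the three commands so they can be reviewed first; setting DRY_RUN=0 would actually execute them, which is destructive to whatever is on the second drive.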
Falco’s guide does a very good job of walking through the whole process. I followed it, and would recommend it with just a few changes.
1- I don’t see any purpose in having a RAID1 swap partition. Make this RAID0 or just enable two independent partitions without RAID.
2- Don’t edit /etc/fstab and /etc/mtab on the live, working system. Edit those on the mirrored drive after the filesystem has been copied over. This will leave the working system functional if you need to fall back to it (and you probably will!).
3- The initrd image created by mkinitrd didn’t work for me, and at this point I’m not sure why. Falco’s guide says to run these commands:
mv /boot/initrd-`uname -r`.img /boot/initrd-`uname -r`.img_orig
mkinitrd /boot/initrd-`uname -r`.img `uname -r`
This makes a backup of the existing initrd image, and then builds a new one. I tried quite a few variations of the command, pointing it at the fstab that uses the software RAID array, but to no avail. I had to manually extract, edit, and recreate the initrd image using the steps of #12 on this post.
I don’t have direct access to the console, but the data center relayed the console error that included this:
switchroot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init
From what I can tell, the init script inside the initrd image tries to run the command ‘mount /sysroot’, which was failing. Without the real root filesystem mounted on /sysroot, the initrd can’t pass control over to the real system. I was able to replace that line with ‘mount -o defaults --ro -t ext3 /dev/md1 /sysroot’, and then manually cpio/gzip the image back into place. From there I was able to boot off of the mirrored drive and continue as normal.
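A rough sketch of that extract, edit and repack cycle is below. It assumes the initrd is a gzip-compressed cpio archive, that the line to replace in the init script reads exactly ‘mount /sysroot’ (adjust the pattern if yours differs), and that the root filesystem is ext3 on /dev/md1. The function names are made up for illustration and are not from any of the guides.

```shell
# Sketch only. Pass absolute paths, since each helper changes directory
# before reading or writing the image.

# Unpack a gzip-compressed cpio initrd into a scratch directory.
# usage: unpack_initrd /abs/path/to/initrd.img /abs/path/to/workdir
unpack_initrd() {
    mkdir -p "$2" && ( cd "$2" && gzip -dc "$1" | cpio -id )
}

# Replace the bare "mount /sysroot" line in the init script with an
# explicit read-only mount of the array. Uses GNU sed's -i option,
# which edits the file in place.
# usage: patch_init /abs/path/to/workdir/init
patch_init() {
    sed -i 's|^mount /sysroot$|mount -o defaults --ro -t ext3 /dev/md1 /sysroot|' "$1"
}

# Pack the directory back into a newc-format cpio archive and gzip it.
# usage: repack_initrd /abs/path/to/workdir /abs/path/to/new-initrd.img
repack_initrd() {
    ( cd "$1" && find . | cpio -o -H newc | gzip -9 > "$2" )
}
```

Writing the result to a new file name and pointing a separate GRUB entry at it keeps the original image available as a fallback if the edited one does not boot.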
I still have another one or two systems to do like this, so I’m hoping to refine the process a bit and maybe figure out what went wrong with the initrd. It was educational to dig into the initrd image and figure out a bit more about how a modern Linux box boots.