Upgrading Disks in a Software RAID System

I have a system at home with a pair of 640 GB drives that are getting full. The drives are configured in a Linux software RAID1 array. Replacing them with a pair of 2 TB drives is a pretty easy (although lengthy) process.

First, verify that the RAID system is running correctly. Take a look at /proc/mdstat to confirm that both drives are online:

root@host:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[1] sda3[0]
      622339200 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      192640 blocks [2/2] [UU]

After verifying that the arrays were healthy, I powered the machine off, removed the second hard drive, and replaced it with one of the new drives. Upon starting it back up, Ubuntu noticed that the RAID system was running in degraded mode, and I had to answer yes at the console to have it continue booting.

Once the machine was running, I logged in and created a similar partition structure on the new drive using the fdisk command. On my system, I have a small 200 MB partition for /boot as /dev/sdb1, a 1 GB swap partition, and then the rest of the drive as one big block for the root partition. I copied the partition layout that was on /dev/sda, but made the final partition take up the entire rest of the new drive. Make sure to set the first partition as bootable. The partitions on /dev/sdb now look like this:

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          24      192748+  fd  Linux raid autodetect
/dev/sdb2              25         146      979965   82  Linux swap / Solaris
/dev/sdb3             147      243201  1952339287+  fd  Linux raid autodetect
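
If the new drive were the same size, the whole layout could be copied over in one step with sfdisk instead of recreating it by hand; a minimal sketch, assuming /dev/sda is the surviving original drive and /dev/sdb is the blank new one:

root@host:~# sfdisk -d /dev/sda > /tmp/sda.parts     # dump the partition layout of the original drive
root@host:~# sfdisk /dev/sdb < /tmp/sda.parts        # write the same layout onto the new drive

Because the new drives are larger here, the last partition would still need to be enlarged afterward, which is why I recreated the layout in fdisk instead.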

After the partitions match up, I can add the partitions on /dev/sdb back into the RAID arrays:

root@host:~# mdadm --manage /dev/md0 --add /dev/sdb1
root@host:~# mdadm --manage /dev/md1 --add /dev/sdb3

And watch the status of the rebuild using

watch cat /proc/mdstat

After the rebuild was done, I installed grub onto the new /dev/sdb. From the grub shell, root selects the partition that holds /boot and setup writes to the master boot record of the second drive:

grub
root (hd1,0)
setup (hd1)
quit

Finally, reboot once and make sure that everything works as expected.

The system is now running with one old drive and one new drive. The next step is to repeat the process: remove the remaining old drive, replace it with the second new drive, and rebuild and re-add its partitions to the RAID arrays. The steps are the same as above, except they are performed on /dev/sda. I also had to change my BIOS to boot from the second drive.
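
If you want mdadm to drop the remaining old drive cleanly before pulling it, its partitions can be failed and removed from the arrays first; a sketch, and not strictly required since the machine is powered off for the swap anyway:

root@host:~# mdadm --manage /dev/md0 --fail /dev/sda1 --remove /dev/sda1
root@host:~# mdadm --manage /dev/md1 --fail /dev/sda3 --remove /dev/sda3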

Once both drives are installed and working in the RAID arrays, the final part of the process is to increase the size of the file system to the full size of the new drives. I first had to disable the ext3 file system journal. This was necessary so that the online resize doesn’t run out of memory.

Edit /etc/fstab, change the file system type for the root partition to ext2, and add “noload” to the mount options, which disables the file system journal. My /etc/fstab looks like this:

# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# /dev/md1
UUID=b165f4be-a705-4479-b830-b0c6ee77b730 /               ext2    noload,relatime,errors=remount-ro 0       1
# /dev/md0
UUID=30430fe0-919d-4ea6-b3c2-5e3564344917 /boot           ext3    relatime        0       2
# /dev/sda5
UUID=94b03944-d215-4882-b0e0-dab3a8c50480 none            swap    sw              0       0
# /dev/sdb5
UUID=ebb381ae-e1bb-4918-94ec-c26e388bb539 none            swap    sw              0       0

You then have to run `mkinitramfs` to rebuild the initramfs so that it picks up the changed mount options:

root@host:~# mkinitramfs -o /boot/initrd.img-2.6.31-20-generic 2.6.31-20-generic

Reboot so that the system is running with journaling disabled on the root partition. Now you can finally grow the RAID device to the full size of the underlying partitions:

root@host:~# mdadm --grow /dev/md1 --size=max

And then resize the file system:

root@host:~# resize2fs /dev/md1
resize2fs 1.41.9 (22-Aug-2009)
Filesystem at /dev/md1 is mounted on /; on-line resizing required
old desc_blocks = 38, new_desc_blocks = 117
Performing an on-line resize of /dev/md1 to 488084800 (4k) blocks.

The system is now up and running with the larger drives :). Change /etc/fstab back, rebuild the initramfs, and reboot one more time to re-enable journaling, and the upgrade is officially done.
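
For reference, the revert is just the earlier journal-disabling steps in reverse; a quick sketch, assuming the same kernel version as above:

root@host:~# vi /etc/fstab      # change the root entry back to ext3 and drop the noload option
root@host:~# mkinitramfs -o /boot/initrd.img-2.6.31-20-generic 2.6.31-20-generic
root@host:~# reboot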

Troubleshooting OpenVZ Out of Memory and Could Not Allocate Memory Errors

I will sometimes run into an error inside an OpenVZ guest where the system complains about running out of memory. This is usually seen as a message containing “Cannot allocate memory”. It may even occur when trying to enter the container:

[root@host ~]# vzctl enter 908
entered into CT 908
-bash: /root/.bash_profile: Cannot allocate memory

This particular error can be difficult to troubleshoot because OpenVZ has so many different limits. “Cannot allocate memory” is a pretty generic error, so Googling it doesn’t necessarily yield any useful results.

I’ve found that the best way to track down the source of the problem is to examine the /proc/user_beancounters statistics on the host server. The far right column is labeled ‘failcnt’ and is a simple counter for the number of times that particular resource has been exhausted. Any non-zero numbers in that column may indicate a problem. In this example, you can see that the dcachesize parameter is too small and should be increased:

[root@host ~]# cat /proc/user_beancounters
Version: 2.5
       uid  resource                     held              maxheld              barrier                limit              failcnt
      908:  kmemsize                  5170890              5776435             43118100             44370492                    0
            lockedpages                     0                    0                  256                  256                    0
            privvmpages                 53561                55089              1048576              1048576                    0
            shmpages                        2                    2                21504                21504                    0
            dummy                           0                    0                    0                    0                    0
            numproc                        30                   38                 2000                 2000                    0
            physpages                    6136                 7392                    0  9223372036854775807                    0
            vmguarpages                     0                    0                65536                65536                    0
            oomguarpages                 6136                 7392                26112  9223372036854775807                    0
            numtcpsock                      8                    8                  360                  360                    0
            numflock                        5                    6                  380                  420                    0
            numpty                          0                    1                   16                   16                    0
            numsiginfo                      0                    2                  256                  256                    0
            tcpsndbuf                  140032                    0             10321920             16220160                    0
            tcprcvbuf                  131072                    0              1720320              2703360                    0
            othersockbuf               151320               156864              4504320             16777216                    0
            dgramrcvbuf                     0                 8472               262144               262144                    0
            numothersock                  107                  109                 5000                 5000                    0
            dcachesize                3408570              3413265              3409920              3624960                   81
            numfile                       724                  957                18624                18624                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            numiptent                      10                   10                  128                  128                    0
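
To pick out the failing counters without scanning the whole table, a quick filter on the last column works; a small sketch, assuming the column layout shown above:

[root@host ~]# awk 'NR > 2 && $NF > 0' /proc/user_beancounters
            dcachesize                3408570              3413265              3409920              3624960                   81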

You can do a little investigation to see what each particular limit does. I will usually not dig too deep; I just double the value and try again. You can set a new value using the vzctl command like this:

[root@host ~]# vzctl set 908 --dcachesize=6819840:7249920 --save

Southeast Linux Fest Presentation on MySQL Replication

I was fortunate to be selected to give a presentation at the 2010 Southeast Linux Fest held this year in Greenville, SC. The topic was MySQL replication, which I adapted from a similar presentation I gave about 1.5 years ago at my local LUG. I’ve configured plenty of replicated servers, and I think that I understand it well enough to explain it to others.

The 2-hour presentation is about half slides and half demo. Over the course of the presentation I set up a simple master-slave pair, then add a second slave. Taking it a step further, I set up the three servers to replicate in a chain, and finally I configure them to replicate in a full circle so that changes made on one server are propagated to all of the others. I intentionally do things that break replication at certain points to show some of the limitations and the configurable features that can help work around them.
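
The core of the master-slave setup in the demo boils down to a handful of statements; a rough sketch, with the hostnames, credentials, and binary log coordinates as placeholders:

# on the master: create a replication user and note the current binlog position
mysql -e "GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'somepassword'"
mysql -e "SHOW MASTER STATUS"

# on the slave: point it at the master and start replicating
mysql -e "CHANGE MASTER TO MASTER_HOST='master.example.com', MASTER_USER='repl', MASTER_PASSWORD='somepassword', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=106"
mysql -e "START SLAVE"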

Slides for the presentation are available in OpenOffice format.

The presentation was recorded, so hopefully the SELF team will have those videos available shortly.

Skipping the DROP TABLE, CREATE TABLE statements in a large mysqldump file

I have a large table of test data that I’m copying into some development environments. I exported the table with mysqldump, which puts DROP TABLE and CREATE TABLE statements at the top:

DROP TABLE IF EXISTS `mytable`;
CREATE TABLE `mytable` (
  `somecol` varchar(10) NOT NULL default '',
   ... other columns ...
  PRIMARY KEY  (`somecol`),
  KEY `isbn10` (`somecol`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The problem is that the developer has altered the table, and re-importing the test data would undo those changes. Editing the text file is impractical because of its size (500 MB gzipped), so I came up with this workaround, which uses sed to slightly alter the SQL so that it doesn’t try to drop or recreate the real table. It comments out the DROP TABLE line and creates the new table in the test database instead of the real database.

zcat bigfile.sql.gz |sed "s/DROP/-- DROP/"|sed "s/CREATE TABLE /CREATE TABLE test./"|mysql databasename
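
If regenerating the dump is an option, mysqldump can leave those statements out in the first place; a sketch, using the same database and table names as above:

mysqldump --no-create-info --skip-add-drop-table databasename mytable | gzip > bigfile.sql.gz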

Sleeping for a random amount of time in a shell script

You can use the shell’s special $RANDOM variable to get a random number and divide it by the maximum number of seconds that you want to wait. Use the remainder as the number of seconds to sleep, since it will always fall between zero and one less than the maximum you specified. This example will sleep anywhere from zero seconds up to 10 minutes (600 seconds):

 /bin/sleep `/usr/bin/expr $RANDOM % 600`

Purists will note that the result isn’t perfectly uniform: the maximum value of $RANDOM is 32767, which is not evenly divisible by most values you are likely to use, but it is close enough.
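
This is handy for staggering cron jobs so that a group of machines doesn’t all hit a server at the same moment; a sketch, with the job path as a placeholder:

# pause up to 10 minutes before the nightly job (note: % must be escaped
# inside a crontab, and $RANDOM requires bash rather than plain sh)
SHELL=/bin/bash
5 2 * * *  /bin/sleep `/usr/bin/expr $RANDOM \% 600`; /usr/local/bin/nightly-job.sh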

Installing SVN and Trac on a CentOS 5 server

Make sure that you have the RPMForge repository enabled, then install Subversion, mod_dav_svn, mod_python, and Trac. This will pull in a few required dependencies (i.e. neon and some Python utilities):

# yum install subversion mod_dav_svn mod_python trac

Create a directory for your repositories, an initial repository for testing, and your htpasswd file. Then create a Trac environment and set it up:

# mkdir /home/svn/
# svnadmin create /home/svn/testrepo
# chown -R apache:apache /home/svn/*
# htpasswd -c  /home/svn/.htpasswd brandon

# mkdir /home/trac/
# trac-admin /home/trac/ initenv
    ... answer questions as appropriate ...
# chown -R apache:apache /home/trac/

Add this to your Apache configuration in the relevant place (I like to put it under an SSL VirtualHost):

    <Location /svn>
        DAV svn
        SVNParentPath /home/svn/
        #SVNListParentPath on
        # Authentication
        AuthType Basic
        AuthName "RoundSphere SVN Repository"
        AuthUserFile /home/svn/.htpasswd
        Order deny,allow
        Require valid-user
    </Location>
    <Location /trac>
        SetHandler mod_python
        PythonHandler trac.web.modpython_frontend
        PythonOption TracEnv /home/trac
        PythonOption TracUriRoot /trac
        # Authentication
        AuthType Basic
        AuthName "MyCompany Trac Environment"
        AuthUserFile /home/svn/.htpasswd
        Require valid-user
    </Location>
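
After the configuration is in place, check the syntax and restart Apache so the new Location blocks take effect:

# apachectl configtest
# service httpd restart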

Now test to make sure that you can view your test repository in a browser and that it prompts for a username and password as desired:

https://your-hostname/svn/testrepo/

You should see a plain-looking page that mentions the name of your repository and shows that it is at Revision 0.

You should also be able to access your trac installation at

https://your-hostname/trac/

Customize your logo, change the home page, start making some tickets, use the wiki, and get to work.

Installing the Pandora One client on 64-bit Ubuntu 9.10

I was surprised and happy to see that the Pandora One client should work on Linux. It uses the Adobe AIR framework, which means that Pandora doesn’t have to write a Linux-specific client.

However, installing it on a modern 64-bit Ubuntu 9.10 install took just a bit of manipulation. Pandora provides some basic instructions for Linux users here, even though Linux is officially unsupported. Those instructions, along with the Adobe AIR notes here, provided enough information for me to get it installed and working.

Here’s what I did:

  • Start out at the Pandora One site
  • Click on the "Download Pandora Desktop" link and save that file to /tmp
  • Follow the link to Install Adobe AIR and save that file to /tmp also
  • Open a shell, chmod the Adobe AIR installer to 755, and then run it.
  • Go through the Adobe AIR install until it completes
  • Once Adobe AIR is installed, you will need to put some 32-bit libraries in place to make it run correctly. Some of the steps on Adobe’s site work and some don’t, so this is what I did:
  • Download the two .deb files for libnss3 and libnspr4 to /tmp
  • From your shell, run:
     sudo file-roller ./libnss3-1d_3.12.0~beta3-0ubuntu1_i386.deb
    
  • Navigate to data.tar.gz => /usr => lib. Click on all of the files in that directory and click Extract. Type in /usr/lib32/ so that they extract there, then close all of the file-roller windows.
  • Do the same thing with the libnspr4 .deb file that you downloaded (a command-line alternative to the file-roller steps is sketched after this list)
  • Copy the adobe cert store into place with this command:
     sudo cp /usr/lib/libadobecertstore.so /usr/lib32
    
  • Now you can finally install the Pandora application by running:
    sudo Adobe\ AIR\ Application\ Installer /tmp/pandora_2_0_2.air 
    

    That should install the application correctly. It will add an icon to Applications / Accessories.

  • When the Pandora One client starts up, it currently complains about connecting to an untrusted server. I have to click to accept for the session each time.
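
If you would rather skip the file-roller clicking, the same 32-bit libraries can be pulled out of the .deb files on the command line with dpkg-deb; a sketch, assuming the files were saved to /tmp as above:

cd /tmp
dpkg-deb -x libnss3-1d_3.12.0~beta3-0ubuntu1_i386.deb extracted/
dpkg-deb -x libnspr4-0d_*_i386.deb extracted/     # adjust to match the libnspr4 .deb you downloaded
sudo cp extracted/usr/lib/* /usr/lib32/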

Now you should be able to play your Pandora music from your 64-bit Ubuntu 9.10 box.

Speed up a Linux Software RAID Rebuild

I’m setting up software RAID on a running server, and the initial sync of the 1 TB drives is taking forever. It has been running for about 6 hours and says that it will take about 5 days (7400 minutes) at this pace:

[root@host ~]# cat /proc/mdstat
md1 : active raid1 sdb3[1] sda3[2]
      974559040 blocks [2/1] [_U]
      [>....................]  recovery =  3.9% (38109184/974559040) finish=7399.1min speed=2108K/sec

I did some read and write tests directly against the drives using dd to make sure that they were working okay, and they can operate at about 100 MB/s:

[root@host ~]# dd if=/dev/zero of=/dev/sda2 bs=1024 count=1024000
    1048576000 bytes (1.0 GB) copied, 10.8882 seconds, 96.3 MB/s
[root@host ~]# dd if=/dev/zero of=/dev/sdb2 bs=1024 count=1024000
    1048576000 bytes (1.0 GB) copied, 11.1162 seconds, 94.3 MB/s
[root@host ~]# dd if=/dev/sda2 of=/dev/null bs=1024 count=1024000
    1048576000 bytes (1.0 GB) copied, 10.2829 seconds, 102 MB/s
[root@host ~]# dd if=/dev/sdb2 of=/dev/null bs=1024 count=1024000
    1048576000 bytes (1.0 GB) copied, 10.5109 seconds, 99.8 MB/s

What I failed to realize is that there is a configurable limit for the minimum and maximum speed of the rebuild. Those parameters are set in /proc/sys/dev/raid/speed_limit_min and /proc/sys/dev/raid/speed_limit_max. They default to a pretty slow 1 MB/s minimum, which was causing the rebuild to crawl.

Increasing the maximum limit didn’t automatically make it faster either. I had to increase the minimum limit to get it to jump up to a respectable speed.

[root@host ~]# echo 100000 > /proc/sys/dev/raid/speed_limit_min

[root@host ~]# watch cat /proc/mdstat
Every 2.0s: cat /proc/mdstat 
md1 : active raid1 sdb3[1] sda3[2]
      974559040 blocks [2/1] [_U]
      [=>...................]  recovery =  7.7% (75695808/974559040) finish=170.5min speed=87854K/sec

Now it is up around 87 MB/s and will take just a few hours to complete the rest of the drive.
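
These /proc settings reset on reboot; to make the limits persistent, the same knobs can be set through sysctl, using the same value as above:

[root@host ~]# sysctl -w dev.raid.speed_limit_min=100000
[root@host ~]# echo "dev.raid.speed_limit_min = 100000" >> /etc/sysctl.conf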

PHP Wrapper Class for a Read-only database

This is a pretty special case of a database wrapper class where I wanted to discard any updates to the database but let SELECT queries run against an alternative read-only database. In this instance, I have a planned outage of a primary database server but would like the public-facing websites and web services to remain as accessible as possible.

I wrote this quick database wrapper class that passes all SELECT queries on to a local replica of the database and silently discards any writes. On this site almost all of the functionality still works, but it obviously isn’t saving any new information while the primary database is unavailable.

Here is my class. It is intended as a wrapper around an ADOdb connection object, but it is generic enough that I think it would work with many other database abstraction layers as well.

class db_unavailable {
    var $readonly_db;

    function __construct($readonly_db)
    {
        // Keep a handle to the read-only replica connection
        $this->readonly_db = $readonly_db;
    }

    function query($sql)
    {
        $args = func_get_args();
        if (preg_match("#(INSERT INTO|REPLACE INTO|UPDATE|DELETE)#i", $args[0])) {
            // Silently discard anything that would modify data
            // echo "Unable to do insert/replace/update/delete query: $sql\n";
            return true;
        } else {
            // Pass reads through to the read-only replica
            return call_user_func_array(array($this->readonly_db, 'query'), $args);
        }
    }

    function __call($function, $args)
    {
        // Proxy any other method calls straight to the replica connection
        return call_user_func_array(array($this->readonly_db, $function), $args);
    }
}

I simply create a connection object that points to the read-only database, then create my main $db object as a new db_unavailable() wrapper around it. Any SELECT queries against $db behave as they normally do, and data-modifying queries are silently discarded.

Booting from a Software RAID Device on Ubuntu Karmic (9.10)

It’s a few days away from Karmic’s official release, and I’m putting together a new computer, so I thought I would give the new version a try. I set everything up with software RAID, and it still annoys me that doing so on Ubuntu requires the Alternate CD and a text-based install.

I configured the /boot partition to use a RAID1 array, as I normally do. After getting most of the way through the install, I got a big red screen with the following error:

====================================
Unable to install GRUB in /dev/md0
Executing 'grub-install /dev/md0' failed.
This is a fatal error.
====================================

I tried several variations on continuing, but nothing seemed to work. It turns out this is a known issue with Karmic. Actually, the known issue says that grub is only installed to the first drive, but that was not my experience.

Bug 420748 gives some detail on the issue, and it seems that the change to GRUB 2 might have something to do with it. From the red screen, I chose to proceed without installing a boot loader. I then rebooted to the Alternate CD and chose to repair a broken installation. Once at a shell, I ran these commands to install grub to the master boot records of both drives:

grub-install /dev/sda
grub-install /dev/sdb

I removed the CD and rebooted, but ended up at a grub shell. At that point I ran the following to boot into the system (note the difference in GRUB 2 commands):

set root=(hd0,1)
linux /vmlinuz-2.6.31-14-generic ro root=/dev/md1 (use tab-completion on the kernel image filename)
initrd /initrd.img-2.6.31-14-generic (again, use tab-completion)
boot

At that point, it started up normally. Once logged in, I started a terminal and ran this command as root to create a minimal grub.cfg file (note that it is no longer called menu.lst):

grub-mkconfig > /boot/grub/grub.cfg
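
On Ubuntu, the update-grub wrapper regenerates the same file in the standard location, so this should be equivalent:

update-grub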

I’m now able to reboot into a working system. So much for a flawless install experience.