Using btree indexes to speed up MySQL MEMORY (HEAP) table deletes

The IPTrack application that I wrote saves a bunch of Netflow data generated by routers into a MySQL database for analysis and summarization. During peak usage times, it inserts about 50k rows per minute into a table. To keep the table at a manageable size, it then summarizes the useful data into a lower-volume table and deletes any Netflow data older than 10 minutes out of the table. This particular table usually has between 300k and 500k rows in it at any given time.

The MEMORY storage engine operates completely from Ram. Of course this has the advantage of being very quick compared to File based access. The downside is that any data in it is lost if the MySQL instance has to be restarted for any reason. Since my application just uses it for temporary data anyway, and it doesn’t matter all that much if I lose up to 10 minutes of data, this seemed perfectly acceptable.

So, I tried converting the table from MyISAM to MEMORY. For the first couple minutes it looked pretty promising. Disk IO and the machine’s load was pretty small and it was cruising along. But, after ten minutes, when it got to the point where it was purging old data from the table, it came to a grinding halt. The delete statement, which took only a few seconds using the MyISAM engine just kept going and going. It ended up taking about 9 MINUTES to complete. That, of course, is pretty unacceptable. The MEMORY storage engine was supposed to be faster than MyISAM, so this made no sense. The table was already pretty well indexed. I was deleting based on the timestamp column:

DELETE FROM flows WHERE timestamp < ?

and the timestamp column had an index. I had no idea what could make it take so long. I ran across a bug report of somebody else having the problem. The reply was that this was just the nature of hash indexes. The bug report was from 2004 though, so I posted my problem to the mysql forums in search of an answer.

I just noticed that somebody replied to my post and said to use btree indexes instead of hash indexes (the default for the MEMORY engine). Sure enough, I recreated my indexes using btree and it works perfectly now.

CREATE INDEX timestamp_btree USING BTREE ON flows (timestamp);

Thanks to KimSeong Loh on the MySQL forums for solving this for me

Perl Interface to the Quantum Random Bit Generator Service

Quantum Random Bit Generator Service

I read about the Quantum Random Bit Generator Service the other day on slashdot. The service is offered for free with a quick registration at http://random.irb.hr/ They provide the source code and Windows and Linux binaries to connect to the service and retrieve some random data.

Earlier that day I was marveling at the availability of a Perl module to interface with just about anything. I thought it seemed like a good opportunity to write one for this new service.   They provided some C source code, so I figured that I should be able to read through it well enough to understand what it was doing.

The interface that they provide is just a raw TCP connection.  You have to send some header information including your username and password as well as the number of bytes of data you are requesting.   It then sends back a bunch of random bits, and then I transform that into whatever type of numbers you want.

It ended up taking me entirely too long to implement, but I had dedicated enough time to it that I felt pretty committed.  I read through the provided C code, and did a bunch of tcpdumps to capture the traffic that their working program sent and made sure that mine matched it bit by bit.  Eventually I got it working.  I’ve packaged into a module that I’m calling Data::Random::QRBGS.  Now, it is simple to get some random data from the service like this:

  use Data::Random::QRBGS;

  $qrbgs = new Data::Random::QRBGS('username', 'password');

  @ints = $qrbgs->get(4);
  print "Got random integers: @intsn";

  @shorts = $qrbgs->get(2, 's');
  print "Got random shorts: @shortsn";

  $bytes = $qrbgs->getraw(1024);

I’ve created a page at http://www.brandonchecketts.com/qrbgs.php that contains a little documentation and a link to download it.

I’d like to see about getting the module made available through CPAN, but it is actually turning out to be quite complicate to do that. I’ve requested an account, and I guess that has to get approved manually. They instructions recommend joining the mailing list and discussing the module for a while before before actually submitting it. I’ll get around to that as I have time I guess.

Making Awstats ignore the ‘www’ on a domain

awstats is my favorite web statistics program. It provides quite a bit of interesting data about my websites, and I usually set it up for any site that I work on. One issue that I’ve had with it though, is that in a shared setup, by default, it uses the HTTP HOST header to check which domain to display statistics for. The config files and data files are all saved with the full hostname of the domain for the stats it contains. So it treats “www.brandonchecketts.com” different from “brandonchecketts.com”.

This has always been just a slight annoyance to me, and I have just remembered to make sure to put the ‘www’ on the URL when I’m looking at my statistics. Today, though, I thought I would dig into it a little bit to fix it permanently.

Turns out it was pretty easy to change this behavior so that it always just removes the ‘www’ from the domain name for any of the files it looks for. Simply add this on line 1160 of awstats.pl:

$SiteConfig =~ s/^www\\.//g;

It should go right after the $FileConfig= line, and right before the foreach loop. Since all of my database files were created with the ‘www’ in them, I had to go through and rename all of those database files to remove the www from them. Now I can hit either URL and get the same data.

I started a forum thread on the SourceForge forums to announce it to others and see if anybody else finds it useful.

Calculating Amanda Backup Space Usage per Host / Disk

I’ve recently setting up a bunch of hosts with a new Amanda backup server. I like to see, though, how much space each server that I’m backing up uses. Amanda stores a bunch of info in the ‘curinfo’ directory for each host and disk that is being backed up, but I haven’t found any good tools for querying or displaying that. So, I wrote my own. This script looks through all of the files in the ‘curinfo’ directory, and prints out a summary of how much space each disk/host is taking up:

#!/usr/bin/perl

## View Amanda disk space usage per host/disk
## Author:  Brandon Checketts
## Website: http://avazio.com/

$curinfo_dir = "/etc/amanda/avazio/curinfo";

opendir(DH, $curinfo_dir);
while($host = readdir(DH)) {
  next if($host =~ m/^./);
  if( -d "$curinfo_dir/$host") {
    opendir(DH2, "$curinfo_dir/$host");

    while($disk = readdir(DH2)) {
      next if($disk =~ m/^./);
      if( -f "$curinfo_dir/$host/$disk/info") {
        open(FH, "< $curinfo_dir/$host/$disk/info");
        while(my $line = ) {
          if($line =~ m/^history: ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)/) {
            ## Example line: history: 1 2319760 2319760 1184766354 1121
            ## Line format:  history [lvl] [rawsize] [compsize] [timestamp] [unk?]
            $space->{$host}->{$disk}->{'rawsize'}  += ($2 / 1024);
            $space->{$host}->{$disk}->{'compsize'} += ($3 / 1024);

          }
        }
      }
    }
    closedir(DH2)
  }
}
closedir(DH);

$grandtotal_rawsize  = 0;
$grandtotal_compsize = 0;
foreach my $host (keys(%{$space})) {
  print "n$hostn";
  $thishost = $space->{$host};
  $thishost_rawsize  = 0;
  $thishost_compsize = 0;
  foreach my $disk (keys(%{$thishost})) {
    $thisdisk = $space->{$host}->{$disk};
    $thishost_rawsize    += $thisdisk->{'rawsize'};
    $thishost_compsize   += $thisdisk->{'compsize'};
    $grandtotal_rawsize  += $thisdisk->{'rawsize'};
    $grandtotal_compsize += $thisdisk->{'compsize'};
    $disk =~ s/_///g;
    printf("  %-40s %-6i Mb   %-6i Mb\\n", $disk, $thisdisk->{'rawsize'}, $thisdisk->{'compsize'});
  }
  printf("  TOTAL:                                   %-6i Mb  %-6i Mb\\n",  $thishost_rawsize, $thishost_compsize);
}

printf("GRAND TOTAL:                                 %-6i MB  %-6i MB\\n", $grandtotal_rawsize, $grandtotal_compsize);

Sending Attributes with Accept packets using FreeRadius and MySQL backend

The company I work at is having to change who their Dialup modem pool goes through. The new company blocks outgoing SMTP by default to prevent spamming. To enable port 25 for our mail servers, they we have to send them some attributes in the Radius ACCEPT packet during authentication. I haven’t really gotten into the Radius server that we use, because it has always worked and we’ve never had to change anything on it until now. So I began digging into FreeRadius, and found out that it is pretty useful.

Our server was configured to authenticate against a MySQL database that contains all of the configuration. I believe it is a pretty standard configuration. In the radius configuration, it defines certain queries to run for authentication, and also for attributes to send in the accept packet. Specifically, it’s was configured to do this query for the attributes:

authorize_reply_query = "
  SELECT id,UserName,Attribute,Value,op 
  FROM ${authreply_table} 
  WHERE Username = '%{SQL-User-Name}'
  ORDER BY id"

So, I can just put additional rows into the ${authreply_table} to add the attributes that this new provider requires. With the default setup, though, I would have to add the attributes for each user. That would get to be a mess, because I would have to modify the billing system which populates the radius database. Instead I modified the SQL query to include rows where the Username = ALLUSERS, like this:

authorize_reply_query = "
    SELECT id,UserName,Attribute,Value,op
    FROM ${authreply_table}
    WHERE (
        Username = '%{SQL-User-Name}' OR
        Username = 'ALLUSERS'
    ) 
   ORDER BY id"

Now, I can just insert rows for those attributes once, and no further modification is necessary. Pretty handy.

Compare used book purchase prices quickly

I was reading the blog of a friend of a friend and came across a discussion about selling used books online. It sounded like there are a bunch of different sites that buy used books. Each of them allows you to put in an ISBN number to see what they are willing to buy it for.

To find the best price, you would have to browse all of these sites to see who was offering the most money. Sometimes the a book may sell for a dollar at one site, but nine dollars at another, so it is worth your time to check out all of the sites.

Sounds like a good candidate for automation to me. I am already doing a pretty similar, but more complicated, version of this with GamePriceWatcher.com. It didn’t take me much time to write some scripts to scrape prices from about eight of these sites. I also included the Perl WWW::Scraper::ISBN module to retrieve some of the details about the book and it has turned out pretty well.

I have it working now at http://avazio.com/sellbooks.php, and may move it over to its own domain if it seems like anybody is using it.

Building an RPM isn’t that hard

We’re installing the Amanda backup client on a bunch of servers that I administer. We’ve decided to tunnel Amanda connections through SSH though, and that feature isn’t available in the current RPM builds that are available.

We started out compiling it from scratch on each box, and then following a bunch of steps to add the appropriate users, create directories, setup SSH keys, etc. Obviously, that gets to be a tedious process pretty quick and is prone to errors and missing steps. The obvious solution is to create an RPM for this, since that is what they are for.

Creating an RPM is pretty easy. I found some useful instructions here and here. Basically, the buildrpm package goes through the standard configure; make; make install steps for you. You can create simple shell scripts to do whatever you want at each of those processes when building the RPM. Additionaly, you can also run commands before and after RPM installation, and before and after RPM removal. For this particular script, I added a bunch of stuff to the postinstall script. As an example, here is my .spec file:

Summary: Amanda Client with SSH
Name: amanda_client_ssh
Version: 2.5.2p1
Release: 2
License: GPL
Group: Amanda Backup
Source: amanda_client_ssh-2.5.2p1.tgz

BuildRoot: %{_builddir}/%{name}-root

%description
Amanda Client compiled with SSH authentication

%prep

%setup
./configure '--a-bunch' '--of options' --'can go here'

%build
make

%install
rm -rf $RPM_BUILD_ROOT
make DESTDIR=$RPM_BUILD_ROOT install
if [[ ! -d $RPM_BUILD_ROOT/etc/amanda ]]; then
    mkdir $RPM_BUILD_ROOT/etc/amanda
fi
cat >> $RPM_BUILD_ROOT/etc/amanda/amanda-client.conf < EOF
conf "MyConfig"
index_server "backup.mydomain.com"
tape_server "backup.mydomain.com"
auth "ssh"
ssh_keys "/root/.ssh/id_rsa_amrecover"
EOF

%pre

%post
useradd -M -n -g disk -o -r -d /var/lib/amanda -s /bin/bash
        -c "Amanda user" -u 33 amanda >/dev/null 2>&1 || :

if [[ ! -d /var/lib/amanda/.ssh ]]; then
    mkdir -p /var/lib/amanda/.ssh
fi
if [[ ! -d /var/log/amanda ]]; then
    mkdir -p /var/log/amanda
fi

touch /etc/amandates

cat >> /etc/amanda/exclude <
/var/spool/mail
/tmp
EOF
mkdir ~amanda/gnutar-lists
chown amanda:disk /etc/amandates /etc/amanda/exclude ~amanda/gnutar-lists /var/log/amanda/

if [[ ! -f /root/.ssh/id_rsa_amrecover ]]; then
    ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa_amrecover
fi

cat >> /var/lib/amanda/.ssh/authorized_keys <EOF
ssh-rsa abcdefgABigLongPublicKeyGoesHere== amanda@backup.webpipe.net
EOF

%preun
userdel amanda

%clean
rm -rf $RPM_BUILD_ROOT

%files
... a bunch of files listed here ...

%changelog

It took a couple revisions of trial and error to get everything correct. That part took the longest because I didn’t see an option to skip the configure and make steps, so anytime I changed the postinstall commands it had to rebuild the whole app. But now that I’ve got the RPM, installing it should be pretty straightforward on the rest of the boxes it needs to be installed on.

Courier Imap “NO Unable to open this mailbox” error

I had some problems with a mail server today and had to reboot it. After booting it back up all the services started, but any attempts to access a mailbox via IMAP generated an error that said “Unable to open this mailbox”. Testing it through telnet looked like this:

[root@ny ~]# telnet localhost 143
>> Trying 127.0.0.1
>> Connected to mail.somedomain.com (127.0.0.1).
>> Escape character is '^]'.
>> * OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT
>> THREAD=REFERENCES SORT QUOTA IDLE ACL ACL2=UNION STARTTLS] Courier-IMAP ready. Copyright
>> 1998-2004 Double Precision, Inc.  See COPYING for distribution information.
<< 01 LOGIN user@domain.com mypassword
>> 01 OK LOGIN Ok.
<< 02 LIST "" *
>> * LIST (HasNoChildren) "." "INBOX.Junk"
>> * LIST (HasNoChildren) "." "INBOX.Drafts"
>> * LIST (HasNoChildren) "." "INBOX.Trash"
>> * LIST (HasNoChildren) "." "INBOX.Sent"
>> * LIST (Marked HasChildren) "." "INBOX"
>> 02 OK LIST completed
<< 03 SELECT INBOX
>> 03 NO Unable to open this mailbox.

After digging into file permissions, restarting courier-imap, and anything else I could think of, I was getting puzzled. Then a fellow sysadmin noticed that time time on the box was way off:

mail:~# date
Wed Jan  7 04:09:17 MST 1925

I corrected that and it imap suddenly started working again. I wish that courier-imap would have given a little more descriptive error message, but glad that it was nothing difficult to fix.

Disabling dmraid (fakeraid) on CentOS 5

I recently installed CentOS 5 on a server with a Promise PDC20621 SATA Raid card in it (according to lspci). This particular card, of course, is a FAKE raid device, meaning that the physical card is nothing more than a regular SATA controller, and they provide drivers that emulate RAID functionality. This is supposed to be useful for Windows users that don’t have a native software raid service available, but it is kindof useless for Linux since most distros provide md for creating a software raid device.

When trying to create a new software raid array, I would get a bunch of errors about the devices being busy, like this:

[root@www ~]#  mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: Cannot open /dev/sda1: Device or resource busy
mdadm: Cannot open /dev/sdb1: Device or resource busy
mdadm: Cannot open /dev/sdc1: Device or resource busy
mdadm: Cannot open /dev/sdd1: Device or resource busy

lsof didn’t show any processes that were using these files, and it took a little while to finally find out that ‘dmraid’ was the culprit. dmraid is the linux driver for fake raid controllers like the Promise FastTrak and nVidia on-board SATA controllers. From what I could tell, it is loaded from initrd and automatically attaches itself to any partitions that are of type ‘fd’ (Linux raid autodetect).

After a few hours of googling for answers, I had become pretty familiar with the topic. Many of the search results were from people trying to get mdraid working for these devices before it was stable and widely included in distros.

Unfortunately, it looks like the default CentOS 5 install has the dmraid drivers built into the initrd, and there was no way to disable it from taking control of the drives. I tried looking for an argument to pass to the kernel to disable dmraid support, but couldn’t find anything. A few of the posts and emails that I came across on the subject suggested removing the ‘dmraid’ package, and a few people appeared to have some success with that. But when I tried a ‘yum erase dmraid’ on my box, it wanted to remove the kernel, which would probably be bad.

After a little more searching, I found that mkinitrd had an option to rebuild the initrd without dmraid support. The was an upgrade available for my kernel, so I did a ‘yum update’ to install a new one, which also gave me one to fall-back to if this didn’t work. Once the new kernel was running, I installed the ‘kernel-devel’ and ‘kernel-headers’ packages to pull down some necessary headers, then ran this command to create a new initrd without the troublesome dmraid drivers:

mkinitrd --omit-dmraid /boot/NO_DMRAID_initrd-2.6.18-8.1.6.el5.img 2.6.18-8.1.6.el5

Then, simply change /etc/grub.conf to create an option that pointed to my new initrd. My /etc/grub.conf looks like this:

default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu

## My new non-dmraid boot option
title CentOS (2.6.18-8.1.6.el5) WITHOUT DMRAID GARBAGE
  root (hd0,0)
  kernel /vmlinuz-2.6.18-8.1.6.el5 ro root=/dev/hda1
  initrd /NO_DMRAID_initrd-2.6.18-8.1.6.el5.img
## The regular option
title CentOS (2.6.18-8.1.6.el5)
  root (hd0,0)
  kernel /vmlinuz-2.6.18-8.1.6.el5 ro root=/dev/hda1
  initrd /initrd-2.6.18-8.1.6.el5.img
## My working backup option:
title CentOS (2.6.18-8.el5)
  root (hd0,0)
  kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/hda1
  initrd /initrd-2.6.18-8.el5.img

Now, I just rebooted off the first option, and it didn’t load all of the dmraid junk. I can now access the partitions without the ‘resource busy’ problem, and create a software raid array like I’m used to.