DKIM / SPF / SpamAssassin test moved to dkimvalidator.com

For over 7 years, I’ve hosted an email validation tool on this site. I developed the tool back when I was doing a lot with email, and when DKIM and SPF was still pretty tricky to get working. Over the years it has become the single most popular page on the site, and Google likes it pretty well for certain DKIM and SPF keywords. (And strangely “brandon checketts dkim” pops up on Google search suggest you you try to google my name.

In any case, I’ve moved that functionality over to its own site now at http://dkimvalidator.com/ so that it has its own place to call home. It also got a (albeit weak) visual makeover, and all of the underlying libraries have been updated to the latest versions so that they are once again accurate and up-to-date.

Troubleshooting /etc/cron.d/ on Ubuntu

On Debian-based systems, files in /etc/cron.d:
– must be owned by root
– must not be group or world writable
– may be symlinked, but the destination must follow the ownership and file permissions above
– Filenames must contain ONLY letters, numbers, underscores and hyphens. (not periods)
– must contain a username in the 6th column

From the man page:

Files in this directory have to be owned by root, do not need to be executable (they are configuration files, just like /etc/crontab) and must conform to the same naming convention as used by run-parts(8): they must consist solely of upper- and lower-case letters, digits, underscores, and hyphens. This means that they cannot contain any dots.

The man page also provides this explanation to this strange rule:

For example, any file containing dots will be ignored. This is done to prevent cron from running any of the files that are left by the Debian package management
system when handling files in /etc/cron.d/ as configuration files (i.e. files ending in .dpkg-dist, .dpkg-orig, and .dpkg-new).

Increasing the number of simultaneous SASL authentication servers with Postfix

I had a customer complaining lately that messages sent via Gmail to one of my mail servers was occasionally receiving SMTP Authentication failures and bounce backs from Gmail. Fortunately he noticed it happening mainly when he sent a messages to multiple recipients and was able to send me some of the bounces for me to track it down pretty specifically in the postfix logs.

The Error message via Gmail was:

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 535 535 5.7.0 Error: authentication failed: authentication failure (SMTP AUTH failed with the remote server) (state 7).

This was a little odd, because the SMTP AUTH failure is what I would typically expect with a mistyped username and password. However, I could see that plenty of messages were being sent from the same client. By looking at the specific timestamp of the bounced message, I tracked down the relevant log segment shown below. It indicates 5 concurrent SMTPD sessions where the SASL authentication was successful on 4 of them and failed on the 5th.

Jul  5 12:43:39 mail postfix/smtpd[13602]: connect from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:39 mail postfix/smtpd[13602]: setting up TLS connection from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:39 mail postfix/smtpd[14113]: connect from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:39 mail postfix/smtpd[14113]: setting up TLS connection from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:39 mail postfix/smtpd[14115]: connect from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:39 mail postfix/smtpd[14115]: setting up TLS connection from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:39 mail postfix/smtpd[14116]: connect from mail-bk0-f49.google.com[209.85.214.49]
Jul  5 12:43:39 mail postfix/smtpd[14117]: connect from mail-bk0-f49.google.com[209.85.214.49]
Jul  5 12:43:39 mail postfix/smtpd[14116]: setting up TLS connection from mail-bk0-f49.google.com[209.85.214.49]
Jul  5 12:43:39 mail postfix/smtpd[14117]: setting up TLS connection from mail-bk0-f49.google.com[209.85.214.49]
Jul  5 12:43:39 mail postfix/smtpd[13602]: TLS connection established from mail-bk0-f50.google.com[209.85.214.50]: TLSv1 with cipher RC4-SHA (128/128 bits)
Jul  5 12:43:39 mail postfix/smtpd[14113]: TLS connection established from mail-bk0-f50.google.com[209.85.214.50]: TLSv1 with cipher RC4-SHA (128/128 bits)
Jul  5 12:43:39 mail postfix/smtpd[14115]: TLS connection established from mail-bk0-f50.google.com[209.85.214.50]: TLSv1 with cipher RC4-SHA (128/128 bits)
Jul  5 12:43:39 mail postfix/smtpd[14116]: TLS connection established from mail-bk0-f49.google.com[209.85.214.49]: TLSv1 with cipher RC4-SHA (128/128 bits)
Jul  5 12:43:39 mail postfix/smtpd[14117]: TLS connection established from mail-bk0-f49.google.com[209.85.214.49]: TLSv1 with cipher RC4-SHA (128/128 bits)
Jul  5 12:43:40 mail postfix/smtpd[13602]: 2846B11AC5E2: client=mail-bk0-f50.google.com[209.85.214.50], sasl_method=PLAIN, sasl_username=someuser@somedomain.com
Jul  5 12:43:40 mail postfix/smtpd[14113]: 3290811AC5E3: client=mail-bk0-f50.google.com[209.85.214.50], sasl_method=PLAIN, sasl_username=someuser@somedomain.com
Jul  5 12:43:40 mail postfix/smtpd[14115]: 3C4AD11AC5E4: client=mail-bk0-f50.google.com[209.85.214.50], sasl_method=PLAIN, sasl_username=someuser@somedomain.com
Jul  5 12:43:40 mail postfix/cleanup[13420]: 2846B11AC5E2: message-id=
Jul  5 12:43:40 mail postfix/cleanup[14092]: 3290811AC5E3: message-id=
Jul  5 12:43:40 mail postfix/smtpd[14116]: warning: SASL authentication failure: Password verification failed
Jul  5 12:43:40 mail postfix/smtpd[14116]: warning: mail-bk0-f49.google.com[209.85.214.49]: SASL PLAIN authentication failed: authentication failure
Jul  5 12:43:40 mail postfix/cleanup[14121]: 3C4AD11AC5E4: message-id=
Jul  5 12:43:40 mail postfix/qmgr[32242]: 2846B11AC5E2: from=, size=10564, nrcpt=1 (queue active)
Jul  5 12:43:40 mail postfix/qmgr[32242]: 3290811AC5E3: from=, size=10566, nrcpt=1 (queue active)
Jul  5 12:43:40 mail postfix/smtpd[14116]: disconnect from mail-bk0-f49.google.com[209.85.214.49]
Jul  5 12:43:40 mail postfix/qmgr[32242]: 3C4AD11AC5E4: from=, size=10568, nrcpt=1 (queue active)
Jul  5 12:43:40 mail postfix/smtpd[13602]: disconnect from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:40 mail postfix/smtpd[14113]: disconnect from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:40 mail postfix/smtpd[14115]: disconnect from mail-bk0-f50.google.com[209.85.214.50]
Jul  5 12:43:40 mail postfix/smtpd[14117]: D4F2411AC5E5: client=mail-bk0-f49.google.com[209.85.214.49], sasl_method=PLAIN, sasl_username=someuser@somedomain.com
Jul  5 12:43:41 mail postfix/cleanup[13420]: D4F2411AC5E5: message-id=
Jul  5 12:43:41 mail postfix/qmgr[32242]: D4F2411AC5E5: from=, size=10565, nrcpt=1 (queue active)
Jul  5 12:43:41 mail postfix/smtpd[14117]: disconnect from mail-bk0-f49.google.com[209.85.214.49]

In looking into the SASL component a bit, I noticed that there were 5 simultaneous SASL servers running. The first one looks like a parent with 4 child processes.

[root@mail postfix]# ps -ef |grep sasl
root      9253     1  0 Mar15 ?        00:00:04 /usr/sbin/saslauthd -m /var/run/saslauthd -a rimap -r -O 127.0.0.1
root      9262  9253  0 Mar15 ?        00:00:04 /usr/sbin/saslauthd -m /var/run/saslauthd -a rimap -r -O 127.0.0.1
root      9263  9253  0 Mar15 ?        00:00:04 /usr/sbin/saslauthd -m /var/run/saslauthd -a rimap -r -O 127.0.0.1
root      9264  9253  0 Mar15 ?        00:00:04 /usr/sbin/saslauthd -m /var/run/saslauthd -a rimap -r -O 127.0.0.1
root      9265  9253  0 Mar15 ?        00:00:04 /usr/sbin/saslauthd -m /var/run/saslauthd -a rimap -r -O 127.0.0.1

So it seemed likely that the 4 child processes were in use and that Postfix couldn’t open a connection to a 5th simultaneous SASL authentication server, so it responded with a generic SMTP AUTH failure.

To fix, I simply added a couple of extra arguments to the saslauthd command that is run. I added a ‘-c’ parameter to enable caching, and ‘-n 10’ to increase the number of servers to 10. On my CentOS server, I accomplished that by modifying /etc/sysconfig/saslauthd to look like this:

# Directory in which to place saslauthd's listening socket, pid file, and so
# on.  This directory must already exist.
SOCKETDIR=/var/run/saslauthd

# Mechanism to use when checking passwords.  Run "saslauthd -v" to get a list
# of which mechanism your installation was compiled with the ablity to use.
MECH=rimap

# Additional flags to pass to saslauthd on the command line.  See saslauthd(8)
# for the list of accepted flags.
FLAGS="-r -O 127.0.0.1 -c -n 10"

After restarting saslauthd, and a quick test, it looks good so far.

Upgrading Disks in a Software Raid System

I have a system at home with a pair of 640 GB drives, that are getting full. The drives are configured in a Linux Software RAID1 array. Replacing those drives with a pair of 2 TB drives is a pretty easy (although lengthy process).

First, we need to verify that our RAID system is running correctly. Take a look at /proc/mdstat to verify that both drives are online:

root@host:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[1] sda3[0]
      622339200 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      192640 blocks [2/2] [UU]

After verifying that the system is running, I powered the machine off and removed the second hard drive and replaced it with the new drive. Upon starting it back up, Ubuntu noticed that the RAID system was running in degraded mode and I had to hit yes at the console to have it continue booting.

Once the machine was running, I logged in and created a similar partition structure on the new drive using the fdisk command. On my system, I have a small 200 partition for /boot as /dev/sdb1, a 1 GB swap partition, and then the rest of the drive is one big block for the root partition. The I copied the partition table that was on /dev/sda, but for the final partition, made it take up the entire rest of the drive. Make sure to set the first partition as bootable. The partitions on /dev/sdb now look like this:

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          24      192748+  fd  Linux raid autodetect
/dev/sdb2              25         146      979965   82  Linux swap / Solaris
/dev/sdb3             147      243201  1952339287+  fd  Linux raid autodetect

After the partitions match up, I can now add the partitions on /dev/sdb into the RAID array

root@host:~# mdadm --manage /dev/md0 --add /dev/sdb1
root@host:~# mdadm --manage /dev/md1 --add /dev/sdb3

And watch the status of the rebuild using

watch cat /proc/mdstat

After that was done, I installed grub onto the new /dev/sdb

grub /dev/sdb
root (hd1)
setup (hd1,0)

Finally, reboot once and make sure that everything works as expected.

The system is now running with one old drive and one new drive. The next step is to perform the same steps with removing the other old drive and rebuilding and re-adding it to the raid system. The steps are the same as above, except performing them with /dev/sda. I also had to change my BIOS to boot from the second drive.

Once both drives are installed and working with the RAID array, the final part of the process is to increase the size of the file system to the full size of our new drives. I first had to disable the EXT3 file system journal. This was necessary so that the online resize doesn’t run out of memory.

Edit /etc/fstab and change the file system type to ext2, and the arguments to include “noload” which will disable the file system journal. My /etc/fstab looks like this:

# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# /dev/md1
UUID=b165f4be-a705-4479-b830-b0c6ee77b730 /               ext2    noload,relatime,errors=remount-ro 0       1
# /dev/md0
UUID=30430fe0-919d-4ea6-b3c2-5e3564344917 /boot           ext3    relatime        0       2
# /dev/sda5
UUID=94b03944-d215-4882-b0e0-dab3a8c50480 none            swap    sw              0       0
# /dev/sdb5
UUID=ebb381ae-e1bb-4918-94ec-c26e388bb539 none            swap    sw              0       0

You then have to run `mkinitramfs` to rebuild the initramfs file to include the desired change to the mount options

root@host:~# mkinitramfs -o /boot/initrd.img-2.6.31-20-generic 2.6.31-20-generic

Reboot to have the system running with journaling disabled on the root partition. Finally, you can actually increase the RAID device to the full size of the device:

root@host:~# mdadm --grow /dev/md3 --size=max

And then resize the file system:

root@host:~# resize2fs /dev/md1
resize2fs 1.41.9 (22-Aug-2009)
Filesystem at /dev/md1 is mounted on /; on-line resizing required
old desc_blocks = 38, new_desc_blocks = 117
Performing an on-line resize of /dev/md1 to 488084800 (4k) blocks.

The system is now up and running with larger drives :). Change the /etc/fstab, rebuild the initramfs, and reboot one more time to re-enable journaling and be running officially.

Troubleshooting OpenVZ Out of Memory and Could Not Allocate Memory Errors

I will sometimes run into an error inside an OpenVZ guest where the system complains about running out of memory. This is usually seen as a message containing “Cannot allocate memory”. It may even occur when trying to enter into the container

[root@host ~]# vzctl enter 908
entered into CT 908
-bash: /root/.bash_profile: Cannot allocate memory

This particular error can be difficult to troubleshoot because OpenVZ has so many different limits. The “Cannot allocate Memory” error is a pretty generic error, so Googling it doesn’t necessarily yield any useful results.

I’ve found that the best way to track down the source of the problem is to examine the /proc/user_beancounter statistics on the host server. The far right column is labeled ‘failcnt’ and is a simple counter for the number of times that particular resource has been exhausted. Any non-zero numbers in that column may indicate a problem. In this example, you can see that the dcachesize parameter is too small and should be increased

[root@host ~]# cat /proc/user_beancounters
Version: 2.5
       uid  resource                     held              maxheld              barrier                limit              failcnt
      908:  kmemsize                  5170890              5776435             43118100             44370492                    0
            lockedpages                     0                    0                  256                  256                    0
            privvmpages                 53561                55089              1048576              1048576                    0
            shmpages                        2                    2                21504                21504                    0
            dummy                           0                    0                    0                    0                    0
            numproc                        30                   38                 2000                 2000                    0
            physpages                    6136                 7392                    0  9223372036854775807                    0
            vmguarpages                     0                    0                65536                65536                    0
            oomguarpages                 6136                 7392                26112  9223372036854775807                    0
            numtcpsock                      8                    8                  360                  360                    0
            numflock                        5                    6                  380                  420                    0
            numpty                          0                    1                   16                   16                    0
            numsiginfo                      0                    2                  256                  256                    0
            tcpsndbuf                  140032                    0             10321920             16220160                    0
            tcprcvbuf                  131072                    0              1720320              2703360                    0
            othersockbuf               151320               156864              4504320             16777216                    0
            dgramrcvbuf                     0                 8472               262144               262144                    0
            numothersock                  107                  109                 5000                 5000                    0
            dcachesize                3408570              3413265              3409920              3624960                   81
            numfile                       724                  957                18624                18624                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            numiptent                      10                   10                  128                  128                    0

You can do a little investigation to see what each particular limit does. I will usually not dig too deep, but just double the value and try again. You can set a new value using the vzctl command like this:

[root@host ~] vzctl set 908 --dcachesize=6819840:7249920 --save

Southeast Linux Fest Presentation on MySQL Replication

I was fortunate to be selected to give a presentation at the 2010 Southeast Linux Fest held this year in Greenville, SC. The topic was MySQL replication which I picked from a similar presentation I gave about about 1.5 years ago at my local LUG. I’ve configured plenty of replicated servers and I think that I understand it well enough to explain it to others.

The 2-hour presentation is about half slides and half demo. Throughout the course of the presentation I set up a simple master-slave. Then I add a second slave. Taking it a step farther I set up the three servers to replicate in a chain, and finally I configure them to replicate in a full circle so that changes made on one are propagated to all of the others. I intentionally do things that break replication at certain points to show some of the limitations and configurable features that can help it to work.

Slides for the presentation are available OpenOffice format.

The presentation was recorded, so hopefully the SELF team will have those videos available shortly.

Skipping the DROP TABLE, CREATE TABLE statements in a large mysqldump file.

I have a large table of test data that I’m copying into some development environments. I exported the table with a mysqldump which has a DROP TABLE and CREATE TABLE statements at the top

DROP TABLE IF EXISTS `mytable`;
CREATE TABLE `mytable` (
  `somecol` varchar(10) NOT NULL default '',
   ... other columns ...
  PRIMARY KEY  (`somecol`),
  KEY `isbn10` (`somecol`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The problem is that the developer has altered the table and re-importing the test data would undo those changes. Editing the text file is impractical because of its size (500 MB gzipped). So I came up with this workaround which just slightly alters the SQL using sed so that it doesn’t try to drop or recreate the table. It comments out the DROP TABLE line, and creates the new table in the test database instead of the real database.

zcat bigfile.sql.gz |sed "s/DROP/-- DROP/"|sed "s/CREATE TABLE /CREATE TABLE test./"|mysql databasename

Sleeping for a random amount of time in a shell script

You can use the special $RANDOM environment variable to get a random number and divide it by the maximum number of seconds that you want to wait. Use the remainder as the number of seconds to sleep since it will always be between zero and the max you specified. This example will sleep anywhere between zero and 10 minutes (600 seconds)

 /bin/sleep/sleep   `/usr/bin/expr $RANDOM % 600`

Purists will note that it isn’t truly random. The maximum value for $RANDOM is 32767 which is not evenly divisible by most likely values, but it is close enough.

Installing SVN and Trac on a CentOS 5 server

Make sure that you have the RPMForge repository enabled. Install Subversion, mod_dav_svn, and trac. This will install a few required dependencies (ie: neon and some python utils)

# yum install subversion mod_dav_svn mod_python trac

Create a directory for your repositories, and an initial repository for testing, and create your htpasswd file. Then create a trac environment and set it up.

# mkdir /home/svn/
# svnadmin create testrepo
# chown -R apache:apache /home/svn/*
# htpasswd -c  /home/svn/.htpasswd brandon

#mkdir /home/trac/
# trac-admin /home/trac/ initenv
    ... answer questions as appropriate ...
# chown apache:apache /home/trac/*
# htpasswd -c  /home/svn/.htpasswd brandon

Add this to your Apache configuration in the relevant place (I like to put it under an SSL VirtualHost)

    <Location /svn>
        DAV svn
        SVNParentPath /home/svn/
        #SVNListParentPath on
        # Authentication
        AuthType Basic
        AuthName "RoundSphere SVN Repository"
        AuthUserFile /home/svn/.htpasswd
        Order deny,allow
        Require valid-user
    </Location>
    <Location /trac>
        SetHandler mod_python
        PythonHandler trac.web.modpython_frontend
        PythonOption TracEnv /home/trac
        PythonOption TracUriRoot /trac
        # Authentication
        AuthType Basic
        AuthName “MyCompany Trac Environment"
        AuthUserFile /home/svn/.htpasswd
        Require valid-user
    </Location>

Now test to make sure that you can view your test repository in a browser and that it prompts for a username and password as desired:

https://your-hostname/svn/testrepo/

You should retrieve a plain looking page that mentions the name of your repository and that it is at Revision 0

You should also be able to access your trac installation at

https://your-hostname/trac/

Customize your logo, change the home page, start making some tickets, using the wiki and get to work.

Installing the Pandora One client on 64-bit Ubuntu 9.10

I was surprised and happy to see that the Pandora One client should work on Linux. It uses the Adobe Air framework which means that Pandora doesn’t have to write a specific Linux variant.

However, installing it on a modern 64-Bit Ubuntu 9.10 install took just a bit of manipulation to get it to work. Pandora provides some basic instructions for Linux users here, even though Linux is officially unsupported. Those instructions, along with the Adobe AIR notes here provided enough information for me to get it installed and working.

Here’s what I did:

  • Start out at the Pandora One site
  • Click on the "Download Pandora Desktop" link and save that file to /tmp
  • Follow the link to Install Adobe Air and save that file to /tmp also
  • Open a shell, and chmod the Adobe Air installer to 755 and then run it.
  • Go through the Adobe AIR install until it completes
  • Once Adobe AIR is installed, you will need to put some 32-bit libraries in place to make it run correctly. Some of the steps on Adobe’s site work, and some don’t, so this is what I did
  • Download the two .deb files for libnss3 and Libnspr4 to /tmp
  • From your shell, run:
     sudo file-roller ./libnss3-1d_3.12.0~beta3-0ubuntu1_i386.deb
    
  • Navigate to data.tar.gz => /usr => lib. Click on all of the files in that directory and click Extract. Type in /usr/lib32/ so that they extract there, then close all of the file-roller windows.
  • Do the same thing with the libnspr4 .deb file that you downloaded
  • Copy the adobe cert store into place with this command:
     sudo cp /usr/lib/libadobecertstore.so /usr/lib32
    
  • Now you can finally install the Pandora application by running:
    sudo Adobe\ AIR\ Application\ Installer /tmp/pandora_2_0_2.air 
    

    That should install the application correctly. It will add an icon to Applications / Accessories.

  • Upon starting up the Pandora One client, it currently complains about connecting to an untrusted server for me. I have to click to accept for this session each time

Now you should be able to play your Pandora music from your 64-bit Ubuntu 9.10 box.