Tracking TCP connections with netstat

I’ve been troubleshooting some possible problems on a mail server recently, which has had me digging into TCP connections. The ‘netstat’ command has a ‘-o’ option that displays some useful timers:

[mail]# netstat -on |grep 189.142.18.18
tcp        0      8 205.244.47.142:25           189.142.18.18:1256          ESTABLISHED on (17.00/4/0)
tcp        0    452 205.244.47.142:25           189.142.18.18:2676          ESTABLISHED on (36.09/6/0)

This displays countdown timers for each TCP state. For example, if a connection is in FIN_WAIT and you run the command repeatedly with ‘watch’, you can see the timer count down to 0, at which point the connection goes away. The man pages and documentation I could find didn’t explain the timers very well, so the following is what I have learned by watching them (read: this is not official).
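
For example, you can watch a single connection’s timers update every second (the IP here is taken from the output above):

watch -n 1 "netstat -on | grep 189.142.18.18"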

When a connection is in the ESTABLISHED state, the timer can be either on or off. From what I can tell, the timer is flipped ON when there is some kind of trouble with the connection: when a retransmission occurs, the timer turns ON and the countdown starts. The timer field has three numbers. The first is a countdown in seconds, the second is incremented for each retransmission, and the third was always 0 for me, so I’m not sure what it does.

Now that I have a basic understanding of the output, I still have to figure out why these connections just hang. My guess at this point is poorly written spamming software and maxed-out bandwidth on a whole bunch of compromised machines throughout the world that are hitting my mail servers.

Postfix regexp tables are memory hogs

I spent a good part of the day today troubleshooting memory problems on some Postfix mail servers. Each smtpd process was using over 11 MB of RAM, which seems really high. Each concurrent SMTP session has its own smtpd process, and with over 150 concurrent connections, that added up to well over 1.5 GB of RAM.

[root@mail ~]# ps aux|grep -i smtpd |head -n1
postfix   3978  0.0  0.5 16096 11208 ?       S    12:29   0:00 smtpd -n smtp -t inet -u

After some trial and error, temporarily disabling things in the main.cf file, I narrowed the memory usage down to a regexp table used in a transport map:

transport_maps = regexp:/etc/postfix/transport.regexp

The transport.regexp file had about 1400 lines in it to match the various possible address variations for a stupid mailing list application. Each mailing list has 21 different possible commands (addresses). By combining those 21 commands into a single regex per list, I was able to cut those 1400 lines down to about 70, as sketched below.
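
As an illustration, the consolidation looked something like this (the list name, command names, and transport here are made up, and the real file had 21 commands per list):

# before: one entry per command address, repeated for every list
/^mylist-subscribe@example\.com$/    mylistapp:
/^mylist-unsubscribe@example\.com$/  mylistapp:
/^mylist-help@example\.com$/         mylistapp:

# after: one alternation per list
/^mylist-(subscribe|unsubscribe|help)@example\.com$/  mylistapp:

Now the smtpd processes use just under 5 MB each: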

[root@mail ~]# ps aux|grep -i smtpd |head -n1
postfix   7634  0.0  0.2  9916 4996 ?        S    13:31   0:00 smtpd -n smtp -t inet -u

So, by my math, saving about 6,000 KB of memory by removing 1300 lines from the regexp file means that each regexp used about 4.5 KB of memory. Overall, with 150+ simultaneous smtpd processes, that works out to several hundred megs of memory saved on each mail server.

WordPress bug: spawn_cron() doesn’t properly consider the port of the cron.php file

I ran into a problem today where a user would submit a new post in WordPress and it would cause the web server to lock up. Restarting the web server would start Apache properly, and it would serve static content fine until the user requested another page from WordPress, at which point it would lock up again.

The configuration is a little odd, so this probably doesn’t happen to many users. In order for it to occur, you have to have the “WordPress Address” setting as a URL starting with ‘https’ and then write your post using a non-https URL. I tracked this down to a problem with the cron function built into WordPress, specifically this bit of code in the spawn_cron() function in wp-includes/cron.php:

$cron_url = get_option( 'siteurl' ) . '/wp-cron.php';
$parts = parse_url( $cron_url );

if ($parts['scheme'] == 'https') {
        // support for SSL was added in 4.3.0
        if (version_compare(phpversion(), '4.3.0', '>=') && function_exists('openssl_open')) {
                $argyle = @fsockopen('ssl://' . $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01);
        } else {
                return false;
        }
} else {
        $argyle = @ fsockopen( $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01 );
}
if ( $argyle )
        fputs( $argyle,
                  "GET {$parts['path']}?check=" . wp_hash('187425') . " HTTP/1.0\\r\\n\\r\\n"
                . "Host: {$_SERVER['HTTP_HOST']}rnrn"
        );

The line that says:

$argyle = @fsockopen('ssl://' . $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01);

assumes that you are hitting the current page on the same server and port as the URL returned by get_option( ‘siteurl’ ). Since the user was hitting the non-https version of the site, this code in spawn_cron() would connect to port 80 and try to establish an SSL connection there. WordPress would receive that request as the raw handshake bytes “\x80|\x01\x03\x01” and serve it the home page, which would in turn run the cron function again. That sub-request would do the same thing, and the loop continued until Apache ran out of connections. At that point WordPress would try to request the page again, wait endlessly for a connection to open up, and never get one.

So, to solve it, I added one line and modified another, like this:

[root@server wp-includes]# diff cron.php cron.php.original
90,91c90
<                       $port = isset($parts['port']) ? $parts['port'] : 443;
<                       $argyle = @fsockopen('ssl://' . $parts['host'], $port, $errno, $errstr, 0.01);
---
>                       $argyle = @fsockopen('ssl://' . $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01);
96,97c95
<               $port = isset($parts['port']) ? $parts['port'] : 80;
<               $argyle = @ fsockopen( $parts['host'], $port, $errno, $errstr, 0.01 );
---
>               $argyle = @ fsockopen( $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01 );

That makes it use the port of the URL returned by get_option( ‘siteurl’ ) instead of the port you are currently connected on. It defaults to port 443 if the URL begins with https, and port 80 if not.
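
Applied in context, the patched section of spawn_cron() ends up looking roughly like this (indentation and exact line numbers may differ between WordPress versions):

if ($parts['scheme'] == 'https') {
        // support for SSL was added in 4.3.0
        if (version_compare(phpversion(), '4.3.0', '>=') && function_exists('openssl_open')) {
                // use the port from siteurl, defaulting to 443 for https
                $port = isset($parts['port']) ? $parts['port'] : 443;
                $argyle = @fsockopen('ssl://' . $parts['host'], $port, $errno, $errstr, 0.01);
        } else {
                return false;
        }
} else {
        // use the port from siteurl, defaulting to 80 for plain http
        $port = isset($parts['port']) ? $parts['port'] : 80;
        $argyle = @ fsockopen( $parts['host'], $port, $errno, $errstr, 0.01 );
}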

I posted the fix to the WordPress forums at http://wordpress.org/support/topic/130492. Hopefully this gets included in future releases of WordPress.

Testing servers through encrypted connections

When testing web or mail servers, I often find myself telnetting to the server and issuing raw commands directly. Doing this is incredibly useful for tracking down the source of many problems. Until now, I never knew how to do the same thing over encrypted channels like HTTPS or POP3S. However, I just discovered that the OpenSSL library includes a simple tool that works great. Run the command:

openssl s_client -connect hostname:port

That will perform all of the SSL handshake and display the output for you, and then give you a regular prompt, just like telnet would.
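
For example, to test an HTTPS server you can type a raw HTTP request, followed by a blank line, once the handshake output finishes (the hostname here is made up):

openssl s_client -connect www.example.com:443
GET / HTTP/1.0
Host: www.example.com

For SMTP over TLS it is a little more complicated, because you generally connect to the remote server and then issue the STARTTLS command to negotiate encryption. In that case, you can use the command: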

openssl s_client -starttls smtp -crlf -connect host:port

That tells the openssl client to connect and send ‘STARTTLS’ before attempting to negotiate the encryption. After that, you’ll end up with a 220 response, at which point you can proceed with your normal SMTP session.
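
From there you are in a normal SMTP conversation, so a quick test might look something like this (the hostnames and addresses are made up, and exact responses vary by server):

EHLO client.example.com
250-mail.example.com
250-PIPELINING
250 8BITMIME
MAIL FROM:<test@example.com>
250 Ok
RCPT TO:<postmaster@example.com>
250 Ok
QUIT
221 Bye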

Modern versions of OpenSSL also allow STARTTLS with POP3:

openssl s_client -starttls pop3 -connect host:port

Implementing greylisting on Qmail

After my previous success with greylisting, I have decided that it definitely works well and is worth the little bit of effort it takes to get it installed. Configuring Postfix was very simple, but I (unfortunately) also run several mail servers that use Qmail. After a few minutes of googling, I decided on qgreylist, which was the simplest implementation by far.

Several of the alternatives required patching and recompiling Qmail, which I definitely didn’t want to do. qgreylist is just a simple Perl script that runs “in between” tcpserver and the qmail-smtpd process. You download it, change the path to its working directory, and tweak a couple of other variables. Then copy it to a permanent location and configure Qmail’s smtpd process to pass connections through it. It took a little longer than postgrey, but not too bad.
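
The last step means editing your qmail-smtpd run script so that the qgreylist script sits between tcpserver and qmail-smtpd. A rough sketch of what that looks like (the paths and tcpserver options here are placeholders; check the qgreylist documentation for the exact invocation):

#!/bin/sh
# /service/qmail-smtpd/run (simplified)
exec /usr/local/bin/tcpserver -v -R -x /etc/tcp.smtp.cdb 0 smtp \
    /var/qmail/qgreylist/qgreylist \
    /var/qmail/bin/qmail-smtpd 2>&1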

Disabling dmraid (fakeraid) on CentOS 5

I recently installed CentOS 5 on a server with a Promise PDC20621 SATA RAID card in it (according to lspci). This particular card, of course, is a FAKE raid device, meaning that the physical card is nothing more than a regular SATA controller, with drivers that emulate RAID functionality in software. This is supposed to be useful for Windows users who don’t have a native software RAID service available, but it is kind of useless for Linux, since most distros provide md for creating a software RAID device.

When trying to create a new software raid array, I would get a bunch of errors about the devices being busy, like this:

[root@www ~]#  mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: Cannot open /dev/sda1: Device or resource busy
mdadm: Cannot open /dev/sdb1: Device or resource busy
mdadm: Cannot open /dev/sdc1: Device or resource busy
mdadm: Cannot open /dev/sdd1: Device or resource busy

lsof didn’t show any processes that were using these devices, and it took a little while to finally figure out that ‘dmraid’ was the culprit. dmraid is the Linux driver for fake RAID controllers like the Promise FastTrak and nVidia on-board SATA controllers. From what I could tell, it is loaded from the initrd and automatically attaches itself to any partitions of type ‘fd’ (Linux raid autodetect).
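
If you run into the same thing, a couple of commands can confirm that dmraid has claimed the disks (assuming the dmraid and device-mapper tools are installed):

# list any RAID sets that dmraid discovers from on-disk metadata
dmraid -r
# show the device-mapper mappings currently holding the partitions
dmsetup ls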

After a few hours of googling for answers, I had become pretty familiar with the topic. Many of the search results were from people trying to get dmraid working for these devices before it was stable and widely included in distros.

Unfortunately, the default CentOS 5 install has the dmraid drivers built into the initrd, and there was no obvious way to stop them from taking control of the drives. I tried looking for an argument to pass to the kernel to disable dmraid support, but couldn’t find anything. A few of the posts and emails that I came across suggested removing the ‘dmraid’ package, and a few people appeared to have some success with that. But when I tried a ‘yum erase dmraid’ on my box, it wanted to remove the kernel along with it, which would probably be bad.

After a little more searching, I found that mkinitrd has an option to rebuild the initrd without dmraid support. There was an upgrade available for my kernel, so I did a ‘yum update’ to install a new one, which also gave me a kernel to fall back to if this didn’t work. Once the new kernel was running, I installed the ‘kernel-devel’ and ‘kernel-headers’ packages to pull down some necessary headers, then ran this command to create a new initrd without the troublesome dmraid drivers:

mkinitrd --omit-dmraid /boot/NO_DMRAID_initrd-2.6.18-8.1.6.el5.img 2.6.18-8.1.6.el5

Then I simply changed /etc/grub.conf to add an option pointing to my new initrd. My /etc/grub.conf now looks like this:

default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu

## My new non-dmraid boot option
title CentOS (2.6.18-8.1.6.el5) WITHOUT DMRAID GARBAGE
  root (hd0,0)
  kernel /vmlinuz-2.6.18-8.1.6.el5 ro root=/dev/hda1
  initrd /NO_DMRAID_initrd-2.6.18-8.1.6.el5.img
## The regular option
title CentOS (2.6.18-8.1.6.el5)
  root (hd0,0)
  kernel /vmlinuz-2.6.18-8.1.6.el5 ro root=/dev/hda1
  initrd /initrd-2.6.18-8.1.6.el5.img
## My working backup option:
title CentOS (2.6.18-8.el5)
  root (hd0,0)
  kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/hda1
  initrd /initrd-2.6.18-8.el5.img

Now, after rebooting into the first option, it no longer loads all of the dmraid junk. I can access the partitions without the ‘resource busy’ problem and create a software RAID array like I’m used to.
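
Once the mdadm command succeeds, you can confirm the array assembled and watch the initial sync progress:

cat /proc/mdstat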

mod_auth_mysql makes managing Apache authentication simple

I administer about 20 different web applications, each of which uses Apache authentication to control access. In the past, I’ve just used simple htpasswd authentication because it works and is readily available. However, adding or removing an employee’s access required manually editing every one of the htpasswd files each time.

I just started using mod_auth_mysql, which provides a way to centralize the authentication. It is available as a package on any distro that I’ve used, and is pretty simple to configure. Just create a database with the following tables:

CREATE TABLE users (
  user_name CHAR(30) NOT NULL,
  user_passwd CHAR(20) NOT NULL,
  PRIMARY KEY (user_name)
);
CREATE TABLE groups (
  user_name CHAR(30) NOT NULL,
  user_group CHAR(20) NOT NULL,
  PRIMARY KEY (user_name, user_group)
);

Populate the users table with the username/password pairs taken straight from the .htpasswd file, as in the sketch below. Optionally, you can make users members of groups via the groups table. Then create a database user with permission to SELECT from those two tables.
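
For example (the username, hash, and group name here are made up; the hash is a 13-character crypt() value of the sort htpasswd generates by default, which is what fits the CHAR(20) column above):

INSERT INTO users (user_name, user_passwd) VALUES ('jsmith', 'xyRliTsA9bk2M');
INSERT INTO groups (user_name, user_group) VALUES ('jsmith', 'ThisApp');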

Then configure the following in the Apache config or .htaccess file for each of your web applications:

AuthName "Some Webapp"
AuthType Basic
AuthMySQLEnable on
AuthMySQLHost myauthserver.someplace.com
AuthMySQLUser YourDatabaseUser
AuthMySQLPassword YourDatabaseUserPassword
AuthMySQLDB YourDatabaseName
AuthMySQLUserTable users
AuthMySQLNameField user_name
AuthMySQLPasswordField user_passwd
AuthMySQLGroupTable groups
AuthMySQLGroupField user_group

require valid-user
#require group ThisApp

Now you can centrally manage your Apache authentication. Uncomment the ‘require group’ line and add an appropriate entry to the groups table for any users who should have access to this specific app.

Tracking down how hackers gain access through web apps

Hackers commonly use vulnerabilities in web applications to gain access to a server. Sometimes, though, it can be difficult to track down exactly how they got in, especially if the server hosts a bunch of websites and there are lots of potentially vulnerable scripts.

I’ve tracked down more of these than I can count, and have sort of developed a pattern for investigating. Here are some useful things to try:

1- Look in /tmp and /var/tmp for possibly malicious files. These directories are usually world-writable and are commonly used to temporarily store files. Sometimes the files are disguised with leading dots, or named to blend in with other files in the directory, like “. ” (dot space), or like a session file named sess_something.

If you find any suspicious files, you can use their timestamps to search through the Apache logs for the exact hit they came from.
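
A couple of commands that help here (the dot-space file name is just an example):

# -a shows dotfiles, and -b prints escapes for unusual characters,
# which makes disguised names stand out
ls -lab /tmp /var/tmp
# check the timestamps on anything suspicious
stat '/tmp/. '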

2- If a rogue process is still running, look at its /proc entry to learn more about it. The files in /proc/<PID> will tell you things like the executable file that created the process, its working directory, its environment, and plenty more. Usually, the rogue processes run as the Apache user (httpd, nobody, apache).
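
For example, with a suspicious process running as PID 12345 (a made-up PID):

ls -l /proc/12345/exe    # symlink to the executable, even if it was deleted
ls -l /proc/12345/cwd    # the process's working directory
tr '\0' ' ' </proc/12345/cmdline; echo    # the full command line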

If all of the rogue processes were run by the Apache user, then the hacker likely didn’t gain root access. If you have rogue processes that were run by root, cleanup is much harder, and usually the only truly safe method is to start over with a clean installation.

3- ‘netstat -l’ will help you identify processes that are listening for incoming connections. Often these are Perl scripts, and sometimes they are named to look legitimate, like ‘httpd’, so pay close attention. ‘netstat -n’ will help you see the current connections between your server and others.
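
On Linux, a couple of extra flags will also show the owning PID and program name for each socket, which makes impostors easier to spot (-p generally requires root):

# listening TCP sockets, numeric, with PID/program name
netstat -lntp
# established connections, numeric, with PID/program name
netstat -ntp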

4- Look in your error logs for files being downloaded with wget. A common tactic is for hackers to run a wget command to download another file with more malicious instructions. Fortunately, wget writes to STDERR, so its output usually shows up in the error logs. Something like this is evidence of a successful wget:

--20:30:40--  http://somehackedsite.com/badfile.txt
            => `badfile.txt'
Resolving somehackedsite.com... 12.34.56.78

Connecting to somehackedsite.com[12.34.56.78]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12,345 [text/plain]

     0K .......... ......                                     100%  263.54 KB/s

20:30:50 (263.54 KB/s) - `badfile.txt' saved [12,345/12,345]

You can use this information to try to recreate what the hacker did. Look for the file they downloaded (badfile.txt in this case) and examine what it does. You can also use these timestamps to look through the access logs for the vulnerable script.

Since wget is a commonly used tool for this, I like to create a .wgetrc file that contains bogus proxy information, so that even if a hacker is able to attempt a download, it won’t work. Create a .wgetrc file in Apache’s home directory with this content:

http_proxy = http://bogus.dontresolveme.com:19999/
ftp_proxy = http://bogus.dontresolveme.com:19999/

5- If you were able to identify any timestamps, you can grep through the Apache logs to find requests from that time. If you have a well-structured server with logs in a consistent place, you can use a command like this to search all of the log files at once:

grep "01\\/Jun\\/2007:10:20:" /home/*/logs/access_log

I usually leave out the seconds field because requests sometimes take several seconds to execute. If you found a server name or file name used by a wget, you can try searching for those too:

grep "somehackesite.com" /home/*/logs/access_log

6- Turn off PHP’s register_globals by default and only enable it where truly needed. If you write PHP apps, learn how to program securely, and never rely on register_globals being on.
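
With mod_php, that can be done globally in php.ini, or per-application so one legacy app can keep it on while everything else stays safe:

; in php.ini
register_globals = Off

# or in the Apache config for the single legacy app that needs it
php_admin_flag register_globals on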

Disabling PHP processing for an individual file

I sometimes want to post samples of PHP scripts on my website. Since the web server is configured to parse files that end in .php, simply linking to the PHP file will execute it instead of displaying its contents. In the past, I’ve always made a copy of the file with a .txt extension to have it served as text/plain. That way is kind of clumsy, though: if a user wants to save the file, they download it as a .txt and have to rename it to .php.
Fortunately, Apache has a way to do just about anything. To configure it not to parse a specific PHP file, you can use this in your Apache configuration:

<Files "some.file.php">
   RemoveHandler .php
   ForceType text/plain
</Files>

If you have ‘AllowOverride FileInfo’ enabled, this can also be placed in a .htaccess file. It should work for other file types like .cgi or .pl files as well. You can substitute a FilesMatch directive to match multiple file names based on a regular expression.
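
For example, to serve every sample script as plain text based on a naming convention (the “sample-” prefix here is made up):

<FilesMatch "^sample-.*\.php$">
   RemoveHandler .php
   ForceType text/plain
</FilesMatch>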

What a difference a blank line can make

I had a customer today who had problems with a PHP script that outputs a Microsoft Word document. The script was pretty simple and just did some authentication before sending the file to the client. But when the document was opened in Word, it tried to convert it into a different format and would only display gibberish.

The customer had posted his problem on some forums and was told that upgrading from PHP 5.1.4 to PHP 5.2 should fix it. Well, it didn’t. In fact, the PHP 5.2 version had some weird bug where a PDO object would overwrite data at the wrong memory location: in this case, a call to fetchAll() was overwriting the username stored in the $_SESSION variable, which in turn was breaking all of the site’s authentication. After digging in far enough to find that out, it seemed best to revert to PHP 5.1. Once that was completed, we were back to the original problem with the Word document.

The headers he was sending all looked okay. Here’s the relevant code to download a document:

$file = "/path/to/some_file.doc";
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: private",false); // required for certain browsers
header("Content-Type: application/msword");
header("Content-Disposition: attachment; filename=\"".basename($file)."\";" );
header("Content-Transfer-Encoding: binary");
header("Content-Length: ".filesize($file));
readfile($file);

I tried tweaking them a little to match a known-working site, but to no avail. I finally downloaded a copy of the file directly from the web server, bypassing the PHP script, and also downloaded a copy through the PHP script, saving both for comparison. Looking at them side by side in vi, I noticed an extra line at the top of the bad one. I removed the extra line, downloaded the fixed copy, and it opened fine in Word. After that, it was just a matter of finding the included file with an extra line in it. Sure enough, one of the configuration files had an extra blank line after the closing ?> tag. I removed that, and everything worked correctly.
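
This is a classic PHP gotcha: anything outside the <?php ... ?> tags, like a stray blank line after the closing ?> in an included file, gets sent straight to the output, where it corrupts binary downloads and breaks later header() calls. A common defensive habit is to omit the closing ?> tag entirely in include-only files so trailing whitespace can never leak out. A sketch (the variable names are made up):

<?php
// config.php
$db_user = 'someuser';
$db_pass = 'somepass';
// no closing ?> tag on purpose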