Web Programming, Linux System Administration, and Entrepreneurship in Athens, Georgia

Author: Brandon

WordPress bug: spawn_cron() doesn’t properly consider the port of the cron.php file

I ran into a problem today where a user would submit a new post in WordPress, and it would cause the web server to lock up. Restarting the web server would start Apache properly, and would serve static content fine until the user requested another page from WordPress where it would lock up again.

The configuration is a little odd, so it probably doesn’t happen to many users. In order for it to occur, you have to have the “WordPress Address” setting as a URL starting with ‘https’, and then write your post using a non-https URL. I tracked this down to a problem with the cron function built into WordPress. Specifically, this bit of code in the spawn_cron() function in wp-includes/cron.php:

$cron_url = get_option( 'siteurl' ) . '/wp-cron.php';
$parts = parse_url( $cron_url );

if ($parts['scheme'] == 'https') {
        // support for SSL was added in 4.3.0
        if (version_compare(phpversion(), '4.3.0', '>=') && function_exists('openssl_open')) {
                $argyle = @fsockopen('ssl://' . $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01);
        } else {
                return false;
        }
} else {
        $argyle = @ fsockopen( $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01 );
}
if ( $argyle )
        fputs( $argyle,
                  "GET {$parts['path']}?check=" . wp_hash('187425') . " HTTP/1.0\\r\\n\\r\\n"
                . "Host: {$_SERVER['HTTP_HOST']}rnrn"
        );

The line that says:

$argyle = @fsockopen('ssl://' . $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01);

assumes that you are hitting the current page on the same server and port as the URL returned by get_option( ‘siteurl’ ). Since the user was hitting the non-https version of the site, this code in spawn_cron() would connect to port 80 and try to establish an SSL connection there. WordPress would receive that handshake as the garbage request “\x80|\x01\x03\x01”, serve it the home page, which would in turn run the cron function again. That sub-request would do the same thing, and the cycle continued until Apache ran out of connections. At that point each new request would wait endlessly for a connection slot that never opened up.

So, to solve it, I added one line and modified another, like this:

[root@server wp-includes]# diff cron.php cron.php.original
90,91c90
< $port = isset($parts['port']) ? $parts['port'] : 443;
<                       $argyle = @fsockopen('ssl://' . $parts['host'], $port, $errno, $errstr, 0.01);
---
>                       $argyle = @fsockopen('ssl://' . $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01);
96,97c95
< $port = isset($parts['port']) ? $parts['port'] : 80;
<               $argyle = @ fsockopen( $parts['host'], $port, $errno, $errstr, 0.01 );
---
>               $argyle = @ fsockopen( $parts['host'], $_SERVER['SERVER_PORT'], $errno, $errstr, 0.01 );

That makes it consider the port of the URL returned by get_option( ‘siteurl’ ), instead of using the port you are currently connected on. It defaults to port 443 if the URL begins with https, and port 80 if not.
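
For reference, here is roughly what the patched section of spawn_cron() looks like after the change (exact placement varies between WordPress versions):

if ($parts['scheme'] == 'https') {
        // support for SSL was added in 4.3.0
        if (version_compare(phpversion(), '4.3.0', '>=') && function_exists('openssl_open')) {
                // use the port from 'siteurl' if one is specified, otherwise the https default
                $port = isset($parts['port']) ? $parts['port'] : 443;
                $argyle = @fsockopen('ssl://' . $parts['host'], $port, $errno, $errstr, 0.01);
        } else {
                return false;
        }
} else {
        // use the port from 'siteurl' if one is specified, otherwise the http default
        $port = isset($parts['port']) ? $parts['port'] : 80;
        $argyle = @ fsockopen( $parts['host'], $port, $errno, $errstr, 0.01 );
}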

I posted the fix to the WordPress forums at https://wordpress.org/support/topic/130492. Hopefully this gets included in future releases of WordPress.

Testing servers through encrypted connections

When testing out web or mail servers, I often find myself telnetting to the server and issuing raw commands directly. Doing this is incredibly useful for tracking down the source of many problems. Until now, I have never known how to do the same thing over encrypted channels like HTTPS or POP3S. However, I just discovered that the OpenSSL library has a simple tool that works great. Run the command:

openssl s_client -connect hostname:port

That will perform the SSL handshake and display the output for you, and then give you a regular prompt, just like telnet would. For SMTP over TLS it is a little more complicated, because you generally connect to the remote server and then issue the STARTTLS command to negotiate encryption. In that case, you can use the command:

openssl s_client -starttls smtp -crlf -connect host:port

That tells the openssl client to connect and send ‘STARTTLS’ before attempting to negotiate encryption. After that, you’ll end up with a 220 response, at which point you can proceed with your normal SMTP session.
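
Once that 220 response comes back, the rest is an ordinary SMTP session typed by hand, something like this (the hostname and addresses are only placeholders):

EHLO client.example.com
MAIL FROM:<sender@example.com>
RCPT TO:<recipient@example.org>
DATA
Subject: test message over TLS

This is a test.
.
QUIT
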
Modern versions of OpenSSL also allow STARTTLS with POP3:

openssl s_client -starttls pop3  -connect host:port

Implementing greylisting on Qmail

With my previous success with greylisting, I have decided that it definitely works well and is worth the little bit of effort it takes to get it installed. Configuring Postfix was very simple, but I (unfortunately) run several mail servers that run Qmail. After a few minutes of googling, I decided on qgreylist, which was the simplest implementation by far.

Several of the alternatives required patching and recompiling qmail, which I definitely didn’t want to do. qgreylist is just a simple Perl script that runs “in between” the tcpserver and qmail-smtpd processes. You download it, change the path to its working directory, and tweak a couple of other variables. Then copy it to a permanent location and configure qmail’s smtpd process to send messages through it. It took a little longer than postgrey, but not too bad.
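
As a rough sketch, the only change is wedging the qgreylist script in front of qmail-smtpd in the smtpd run script (the paths, user lookups, and tcpserver options below are assumptions; keep whatever your existing run script already uses):

#!/bin/sh
# sketch of /var/qmail/supervise/qmail-smtpd/run -- adjust paths to your installation
QMAILDUID=`id -u vpopmail`
NOFILESGID=`id -g vpopmail`
exec /usr/local/bin/tcpserver -v -R -x /etc/tcp.smtp.cdb \
    -u "$QMAILDUID" -g "$NOFILESGID" 0 smtp \
    /var/qmail/qgreylist/qgreylist \
    /var/qmail/bin/qmail-smtpd 2>&1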

Find the best book buyback prices with BookScouter.com

A few weeks ago I posted about a quick service I put together that compared textbook buyback prices from a few of the top websites. I’ve been working on expanding that over the past few weeks, and am now unveiling a site dedicated to it.

BookScouter.com is the most comprehensive comparison site for quickly searching textbook buyback prices. It currently scrapes prices from 21 other sites, which is every one I could find. The website is written in PHP using a custom framework that I’ve developed and now use exclusively. I found an excellent website called opensourcetemplates.org that has website templates available for free. Their ‘Nautilius’ theme is the one I chose for this site.

The backend of the site is written in Perl. It uses a pretty straightforward LWP request to fetch each page, and some regular expressions to pull the price out of the pages it retrieves. Each site was custom coded, but I got it down to a pretty reusable script where I just customize a few things, like the input variable name for the ISBN and the regex that captures the price. A few of the sites were more complicated than the others and required fetching a couple of pages to obtain a session ID.
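
A stripped-down sketch of that per-site pattern looks something like this (the URL, the ISBN parameter name, and the price regex are placeholders, not any particular site’s):

#!/usr/bin/perl
# Sketch of the per-site scraper pattern.  The URL, parameter name, and
# price regex are placeholders -- each real site gets its own values, and
# some need an extra request first to pick up a session ID.
use strict;
use warnings;
use LWP::UserAgent;

my $isbn = shift @ARGV or die "Usage: $0 <isbn>\n";

my $ua = LWP::UserAgent->new(timeout => 15);
my $response = $ua->post(
    'http://buyback.example.com/quote',    # placeholder URL
    { isbn => $isbn },                     # placeholder parameter name
);
die "Fetch failed: " . $response->status_line unless $response->is_success;

# Placeholder regex -- match however the site displays its offer
if ($response->decoded_content =~ m/We'll\s+pay\s+you\s+\$([\d.]+)/i) {
    print "Offer: \$$1\n";
} else {
    print "No offer found\n";
}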

I’m pretty happy with the end result. Please try to look up a few books and see if you have anything of value sitting around. No registration or any personal information is ever required and it is completely free to use.

Converting Qmail / vpopmail forwards to database format

I’m not a fan of Qmail, and am in the process of migrating users off of it. As one step in a long process, I’m first migrating users from one qmail server to another. The destination server uses a database-backed vpopmail installation to store some of the user information, while the source server is still using the traditional file-based structure. Each email alias had a file named .qmail-USERNAME containing one forward per line, so a forward for brandon would be named .qmail-brandon and contain something like this:

&[email protected]
&[email protected]

There is a utility named ‘vconvert‘ that converts the actual user accounts into the new database format, but after a little searching, I was unable to find a similar utility to convert aliases. So I wrote up a quick one in Perl. I tried pasting it here, but WordPress mangles the syntax; instead, you can view it separately or download it.
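
My script isn’t reproduced here, but a minimal sketch of the idea is to walk a domain’s .qmail-* files and emit rows for vpopmail’s valias table (the directory layout and the alias/domain/valias_line column names are assumptions; check them against your vpopmail schema):

#!/usr/bin/perl
# Sketch only: read one domain's .qmail-* alias files and print INSERT
# statements for vpopmail's valias table.  Paths and column names are
# assumptions -- verify against your own installation before running.
use strict;
use warnings;

my $domain     = 'example.com';
my $domain_dir = "/home/vpopmail/domains/$domain";

opendir(my $dh, $domain_dir) or die "Can't open $domain_dir: $!";
foreach my $file (grep { /^\.qmail-/ } readdir($dh)) {
    (my $alias = $file) =~ s/^\.qmail-//;
    open(my $fh, '<', "$domain_dir/$file") or die "Can't read $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        $line =~ s/^&//;    # forward lines usually start with '&'
        print "INSERT INTO valias (alias, domain, valias_line) "
            . "VALUES ('$alias', '$domain', '&$line');\n";
    }
    close $fh;
}
closedir $dh;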

Effective greylisting using postgrey

I installed postgrey on several of the mail servers that I manage and have been impressed with the results. Greylisting works by temporarily blocking senders the first time that they attempt to send a message. Spammers will (hopefully) give up and move on to more susceptible targets, while legitimate mail servers will retry delivery a few minutes later.

Prior to installing this, our mail filters were identifying about 70-75% of the messages passing through it as spam (this is after rejecting invalid recipients, and using a couple IP-based blacklists). After installing postgrey, that number is down to around 50%. So, for us, a simple 10 minute installation of postgrey has reduced the amount of mail that we have to scan by about 35%.

I actually installed it on a few different machines, all around the same time, and wrote up some instructions.

Of course, I’ve spent a little more time tweaking the installation. I changed the timeout to 4 minutes instead of 5, so that if a legitimate mail server is set to retry every 5 minutes, it shouldn’t have a problem. I also customized the URL that it sends in the 450 response to one that points to our own website. Overall, I’m very impressed and will recommend installing it on any mail system I’m involved with.
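
Those two tweaks correspond to postgrey’s --delay and --greylist-text options, and hooking it into Postfix takes a single check_policy_service entry (the port and URL below are just examples):

# start postgrey with a 4-minute delay and a custom 450 message
postgrey --inet=10023 --delay=240 \
    --greylist-text="Greylisted, please see http://example.com/greylisting/"

# /etc/postfix/main.cf
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_policy_service inet:127.0.0.1:10023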

Update 2007-08-08

Here is a graph showing the drop in mail processed due to the greylisting. The drop has been significant and has helped our mail filtering service very much. Fewer spam messages bypass the filters, and the load on the servers has decreased so that we can handle more capacity as we need to.

Greylisting Results

Using btree indexes to speed up MySQL MEMORY (HEAP) table deletes

The IPTrack application that I wrote saves a bunch of Netflow data generated by routers into a MySQL database for analysis and summarization. During peak usage times, it inserts about 50k rows per minute into a table. To keep the table at a manageable size, it then summarizes the useful data into a lower-volume table and deletes any Netflow data older than 10 minutes out of the table. This particular table usually has between 300k and 500k rows in it at any given time.

The MEMORY storage engine operates completely from RAM. Of course, this has the advantage of being very quick compared to file-based access. The downside is that any data in it is lost if the MySQL instance has to be restarted for any reason. Since my application just uses it for temporary data anyway, and it doesn’t matter all that much if I lose up to 10 minutes of data, this seemed perfectly acceptable.

So, I tried converting the table from MyISAM to MEMORY. For the first couple of minutes it looked pretty promising: disk IO and the machine’s load were small, and it was cruising along. But after ten minutes, when it got to the point where it was purging old data from the table, it came to a grinding halt. The delete statement, which took only a few seconds using the MyISAM engine, just kept going and going, and ended up taking about 9 MINUTES to complete. That, of course, is pretty unacceptable. The MEMORY storage engine was supposed to be faster than MyISAM, so this made no sense. The table was already pretty well indexed, and I was deleting based on the timestamp column:

DELETE FROM flows WHERE timestamp < ?

and the timestamp column had an index. I had no idea what could make it take so long. I ran across a bug report from somebody else having the same problem; the reply was that this was just the nature of hash indexes. The bug report was from 2004, though, so I posted my problem to the MySQL forums in search of an answer.

I just noticed that somebody replied to my post and said to use btree indexes instead of hash indexes (the default for the MEMORY engine). Sure enough, I recreated my indexes using btree and it works perfectly now.

CREATE INDEX timestamp_btree USING BTREE ON flows (timestamp);
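
If you are creating the table from scratch, the index type can also be declared inline; the columns below are just a guess at a minimal flow record, and the USING BTREE clause is the part that matters:

CREATE TABLE flows (
    timestamp INT UNSIGNED NOT NULL,
    src_ip    INT UNSIGNED NOT NULL,
    dst_ip    INT UNSIGNED NOT NULL,
    bytes     INT UNSIGNED NOT NULL,
    INDEX timestamp_btree USING BTREE (timestamp)
) ENGINE=MEMORY;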

Thanks to KimSeong Loh on the MySQL forums for solving this for me

Perl Interface to the Quantum Random Bit Generator Service

Quantum Random Bit Generator Service

I read about the Quantum Random Bit Generator Service the other day on Slashdot. The service is offered for free with a quick registration at https://random.irb.hr/. They provide the source code, as well as Windows and Linux binaries, to connect to the service and retrieve some random data.

Earlier that day, I had been marveling at the availability of a Perl module to interface with just about anything, and this seemed like a good opportunity to write one for this new service. They provided some C source code, so I figured I should be able to read through it well enough to understand what it was doing.

The interface they provide is just a raw TCP connection. You have to send some header information, including your username and password, as well as the number of bytes of random data you are requesting. It then sends back a bunch of random bits, and my module transforms those into whatever type of numbers you want.

It ended up taking me entirely too long to implement, but I had dedicated enough time to it that I felt pretty committed. I read through the provided C code and did a bunch of tcpdumps to capture the traffic that their working program sent, making sure that mine matched it bit for bit. Eventually I got it working, and I’ve packaged it into a module that I’m calling Data::Random::QRBGS. Now it is simple to get some random data from the service like this:

  use Data::Random::QRBGS;

  $qrbgs = new Data::Random::QRBGS('username', 'password');

  @ints = $qrbgs->get(4);
  print "Got random integers: @intsn";

  @shorts = $qrbgs->get(2, 's');
  print "Got random shorts: @shortsn";

  $bytes = $qrbgs->getraw(1024);

I’ve created a page at https://www.brandonchecketts.com/qrbgs.php that contains a little documentation and a link to download it.

I’d like to get the module made available through CPAN, but that is turning out to be quite complicated. I’ve requested an account, and I guess that has to be approved manually. The instructions recommend joining the mailing list and discussing the module for a while before actually submitting it. I’ll get around to that as I have time, I guess.

Making Awstats ignore the ‘www’ on a domain

awstats is my favorite web statistics program. It provides quite a bit of interesting data about my websites, and I usually set it up for any site that I work on. One issue that I’ve had with it, though, is that in a shared setup it uses the HTTP Host header by default to determine which domain to display statistics for. The config files and data files are all saved with the full hostname of the domain whose stats they contain, so it treats “www.brandonchecketts.com” differently from “brandonchecketts.com”.

This has always been a slight annoyance, and I have just remembered to put the ‘www’ on the URL when looking at my statistics. Today, though, I thought I would dig into it a bit and fix it permanently.

It turns out it was pretty easy to change this behavior so that it always removes the ‘www’ from the domain name for any of the files it looks for. Simply add this on line 1160 of awstats.pl:

$SiteConfig =~ s/^www\.//g;

It should go right after the $FileConfig= line, and right before the foreach loop. Since all of my database files were created with the ‘www’ in them, I had to go through and rename all of those database files to remove the www from them. Now I can hit either URL and get the same data.

I started a forum thread on the SourceForge forums to announce it to others and see if anybody else finds it useful.

Calculating Amanda Backup Space Usage per Host / Disk

I’ve recently been setting up a bunch of hosts with a new Amanda backup server, and I like to see how much space each server that I’m backing up is using. Amanda stores a bunch of info in the ‘curinfo’ directory for each host and disk being backed up, but I haven’t found any good tools for querying or displaying that data. So, I wrote my own. This script looks through all of the files in the ‘curinfo’ directory and prints out a summary of how much space each host and disk is taking up:

#!/usr/bin/perl

## View Amanda disk space usage per host/disk
## Author:  Brandon Checketts
## Website: https://avazio.com/

$curinfo_dir = "/etc/amanda/avazio/curinfo";

opendir(DH, $curinfo_dir);
while($host = readdir(DH)) {
  next if($host =~ m/^\./);   # skip '.', '..', and other dotfiles
  if( -d "$curinfo_dir/$host") {
    opendir(DH2, "$curinfo_dir/$host");

    while($disk = readdir(DH2)) {
      next if($disk =~ m/^\./);   # skip '.', '..', and other dotfiles
      if( -f "$curinfo_dir/$host/$disk/info") {
        open(FH, "< $curinfo_dir/$host/$disk/info");
        while(my $line = <FH>) {
          if($line =~ m/^history: ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)/) {
            ## Example line: history: 1 2319760 2319760 1184766354 1121
            ## Line format:  history [lvl] [rawsize] [compsize] [timestamp] [unk?]
            $space->{$host}->{$disk}->{'rawsize'}  += ($2 / 1024);
            $space->{$host}->{$disk}->{'compsize'} += ($3 / 1024);

          }
        }
      }
    }
    closedir(DH2)
  }
}
closedir(DH);

$grandtotal_rawsize  = 0;
$grandtotal_compsize = 0;
foreach my $host (keys(%{$space})) {
  print "n$hostn";
  $thishost = $space->{$host};
  $thishost_rawsize  = 0;
  $thishost_compsize = 0;
  foreach my $disk (keys(%{$thishost})) {
    $thisdisk = $space->{$host}->{$disk};
    $thishost_rawsize    += $thisdisk->{'rawsize'};
    $thishost_compsize   += $thisdisk->{'compsize'};
    $grandtotal_rawsize  += $thisdisk->{'rawsize'};
    $grandtotal_compsize += $thisdisk->{'compsize'};
    $disk =~ s/_/\//g;   # convert underscores back to slashes in the disk name
    printf("  %-40s %-6i Mb   %-6i Mb\n", $disk, $thisdisk->{'rawsize'}, $thisdisk->{'compsize'});
  }
  printf("  TOTAL:                                   %-6i Mb  %-6i Mb\\n",  $thishost_rawsize, $thishost_compsize);
}

printf("GRAND TOTAL:                                 %-6i MB  %-6i MB\\n", $grandtotal_rawsize, $grandtotal_compsize);