Block comment spam with bcSpamBlock

A while ago I installed Paul Butler’s JSSpamBlock on my WordPress blog here. His original idea is simple and brilliant: spambots don’t (yet) execute JavaScript. In fact, they usually post directly to the form without even displaying the form first. By having a hidden input field that is populated by JavaScript, you can verify that a real browser loaded the page, without the user even noticing. Users with JavaScript disabled (are there any of you out there?) simply have to copy/paste a small string into a textbox for verification.

Since implementing a slightly modified version of it on this blog, I have gotten zero spam posts. Now, I wanted some way to implement the same logic on some of my own custom PHP sites to prevent spam on them as well.

While working on a way to re-implement Paul’s WordPress plugin in my own sites, I came up with something pretty clever. Instead of saving a row to a database every time that the form is displayed, you can use a little cryptography to make the client pass all of the data needed to validate the request back to you on its own. The idea is sort of a merger between the JSSpamBlock plugin and TCP SYN cookies, which use a similar method of having the client store the data for you.

Essentially, the function generates a random ID. It then encrypts the current timestamp and the random ID using PHP’s crypt() function with some cryptographic salt that is unique to each server. All three of those values (the random ID, the timestamp, and the encrypted value) are then passed to the browser. The timestamp and the encrypted value are stored in hidden <input> fields, while the random ID is displayed for the user to verify. If the user has JavaScript enabled, a few lines of JavaScript copy the random ID into another textbox and then hide that prompt, so that it is never seen by the user. If the user doesn’t have JavaScript enabled, they have to copy/paste that random ID into the textbox themselves, similar to a CAPTCHA.

When the form is submitted, it checks to make sure that the timestamp is not too old, and then re-encrypts the passed-in timestamp and random ID using the same salt value to make sure the result matches the crypted value passed in from the form. If everything matches, the comment is approved; otherwise an error is displayed to the user.
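The issue-and-verify round trip described above can be sketched roughly as follows. This is not the actual bcSpamBlock code: the function names, the secret, and the one-hour lifetime are all made up for illustration, and it swaps crypt() for hash_hmac() with SHA-256, which is a different primitive that plays the same role here. It also assumes PHP 7 or later for random_bytes().

```php
<?php
// Hypothetical per-server secret; it plays the role of the "salt" in the post.
const SPAMBLOCK_SECRET = 'change-me-to-a-long-random-string';
// Hypothetical maximum age of a form, in seconds.
const SPAMBLOCK_MAX_AGE = 3600;

// Build the three values that are sent to the browser with the form.
function spamblock_issue(int $now): array
{
    $id = bin2hex(random_bytes(4));   // random ID the visitor (or JavaScript) copies
    return [
        'id'   => $id,
        'time' => $now,
        'sig'  => hash_hmac('sha256', $now . ':' . $id, SPAMBLOCK_SECRET),
    ];
}

// Recompute the signature from the submitted fields and compare it.
function spamblock_verify(string $id, int $time, string $sig, int $now): bool
{
    if ($time > $now || $now - $time > SPAMBLOCK_MAX_AGE) {
        return false;                 // too old, or claims to come from the future
    }
    $expected = hash_hmac('sha256', $time . ':' . $id, SPAMBLOCK_SECRET);
    return hash_equals($expected, $sig);
}
```

Nothing is stored on the server between the two calls. A bot that posts without loading the form cannot produce a matching signature, because it does not know the secret.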

I wrote this up into a simple include file that can be used for any PHP application. I also implemented a quick WordPress plugin that uses the generic version. More information about it can be found on my new bcSpamBlock home page.

Fix for CentOS “Can’t do setuid (cannot exec sperl)”

If you are running a Perl script with the setuid bit, it actually runs a slightly modified version of Perl so that it is a bit more cautious. On a CentOS box, you need to install the ‘perl-suidperl’ package to get the necessary files installed. Otherwise you get an error like this:

[root@host bin]# ls -al myscript.pl
-rws--S--- 1 mail mail 1218 Oct  1 13:09 myscript.pl

[root@host bin]# ./myscript.pl
Can't do setuid (cannot exec sperl)

Find the best book buyback prices with BookScouter.com

A few weeks ago I posted about a quick service I put together that compared textbook buyback prices from a few of the top websites. I’ve been working on expanding that over the past few weeks, and am now unveiling a site dedicated to it.

BookScouter.com is the most comprehensive comparison site for quickly searching for textbook sale prices.   It currently scrapes prices from 21 other sites – which is all of them that I could find.  The website is written in PHP using a custom framework that I’ve developed and use exclusively now.   I found an excellent website called opensourcetemplates.org that has website templates available for free.  Their ‘Nautilius’ theme is the one I chose for this site.

The backend of the site is written in Perl. It uses LWP in a pretty straightforward way to fetch each page, and some regular expressions to pull the price from the pages it obtains. Each site was custom coded, but I got it down to a pretty re-usable script where I just customize a few things, like the input variable name for the ISBN and the regex that contains the price. A few of the sites were more complicated than the others and required fetching a couple of pages to obtain a session ID.
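To give a feel for the "customize a few things per site" approach, here is a rough sketch of the idea. It is written in PHP rather than the Perl the real backend uses, it skips the fetching step entirely, and the site names, parameter names and patterns are all invented for illustration.

```php
<?php
// Hypothetical per-site settings: the form field that takes the ISBN and
// the regex whose first capture group is the price.
$sites = [
    'examplebooks' => [
        'isbn_param'  => 'isbn',
        'price_regex' => '/Our offer:\s*\$(\d+\.\d{2})/',
    ],
    'samplebuyback' => [
        'isbn_param'  => 'q',
        'price_regex' => '/<span class="price">\$(\d+\.\d{2})<\/span>/',
    ],
];

// Return the first price matched by the site's regex, or null if none is found.
function extract_price(string $html, string $regex): ?float
{
    return preg_match($regex, $html, $m) === 1 ? (float) $m[1] : null;
}

$html  = '<p>Our offer: $9.25</p>';   // stand-in for a fetched page
$price = extract_price($html, $sites['examplebooks']['price_regex']);
```

Adding a new site then mostly means adding another entry to the table, unless the site needs extra requests first.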

I’m pretty happy with the end result.   Please try to look up a few books and see if you have anything of value sitting around.   No registration or any personal information is ever required and it is completely free to use.

Perl Interface to the Quantum Random Bit Generator Service


I read about the Quantum Random Bit Generator Service the other day on Slashdot. The service is offered for free with a quick registration at http://random.irb.hr/. They provide the source code and Windows and Linux binaries to connect to the service and retrieve some random data.

Earlier that day I was marveling at the availability of a Perl module to interface with just about anything. I thought it seemed like a good opportunity to write one for this new service.   They provided some C source code, so I figured that I should be able to read through it well enough to understand what it was doing.

The interface that they provide is just a raw TCP connection. You have to send some header information including your username and password as well as the number of bytes of data you are requesting. The service then sends back a bunch of random bits, and the module transforms those into whatever type of numbers you want.

It ended up taking me entirely too long to implement, but I had dedicated enough time to it that I felt pretty committed. I read through the provided C code, and did a bunch of tcpdumps to capture the traffic that their working program sent and made sure that mine matched it bit by bit. Eventually I got it working. I’ve packaged it into a module that I’m calling Data::Random::QRBGS. Now, it is simple to get some random data from the service like this:

  use Data::Random::QRBGS;

  $qrbgs = new Data::Random::QRBGS('username', 'password');

  @ints = $qrbgs->get(4);
  print "Got random integers: @ints\n";

  @shorts = $qrbgs->get(2, 's');
  print "Got random shorts: @shorts\n";

  $bytes = $qrbgs->getraw(1024);

I’ve created a page at http://www.brandonchecketts.com/qrbgs.php that contains a little documentation and a link to download it.

I’d like to see about getting the module made available through CPAN, but it is actually turning out to be quite complicated to do that. I’ve requested an account, and I guess that has to get approved manually. The instructions recommend joining the mailing list and discussing the module for a while before actually submitting it. I’ll get around to that as I have time, I guess.

Compare used book purchase prices quickly

I was reading the blog of a friend of a friend and came across a discussion about selling used books online. It sounded like there are a bunch of different sites that buy used books. Each of them allows you to put in an ISBN number to see what they are willing to buy it for.

To find the best price, you would have to browse all of these sites to see who was offering the most money. Sometimes a book may sell for a dollar at one site, but nine dollars at another, so it is worth your time to check out all of the sites.

Sounds like a good candidate for automation to me. I am already doing a pretty similar, but more complicated, version of this with GamePriceWatcher.com. It didn’t take me much time to write some scripts to scrape prices from about eight of these sites. I also included the Perl WWW::Scraper::ISBN module to retrieve some of the details about the book and it has turned out pretty well.

I have it working now at http://avazio.com/sellbooks.php, and may move it over to its own domain if it seems like anybody is using it.

PHP 4’s call_user_func passes everything by value

I spent quite a while today debugging a problem where call_user_func was not passing a parameter by reference. I was trying to pass an object into a function whose name is not known until run time.

Passing it by reference means that changes made to $var inside foo() are made to the actual variable instead of to a copy of the value (when passed by value).  However, for some reason, when calling a function with call_user_func(), it passes everything by value, regardless of how the function is defined.

function foo(&$var)
{
  $var++;
}

$bar = 1;
foo($bar);
echo $bar;    // outputs '2'

$function = 'foo';

call_user_func($function, $bar);
echo $bar;  // you'd expect this to output 3 now, but it still outputs 2

$function($bar);
echo $bar;  // outputs 3 now

As the sample code shows, the solution is to avoid the use of the call_user_func() function by using a variable function name. Thanks to Steve Hannah’s blog post at http://www.sjhannah.com/blog/?p=86 for helping me to solve this one.
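Another option, not used in the post above, is call_user_func_array(). The PHP manual notes that arguments placed in its array as explicit references are passed through by reference, so a sketch like this should also work when you need the call_user_func style of invocation:

```php
<?php
function foo(&$var)
{
    $var++;
}

$function = 'foo';
$bar = 1;

// The reference is created explicitly in the argument array,
// so foo() increments the caller's $bar rather than a copy.
call_user_func_array($function, array(&$bar));
echo $bar;    // outputs '2'
```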

Credit Card Validation using the mod10 algorithm in PHP

I’m working on a site that will use the PayPal API to submit merchant account transactions. I’d like to validate as much credit card information as possible before passing anything to a third party. Since there are different kinds of credit card companies and options, I’ve been reading up to find out more about it. I came across the mod10 check that credit cards use and wrote a little PHP function to validate a card number:

function sumdigits($number)
{
  $sum = 0;
  for($i = 0; $i <= strlen($number) - 1; $i++) {
    $sum += substr($number, $i, 1);
  }
  return $sum;
}

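function mod10check($number)
{
  $sum_number = '';
  $loop = 0;  // position of the current digit, counted from the right
  for($i = strlen($number) - 1; $i >= 0; $i--) {
    $thisdigit = substr($number, $i, 1);
    // Keep every other digit as-is; double the ones in between and sum the product's digits
    $sum_number .= ($loop % 2 == 0) ? $thisdigit : sumdigits($thisdigit * 2);
    $loop++;
  }
  return sumdigits($sum_number) % 10 == 0;
}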
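To make the arithmetic concrete, here is a small worked example. The helper name is made up for illustration, and 79927398713 is just a commonly used sample number that passes the check, not a real card. Starting from the rightmost digit, every second digit is doubled, and a two-digit product is reduced by adding its digits (which is the same as subtracting 9).

```php
<?php
// Hypothetical helper: per-digit contributions to the mod10 sum,
// listed from the rightmost digit to the leftmost.
function mod10_contributions(string $number): array
{
    $out = [];
    $digits = strrev($number);
    for ($pos = 0; $pos < strlen($digits); $pos++) {
        $d = (int) $digits[$pos];
        if ($pos % 2 === 1) {      // every second digit, counting from the right
            $d *= 2;
            if ($d > 9) {
                $d -= 9;           // same as adding the two digits of the product
            }
        }
        $out[] = $d;
    }
    return $out;
}

$parts = mod10_contributions('79927398713');   // [3, 2, 7, 7, 9, 6, 7, 4, 9, 9, 7]
$total = array_sum($parts);                    // 70
$valid = ($total % 10 === 0);                  // true, since 70 ends in 0
```

Changing the last digit to 0 gives a total of 67, so that number fails the check.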

Tracking down how hackers gain access through web apps

Hackers commonly use vulnerabilities in web applications to gain access to a server. Sometimes, though, it can be difficult to track down exactly how they got in, especially if the server hosts a bunch of websites and there are lots of potentially vulnerable scripts.

I’ve tracked down more of these than I can count, and have sort of developed a pattern for investigating. Here are some useful things to try:

1- Look in /tmp and /var/tmp for possibly malicious files. These directories are usually world-writable, and commonly used to temporarily store files. Sometimes the files are disguised with leading dots, or they may be named something that looks similar to other files in the directory, like “. ” (dot-space), or like session files named sess_something.

If you find any such files, you can use their timestamps to look through the Apache logs and find the exact hit that they came from.

2- If a rogue process is still running, look at the /proc entry for that process to determine more information about it. The files in /proc/<PID> will tell you information like the executable file that created the process, its working directory, environment information, and plenty more details. Usually, the rogue processes are running as the Apache user (httpd, nobody, apache).

If all of the rogue processes were being run by the Apache user, then the hacker likely didn’t gain root access. If you have rogue processes that were being run by root, it is much harder to clean up after. Usually the only truly safe method is to start over with a clean installation.

3- netstat -l will help you identify processes that are listening for incoming connections. Oftentimes, these are Perl scripts. Sometimes they are named things that look legitimate like ‘httpd’, so pay close attention. netstat -n will help you to see current connections that your server has to others.

4- Look in your error logs for files being downloaded with wget. A common tactic is for hackers to run a wget command to download another file with more malicious instructions. Fortunately, wget writes to STDERR, so its output is usually displayed in the error logs. Something like this is evidence of a successful wget:

--20:30:40--  http://somehackedsite.com/badfile.txt
            => `badfile.txt'
Resolving somehackedsite.com... 12.34.56.78

Connecting to somehackedsite.com[12.34.56.78]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12,345 [text/plain]

     0K .......... ......                                     100%  263.54 KB/s

20:30:50 (263.54 KB/s) - `badfile.txt' saved [12,345/12,345]

You can use this information to try and recreate what the hacker did. Look for the file they downloaded (badfile.txt in this case) and look at what it does. You can also use these timestamps to look through access_logs to find the vulnerable script.

Since wget is a commonly used tool for this, I like to create a .wgetrc file that contains bogus proxy information, so that even if a hacker is able to attempt a download, it won’t work. Create a .wgetrc file in Apache’s home directory with this content:

http_proxy = http://bogus.dontresolveme.com:19999/
ftp_proxy = http://bogus.dontresolveme.com:19999/

5- If you were able to identify any timestamps, you can grep through Apache logs to find requests from that time. If you have a well-structured server where you have logs in a consistent place, then you can use a command like this to search all of the log files at once:

grep "01/Jun/2007:10:20:" /home/*/logs/access_log

I usually leave out the seconds field because requests sometimes take several seconds to execute. If you have a server name or file name that you found was used by a wget, you can try searching for those too:

grep "somehackedsite.com" /home/*/logs/access_log

6- Turn off PHP’s register_globals by default and only enable it if truly needed. If you write PHP apps, learn how to program securely, and never rely on register_globals being on.
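For steps 1 and 5, a small helper can turn a suspicious file’s modification time into the minute-level string to grep for. This is only a sketch: the function name is made up, and it assumes PHP’s default timezone matches the timezone Apache writes into its logs.

```php
<?php
// Hypothetical helper: turn a Unix timestamp (for example from filemtime())
// into the day/month/year:hour:minute prefix used in Apache access logs.
// The seconds are left off on purpose, as described in step 5.
function apache_minute_prefix(int $timestamp): string
{
    return date('d/M/Y:H:i:', $timestamp);
}

// Fixed example time; in practice you might pass filemtime() of a file
// found in /tmp instead.
$when    = mktime(10, 20, 30, 6, 1, 2007);
$pattern = apache_minute_prefix($when);   // "01/Jun/2007:10:20:"
```

The resulting string can be pasted straight into the grep command from step 5.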

What a difference a blank line can make

I had a customer today who had problems with a PHP script that output a Microsoft Word document. The script was pretty simple and just did some authentication before sending the file to the client. But, when the document was opened in Word, it tried to convert it into a different format and would only display gibberish.

The customer had posted his problem on some forums, and was told that upgrading from PHP 5.1.4 to PHP 5.2 should fix the problem. Well, it didn’t. In fact, the PHP 5.2 version had some weird bug where a PDO object would overwrite stuff in the wrong memory location. In this case, a call to fetchAll() was overwriting the username stored in the $_SESSION variable, which in turn was messing up all of the site’s authentication. After digging into it to find that out, it seemed best to revert back to PHP 5.1. Once that was completed, we were back to the original problem with the Word document.

The headers he was sending all looked okay. Here’s the relevant code to download a document:

$file = "/path/to/some_file.doc";
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: private",false); // required for certain browsers
header("Content-Type: application/msword");
header("Content-Disposition: attachment; filename=\"".basename($file)."\";" );
header("Content-Transfer-Encoding: binary");
header("Content-Length: ".filesize($file));
readfile($file);

I tried tweaking them a little to match a known-working site, but to no avail. I finally had to download a copy of the file directly from the web server, bypassing the PHP script. I also downloaded a copy of the file through the PHP script and saved them both for comparison. After looking at them both side-by-side in vi, I noticed an extra line at the top of the bad one. I removed the extra line from the downloaded copy, and the fixed copy opened fine in Word. After that, it was just a matter of finding the included file with an extra line in it. Sure enough, one of the configuration files had an extra line after the closing ?> tag. I removed that and everything worked correctly.
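A check like the one below could help find this kind of problem without comparing files by hand. It is only a sketch with a made-up function name. It relies on the fact that PHP discards a single newline directly after the closing tag, so only anything beyond that gets sent to the browser, and it deliberately ignores files that reopen PHP after the last closing tag.

```php
<?php
// Hypothetical helper: return whatever a PHP source string would emit after
// its last closing tag, or '' if nothing would leak into the output.
function output_after_close_tag(string $source): string
{
    $pos = strrpos($source, '?>');
    if ($pos === false) {
        return '';                       // no closing tag at all
    }
    $tail = (string) substr($source, $pos + 2);
    if (strpos($tail, '<?') !== false) {
        return '';                       // PHP is reopened later; out of scope here
    }
    // PHP discards one newline that directly follows the closing tag.
    if (substr($tail, 0, 2) === "\r\n") {
        $tail = (string) substr($tail, 2);
    } elseif (substr($tail, 0, 1) === "\n") {
        $tail = (string) substr($tail, 1);
    }
    return $tail;
}

$good = "<?php \$config = 1; ?>\n";     // one newline: harmless
$bad  = "<?php \$config = 1; ?>\n\n";   // the extra line ends up in the output
```

Running it over file_get_contents() of each included file would flag the ones where the result is not an empty string. Leaving the closing tag off of include files entirely avoids the problem in the first place.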

Avazio.com it is

After spending far too many hours looking up possible domain names, I’ve finally settled on avazio.com. This will be a place for me to sell programs that I’ve written, and to advertise System Administration and Programming services. There is no special meaning or anything to the name. It’s just something that sounded cool and was available. I’ve spent a little time putting up a website there with a little bit of information about the products and services that I’m hoping to sell.

I’m actually quite happy with the look of the site. It’s nothing too complicated, but I have created all of the graphics for it myself using an old version of Paint Shop Pro. Considering that I know nothing about graphics, I think that it looks pretty good. I picked the colors from colorschemer.com (although I forget which one).