Impressed with PowerDNS

I’ve spent the last couple weeks working with PowerDNS. We’re migrating our old BIND servers over to new PowerDNS servers that use a MySQL backend. Installation was fairly easy, because things were well documented. The application has worked perfectly, and when I emailed their mailing list to ask about a configuration setting that wasn’t documented, I got a useful reply within minutes.

Since PowerDNS is just the DNS Server, it doesn’t provide any user-interfaces for modifying the DNS information. I took a look at several of the possible applications that claimed to be “front ends” for PowerDNS, but didn’t find any that suited our needs. (I tried out WebDNS, Tupa, and a couple others listed on SourceForge). The existing tools were too complex, too simple, or too buggy. But, the database schema that PowerDNS uses, is pretty straightforward, so I wrote a PHP class that provides most of the necessary functions, and started our long-awaited customer interface that uses the class to allow our customers to maintain their own DNS records.

Overall, this has been a great project with great results.

Laughed for several minutes from this comic.

It’s not often that a simple comic will make me laugh for several minutes, but I thought this one was great.
Sudo

I found it at http://www.redhat.com/magazine/025nov06/features/gifts/xkcd.jpg

sudo” is Linux program that allows regular users to execute certain commands as root. Often, when working as a non-root user, you will type one command, and it complain that you need to be root to run it. You would then quickly add “sudo” to the beginning of the command and execute it again, this time running as root, and the command will run successfully.

Database encryption made easy

I’ve always wondered how one would securely store sensitive information in a MySQL database. A recent project has given me the opportunity to work on it, and I’ve been impressed on how easy it is to implement. MySQL provides an easy interface for encrypting data before storing it in the database. Simply use the AES_ENCRYPT and AES_DECRYPT functions when reading or writing to a table.

Simply make your column a blob field, then use something like this to write to the table

(using a PEAR::DB syntax)

$db->query("
UPDATE sometable
SET    some_col = AES_ENCRYPT( ?, ?)
WHERE something_else = ?
" array( $sensitive_value, $encryption_key, $index));

and something like this to read it back out

$value = $db->getOne("
SELECT AES_DECRYPT( some_col, ?)
FROM   sometable
WHERE something_else = ?
", array( $encryption_key, $index));

There’s no reason not to use mod_deflate

I’ve been trying to convince one of our larger customers to install mod_deflate on their server for about a month. They have had concerns about compatibility with older browsers and the possibility that it will affect the PageRank, but I have finally put enough pressure on to have them let me try it. They have very few users with old browsers (and really, if somebody is running using that archaic of a browser, how likely is it that they are going to buy something on your site), and convinced them that there should be no SEO consequences with the change (if anything, the search engines will respect your site more for using less bandwidth and having a knowledgeable administrator).

Early this morning, I got it installed (took all of about 15 minutes) and it’s running great. It’s compressing HTML and Javascript files to about 20% of their original size, which equates to some significant bandwidth savings, and quicker page-load times. About 60% of the total bandwidth used on this site is for HTML and JavaScript files (the other 40% is images, movies, and a few other odds and ends). Overall, it looks like about a 30-40% drop in total bandwidth usage, which is very significant. I’ve heard of no problems with browser compatibility either, so everybody is happy.

Overall, I’d say that there is no good reason not to use mod_deflate on your site. Especially if you ever get charged for bandwidth overages.

Here are some useful resources for installing and gathering statistics on mod_deflate

Awstats – Detailed Web-Based statistics package
Perl-based mod_deflate statistics utility
Apache’s mod_deflate documentation
Firefox plugin to view the HTTP Response headers
(and lots of other useful stuff)

And here’s a sample Apache configuration section that I picked up from somewhere. I just save this in /etc/httpd/conf.d/deflate.conf and restart apache, then you are good to go (requires that mod_deflate is already compiled and installed. Actual file location may vary, depending on your OS. This works for Red-Hat derivatives)

### Enable mod_deflate to compress output
# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch bMSIE !no-gzip !gzip-only-text/html

## Don't compress for IE5.0
BrowserMatch "MSIE 5.0" no-gzip

# Don't compress images
SetEnvIfNoCase Request_URI .(?:gif|jpe?g|png|swf)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary

## Log some stuff for mod_deflate stats
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio

LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
## end mod_deflate stats

#### END mod_deflate configuration

Defeating CAPTCHAs

A recent slashdot article about how Spammers may now be using humans to defeat captchas caught my attention.

Here’s how it would work: Spammers currently use scripts that make automated posts on forums, wikis, blogs, and virtually any other place where user submitted comments may appear on a website. Their posts include links to their “spamvertised” sites where they sell their junk. They benefit both by “advertising” to people who view their automated post, and by trying to trick search engines into generating more links to their site.

Many sites and programs now include CAPTCHA’s which display an image that is supposed to be difficult for a machine to read. The website confirms that the user enters the correct CAPTCHA solution before saving their post.

This article suggests that spammers are now sending these CAPTCHA images to a real human who will input the solution and send it back to the website, therefore allowing the spammer to post their links.

I’ve actually considered this possibility for a while, and it’s not very difficult at all. To prove the concept, I created a simple web service that a spammer’s automated script could post the image to. The service waits while a human types in the result, and then returns that result to the spammers script, which would use it to submit the spam.

If a human completes one of these every 3 seconds, then they could do about 1200 per hour. If you are paying somebody a couple bucks an hour, then it works out to about 0.17 cents (17/100 of a cent) per message. I’m not sure what the going rate for spam is, but this seems pretty reasonable. Twenty bucks would get you 12,000 links to your site.

The concept is incredibly simple — it took me about and hour to write. Try it out here:

http://www.brandonchecketts.com/capdef/

This raises some interesting concerns and questions:

  • The “appeal” of spam is that it has virtually no cost. Since hiring a human introduces a cost, does that mean it won’t get used.
  • Many CAPTCHAS are supposedly easily defeated by computer programs anyway

Some ideas on how to “really” make sure a human is hitting your site:

  • Introduce a minimum time delay between pages. (Ie: a human couldn’t fill out this form in 1 second like a script does)
  • Have some page element (that doesn’t look like it does anything) that “validates” that it has been downloaded. Since scripts will usually just fetch the HTML content, and not the graphics, make one graphic on your page that is really a script (that returns an image). If that image hasn’t been downloaded, than it’s not likely a human visiting.
  • Load your captcha graphic with javascript. Many spammers scripts aren’t able to successfully run javascript.
  • If you use a common captcha-generation program, change the default file name, or form field name.
  • Spammer’s scripts are written to affect the most sites possible. If you make some change on your site, so that it’s not the same as everybody else’s. then automated scripts are less likely to work on your site.

So much for good database design…

I’m installing and modifying WordPress MU (Multi-user) for a client and am amazed at the poor database design. For each blog you set up, it generates 8 new tables for that blog, which have an identical design to the same 8 tables for every other blog it creates. This is extremely poor design according to any database design standards.   Despite the poor design, they do have a good reason for doing it.  Quoted from http://mu.wordpress.org/faq/

Does it scale? (Also: The way you do your databases and tables doesn’t scale!)

WordPress MU creates tables for each blog, which is the system we found worked best for plugin compatibility and scaling after lots of testing and trial and error. This takes advantage of existing OS-level and MySQL query caches and also makes it infinitely easier to segment user data, which is what all services that grow beyond a single box eventually have to do. We’re practical folks, so we’ll use whatever works best, and for the 400k and counting on WordPress.com, MU has been a champ.

The main reason for doing this is that it makes compatibility with existing WordPress plugins much easier. I guess the real source of the problem was poor planning and foresight during the development of the original WordPress application.  They claim it works well, so even though I cringe every time I see it, I guess I’ll just have to live with it and complain.

rrdtool is cool

I’ve spent the last few days working on creating some graphs for my website at http://www.gamepricewatcher.com. I’ve used RRD databases as part of several applications like MRTG and Cacti. I’ve done some modifications to those apps, but never dug too much into the rrdtool specific code because it seemed fairly complicated.

When trying to get some graphs on my site before, I initially tried using Cacti. I had it successfully working, but adding new Games and Servers was extremely cumbersome to do with Cacti. I figured this would be a good time to dig in and learn how to use rrdtool myself to create the graphs. That way I can generate them directly and not have to work with an application that wasn’t designed to do what I wanted.

Fortunately, there are some good documentation and tutorials on how to use rrdtool I found the tutorial from Alex van den Bogaerdt to be the best starting place. It does a good job of explaining the basics. From there, I wrote the data-collection part of my code in Perl and used the RRDTool::OO module to create and update the databases.

To display the graphs, I tried installing the rrdtool php module, but there wasn’t one available for my distribution (CentOS). I eventually got it installed, but found that it is poorly documented, and decided to ditch the module in favor of calling the binary directly.

Now, I have pretty graphs like this on my website:

World of Warcraft Gold Prices
I have been collecting game-wide data for about a year now. However, with the addition of the RRD’s to my site, I’m now tracking server-specific prices in those too. That will make it so that I can eventually provide cross-server comparisons, and compare a specific server’s prices to the game’s average.