DHCP ‘always-broadcast’ confusion

I run a DHCP server using Linux’s dhcpd program to serve addresses to a bunch of clients.  These clients are connected over several wireless links, and the radios are sometimes quirky.  Specifically, some clients never get the DHCPOFFER unless the ‘always-broadcast’ parameter is on.  This usually works out fine.

Today, however, we had a problem where we just saw a bunch of incoming DHCPDISCOVER messages, to which the server would reply with a DHCPOFFER. The devices would just continually send discover messages, and none would ever DHCPREQUEST an address.

From what I can conclude, I think that the clients were confused when they received multiple broadcast responses back for their DHCPDISCOVER message.  The client would then send another discover message, which just caused a never-ending loop of requests and offers.

To resolve the problem, I turned off always-broadcast for a few minutes.  This made the clients wait for a random period of time before discovering again.  Some clients accepted the IP fine even though it wasn’t broadcast.  For the ones that didn’t, I then re-enabled always-broadcast, and they picked up an address the next time that they tried.

For a long term solution, we’re working on subnetting the two /24 networks that are currently together into smaller /26 or /27 blocks.  That should reduce the possibility of having this happen again.

The coolest, most efficient way to copy a directory between servers

I was recently quizzed about the quickest, most efficient way to copy an entire directory between servers. I typically do this by tar’ing it up on one server, copying it to the other, then extracting it. However, this has a couple obvious problems. One, is that it requires large chunks of disk space to hold the archive file on both the source and destination. If you are low on disk space, this can be a real pain. The other bad thing, is that it a waist of time since it reads through all of the files three times (read, copy, extract).

The original thought I had was to use “scp -r” which will recursively copy a directory over to the destination. This, however, doesn’t copy directories that start with a dot, and it doesn’t preserve file ownership information.

The best way, is to use a combination of tar and ssh. The idea is to tar the files up to STDOUT, then create an SSH session to the remote host, and extract from STDIN. After you’ve got the idea, the command is pretty simple:

tar -cvzf – /path/to/local/dir | ssh root@remotebox “cd /path/to/extract; tar -xvzf -”

That’s it. One simple command and it can save you tons of time creating, copying, and extracting by doing it all at once.

Technology in books and movies

I recently finished reading “Digital Fortress” by Dan Brown. The story was actually pretty good, and I would recommend it to others, but I have got to complain about the technology described in the book.

Essentially, the book is set around the NSA’s supercomputer called TRANSLTR. This machine is described as a multi-billion dollar, multi-million core computer that is used to brute-force encryption keys on encrypted documents. Supposedly this machine can crack most encrypted documents in minutes, and it has been stumped for as long as a couple hours on the most complex jobs.

Now, when the bright minds at the NSA try to decrypt the latest ‘unbreakable’ code with their fancy machine, it just works on it for hours and hours. The only interface that all of the technicians have though, is this ‘run-time monitor’ that says how long it’s been working on the latest code. The main character who supposedly did most of the programming on this machine doesn’t have any better debugging tools available than the single clock? Come on…

Equally annoying is the fact that TRANSLTR also has some built-in access into the NSA’s super-secret database of highly classified information. Therefore when TRANSLTR becomes exploited, its conveniently able to modify the firewall and access controls to the NSA’s secret database. Well, the NSA deserves it if they allow an outside system control like that.

There are a whole bunch of other little things (like the only manual power-off button is six stories beneath ground) that are annoying about this book. But the worst is near the very end where is supposed to be suspenseful. The final code is the ‘prime difference between Hiroshima and Nagasaki’). It takes the main characters (who are supposedly math geniuses) 20 pages to figure out that this is a numeric answer, despite the words ‘prime’ and ‘difference’. And another few pages to figure out that the the difference between 235 and 238 is three. Amazing

Cacti stops updating graphs after upgrade to version 0.8.6j

It turns out the latest update to Cacti, the popular SNMP and RRDTool graphing program, has a bug that makes it so graphs based on SNMP Data aren’t updated after upgrading.  The problem has to do with using the PHP “snmpgetnext” function, which is unimplemented in PHP 4. 

There is a discussion on Cacti’s forum at http://forums.cacti.net/about19199.html  where a developer posts a new ping.php that will resolve the problem.

Impressed with PowerDNS

I’ve spent the last couple weeks working with PowerDNS. We’re migrating our old BIND servers over to new PowerDNS servers that use a MySQL backend. Installation was fairly easy, because things were well documented. The application has worked perfectly, and when I emailed their mailing list to ask about a configuration setting that wasn’t documented, I got a useful reply within minutes.

Since PowerDNS is just the DNS Server, it doesn’t provide any user-interfaces for modifying the DNS information. I took a look at several of the possible applications that claimed to be “front ends” for PowerDNS, but didn’t find any that suited our needs. (I tried out WebDNS, Tupa, and a couple others listed on SourceForge). The existing tools were too complex, too simple, or too buggy. But, the database schema that PowerDNS uses, is pretty straightforward, so I wrote a PHP class that provides most of the necessary functions, and started our long-awaited customer interface that uses the class to allow our customers to maintain their own DNS records.

Overall, this has been a great project with great results.

Laughed for several minutes from this comic.

It’s not often that a simple comic will make me laugh for several minutes, but I thought this one was great.

I found it at http://www.redhat.com/magazine/025nov06/features/gifts/xkcd.jpg

sudo” is Linux program that allows regular users to execute certain commands as root. Often, when working as a non-root user, you will type one command, and it complain that you need to be root to run it. You would then quickly add “sudo” to the beginning of the command and execute it again, this time running as root, and the command will run successfully.

There’s no reason not to use mod_deflate

I’ve been trying to convince one of our larger customers to install mod_deflate on their server for about a month. They have had concerns about compatibility with older browsers and the possibility that it will affect the PageRank, but I have finally put enough pressure on to have them let me try it. They have very few users with old browsers (and really, if somebody is running using that archaic of a browser, how likely is it that they are going to buy something on your site), and convinced them that there should be no SEO consequences with the change (if anything, the search engines will respect your site more for using less bandwidth and having a knowledgeable administrator).

Early this morning, I got it installed (took all of about 15 minutes) and it’s running great. It’s compressing HTML and Javascript files to about 20% of their original size, which equates to some significant bandwidth savings, and quicker page-load times. About 60% of the total bandwidth used on this site is for HTML and JavaScript files (the other 40% is images, movies, and a few other odds and ends). Overall, it looks like about a 30-40% drop in total bandwidth usage, which is very significant. I’ve heard of no problems with browser compatibility either, so everybody is happy.

Overall, I’d say that there is no good reason not to use mod_deflate on your site. Especially if you ever get charged for bandwidth overages.

Here are some useful resources for installing and gathering statistics on mod_deflate

Awstats – Detailed Web-Based statistics package
Perl-based mod_deflate statistics utility
Apache’s mod_deflate documentation
Firefox plugin to view the HTTP Response headers
(and lots of other useful stuff)

And here’s a sample Apache configuration section that I picked up from somewhere. I just save this in /etc/httpd/conf.d/deflate.conf and restart apache, then you are good to go (requires that mod_deflate is already compiled and installed. Actual file location may vary, depending on your OS. This works for Red-Hat derivatives)

### Enable mod_deflate to compress output
# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch bMSIE !no-gzip !gzip-only-text/html

## Don't compress for IE5.0
BrowserMatch "MSIE 5.0" no-gzip

# Don't compress images
SetEnvIfNoCase Request_URI .(?:gif|jpe?g|png|swf)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary

## Log some stuff for mod_deflate stats
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio

LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
## end mod_deflate stats

#### END mod_deflate configuration

Defeating CAPTCHAs

A recent slashdot article about how Spammers may now be using humans to defeat captchas caught my attention.

Here’s how it would work: Spammers currently use scripts that make automated posts on forums, wikis, blogs, and virtually any other place where user submitted comments may appear on a website. Their posts include links to their “spamvertised” sites where they sell their junk. They benefit both by “advertising” to people who view their automated post, and by trying to trick search engines into generating more links to their site.

Many sites and programs now include CAPTCHA’s which display an image that is supposed to be difficult for a machine to read. The website confirms that the user enters the correct CAPTCHA solution before saving their post.

This article suggests that spammers are now sending these CAPTCHA images to a real human who will input the solution and send it back to the website, therefore allowing the spammer to post their links.

I’ve actually considered this possibility for a while, and it’s not very difficult at all. To prove the concept, I created a simple web service that a spammer’s automated script could post the image to. The service waits while a human types in the result, and then returns that result to the spammers script, which would use it to submit the spam.

If a human completes one of these every 3 seconds, then they could do about 1200 per hour. If you are paying somebody a couple bucks an hour, then it works out to about 0.17 cents (17/100 of a cent) per message. I’m not sure what the going rate for spam is, but this seems pretty reasonable. Twenty bucks would get you 12,000 links to your site.

The concept is incredibly simple — it took me about and hour to write. Try it out here:


This raises some interesting concerns and questions:

  • The “appeal” of spam is that it has virtually no cost. Since hiring a human introduces a cost, does that mean it won’t get used.
  • Many CAPTCHAS are supposedly easily defeated by computer programs anyway

Some ideas on how to “really” make sure a human is hitting your site:

  • Introduce a minimum time delay between pages. (Ie: a human couldn’t fill out this form in 1 second like a script does)
  • Have some page element (that doesn’t look like it does anything) that “validates” that it has been downloaded. Since scripts will usually just fetch the HTML content, and not the graphics, make one graphic on your page that is really a script (that returns an image). If that image hasn’t been downloaded, than it’s not likely a human visiting.
  • Load your captcha graphic with javascript. Many spammers scripts aren’t able to successfully run javascript.
  • If you use a common captcha-generation program, change the default file name, or form field name.
  • Spammer’s scripts are written to affect the most sites possible. If you make some change on your site, so that it’s not the same as everybody else’s. then automated scripts are less likely to work on your site.

rrdtool is cool

I’ve spent the last few days working on creating some graphs for my website at http://www.gamepricewatcher.com. I’ve used RRD databases as part of several applications like MRTG and Cacti. I’ve done some modifications to those apps, but never dug too much into the rrdtool specific code because it seemed fairly complicated.

When trying to get some graphs on my site before, I initially tried using Cacti. I had it successfully working, but adding new Games and Servers was extremely cumbersome to do with Cacti. I figured this would be a good time to dig in and learn how to use rrdtool myself to create the graphs. That way I can generate them directly and not have to work with an application that wasn’t designed to do what I wanted.

Fortunately, there are some good documentation and tutorials on how to use rrdtool I found the tutorial from Alex van den Bogaerdt to be the best starting place. It does a good job of explaining the basics. From there, I wrote the data-collection part of my code in Perl and used the RRDTool::OO module to create and update the databases.

To display the graphs, I tried installing the rrdtool php module, but there wasn’t one available for my distribution (CentOS). I eventually got it installed, but found that it is poorly documented, and decided to ditch the module in favor of calling the binary directly.

Now, I have pretty graphs like this on my website:

World of Warcraft Gold Prices
I have been collecting game-wide data for about a year now. However, with the addition of the RRD’s to my site, I’m now tracking server-specific prices in those too. That will make it so that I can eventually provide cross-server comparisons, and compare a specific server’s prices to the game’s average.