Debugging with strace

strace is a useful Linux utility for watching the system calls that a program makes. I usually don’t have to dig this deeply into an application to debug it, but I’m running into a problem with one application, and the developer recommended running strace to see if anything looks suspicious. Here’s the command I’m using:
strace -Fft -o /var/tmp/strace.out -p <PID>

This command has a few useful options. The “-Ff” options make strace follow program forks. The “-t” option makes it print a human-readable timestamp before each line. The “-o” argument dumps the output to the specified file, and the “-p” argument attaches it to a specific process.

The output is fairly cryptic, but I’m hoping that it catches something useful.
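One way to make the output less cryptic is to pull out only the system calls that failed. Here is a minimal sketch in Python, assuming the output lines look like "PID HH:MM:SS syscall(args) = -1 ERRNO (message)", which is what I would expect from the options above. The function name and the sample lines are made up for illustration.

```python
import re

# Assumed line layout from "strace -f -t -o FILE", for example:
#   1234 12:34:56 open("/etc/foo", O_RDONLY) = -1 ENOENT (No such file or directory)
# The leading PID is treated as optional in case a line does not carry one.
FAILED_CALL = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?"            # optional process ID
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"   # timestamp printed by -t
    r"(?P<syscall>\w+)\(.*\)\s+=\s+-1\s+"  # call that returned -1
    r"(?P<errno>E[A-Z0-9]+)"            # symbolic error name
)


def failed_calls(lines):
    """Return (pid, time, syscall, errno) tuples for calls that returned -1."""
    results = []
    for line in lines:
        match = FAILED_CALL.match(line)
        if match:
            results.append(
                (
                    match.group("pid"),
                    match.group("time"),
                    match.group("syscall"),
                    match.group("errno"),
                )
            )
    return results


if __name__ == "__main__":
    # Path taken from the strace command above.
    with open("/var/tmp/strace.out") as handle:
        for pid, when, syscall, errno in failed_calls(handle):
            print(pid, when, syscall, errno)
```

Plenty of failed calls are harmless (programs probe for files that may not exist all the time), so this only narrows down where to look.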

Finally, a Page Rank of 4!

Google is a mystery to me. The home page of this site has had a Page Rank of 2 for quite a while. Right now, if you check out my links on Google, it shows the same ones that I’ve had forever, but somehow I suddenly have a Page Rank of 4!

I have been working on generating more content on this site (like the semi-regular blog postings). As part of that, I installed WordPress, which I suppose search engines might like. I’ve also been generating some incoming links by posting a couple of things on Slashdot and various other places. In addition, I’ve noticed that I’m slowly getting a few more people who have installed my speedtest, which has a link to this site.

I guess everything is working. Now I need to start doing this to the sites that I make money from 🙂

Changing the IP Address of a DNS Server

We’re upgrading our DNS Servers from BIND to PowerDNS, and at the same time, will be changing their IP addresses to move them onto different networks.

I looked all over the Internet and could never really find a way to change the IP address of a DNS server. It seemed that there was a chicken and egg problem. Suppose you have a domain named mydomain.com. At your registrar, you’ve told them that the Primary name servers for mydomain.com are ns1.mydomain.com and ns2.mydomain.com.

Now, if you change the IP addresses of ns1 and ns2.mydomain.com, how does the rest of the Internet know how to get to them? The solution is that somewhere, there is a global registry of DNS servers that really defines where ns1 and ns2.mydomain.com are. I’m not sure where this lives, but fortunately, our registrar (GoDaddy) has a way to edit the entries. All that was required was to log into our GoDaddy account, find the “host summary” section, and change the IP addresses there that were assigned to ns1 and ns2.

I assume that once we changed that, GoDaddy submitted those changes to the mysterious database of name servers. They said it takes 4-8 hours for that to happen, but I noticed queries coming in to our new servers immediately. Queries will continue to go to our old DNS servers for a couple of days. dnsstuff.com has a cool tool called “ISP Cached DNS Lookup” where you can see how long your DNS records are cached at many major ISPs.

With careful planning and a decent understanding of how DNS works, our switchover went flawlessly.

External authentication for PowerDNS built-in web interface

I’ve been working with PowerDNS recently to replace our old BIND servers. One small issue I’ve had with the program, though, is that its built-in Web interface, which displays statistics about the running server, only works with a single username and password. I didn’t particularly like this setup, because it means that everybody who needs access to it has the same password.

So, I configured the PowerDNS web server to listen only on localhost, and then created an Apache instance on the server to perform the authentication and proxy requests through to the PowerDNS Web interface.

PowerDNS Configuration from /etc/powerdns/pdns.conf

## Start the webserver for statistical information
webserver=yes
webserver-address=127.0.0.1
webserver-password=
webserver-port=9099

Apache Configuration
I just put this file in /etc/httpd/conf.d/pdns.conf. You can use any type of authentication here that Apache supports, just like you would in a .htaccess file.

<Location /pdns/>
  AuthType Basic
  AuthName "Admin"
  AuthUserFile /var/www/html/.htpasswd
  Require valid-user
</Location>
# Trailing slashes should match on both sides of each directive
ProxyPass /pdns/ http://127.0.0.1:9099/
ProxyPassReverse /pdns/ http://127.0.0.1:9099/

Impressed with PowerDNS

I’ve spent the last couple of weeks working with PowerDNS. We’re migrating our old BIND servers over to new PowerDNS servers that use a MySQL backend. Installation was fairly easy, because things were well documented. The application has worked perfectly, and when I emailed their mailing list to ask about a configuration setting that wasn’t documented, I got a useful reply within minutes.

Since PowerDNS is just the DNS server, it doesn’t provide any user interface for modifying the DNS information. I took a look at several applications that claimed to be “front ends” for PowerDNS, but didn’t find any that suited our needs. (I tried out WebDNS, Tupa, and a couple of others listed on SourceForge.) The existing tools were too complex, too simple, or too buggy. But the database schema that PowerDNS uses is pretty straightforward, so I wrote a PHP class that provides most of the necessary functions, and started on our long-awaited customer interface, which uses the class to let our customers maintain their own DNS records.

Overall, this has been a great project with great results.

There’s no reason not to use mod_deflate

I’ve been trying to convince one of our larger customers to install mod_deflate on their server for about a month. They have had concerns about compatibility with older browsers and the possibility that it would affect their PageRank, but I finally put on enough pressure that they let me try it. They have very few users with old browsers (and really, if somebody is running a browser that archaic, how likely is it that they are going to buy something on your site?), and I convinced them that there should be no SEO consequences from the change (if anything, the search engines will respect your site more for using less bandwidth and having a knowledgeable administrator).

Early this morning, I got it installed (it took all of about 15 minutes) and it’s running great. It’s compressing HTML and JavaScript files to about 20% of their original size, which equates to some significant bandwidth savings and quicker page-load times. About 60% of the total bandwidth used on this site is for HTML and JavaScript files (the other 40% is images, movies, and a few other odds and ends). Overall, it looks like about a 30-40% drop in total bandwidth usage, which is very significant. I’ve heard of no problems with browser compatibility either, so everybody is happy.
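As a rough sanity check on those numbers, here is a back-of-the-envelope calculation in Python. It assumes that every HTML and JavaScript response actually gets compressed, which is an idealization, so it gives an upper bound rather than a prediction. The function name is just for illustration.

```python
def estimated_savings(compressible_share, compressed_ratio):
    """Fraction of total bandwidth saved if all compressible traffic is compressed.

    compressible_share: portion of total bandwidth that is HTML/JavaScript (0.60 here)
    compressed_ratio:   compressed size relative to the original (0.20 here)
    """
    return compressible_share * (1.0 - compressed_ratio)


if __name__ == "__main__":
    upper_bound = estimated_savings(0.60, 0.20)
    print(f"Upper bound on bandwidth savings: {upper_bound:.0%}")
```

The observed 30-40% drop sits below that ceiling, which is what I would expect if some responses go out uncompressed (clients that don’t ask for gzip, or browsers excluded by the configuration).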

Overall, I’d say that there is no good reason not to use mod_deflate on your site, especially if you ever get charged for bandwidth overages.

Here are some useful resources for installing and gathering statistics on mod_deflate:

Awstats – Detailed Web-Based statistics package
Perl-based mod_deflate statistics utility
Apache’s mod_deflate documentation
Firefox plugin to view the HTTP Response headers (and lots of other useful stuff)

And here’s a sample Apache configuration section that I picked up from somewhere. I just save this in /etc/httpd/conf.d/deflate.conf and restart Apache, and then you are good to go. (This requires that mod_deflate is already compiled and installed. The actual file location may vary depending on your OS; this works for Red Hat derivatives.)

### Enable mod_deflate to compress output
# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

## Don't compress for IE5.0
BrowserMatch "MSIE 5.0" no-gzip

# Don't compress images
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|swf)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary

## Log some stuff for mod_deflate stats
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio

LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
## end mod_deflate stats

#### END mod_deflate configuration

Defeating CAPTCHAs

A recent Slashdot article about how spammers may now be using humans to defeat CAPTCHAs caught my attention.

Here’s how it would work: Spammers currently use scripts that make automated posts on forums, wikis, blogs, and virtually any other place where user-submitted comments may appear on a website. Their posts include links to their “spamvertised” sites where they sell their junk. They benefit both by “advertising” to people who view their automated posts, and by trying to trick search engines into counting more links to their sites.

Many sites and programs now include CAPTCHAs, which display an image that is supposed to be difficult for a machine to read. The website confirms that the user has entered the correct CAPTCHA solution before saving their post.

This article suggests that spammers are now sending these CAPTCHA images to a real human who will input the solution and send it back to the website, therefore allowing the spammer to post their links.

I’ve actually considered this possibility for a while, and it’s not very difficult at all. To prove the concept, I created a simple web service that a spammer’s automated script could post the image to. The service waits while a human types in the result, and then returns that result to the spammer’s script, which would use it to submit the spam.

If a human completes one of these every 3 seconds, then they could do about 1200 per hour. If you are paying somebody a couple bucks an hour, then it works out to about 0.17 cents (17/100 of a cent) per message. I’m not sure what the going rate for spam is, but this seems pretty reasonable. Twenty bucks would get you 12,000 links to your site.
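The arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers in this post: I’m taking “a couple bucks” to mean exactly $2.00 an hour, and the function names are made up.

```python
SECONDS_PER_CAPTCHA = 3        # one solution every 3 seconds
HOURLY_WAGE_DOLLARS = 2.00     # assumption: "a couple bucks an hour"


def captchas_per_hour(seconds_each):
    """How many CAPTCHAs one person solves in an hour at a steady pace."""
    return 3600 // seconds_each


def cost_per_captcha_cents(wage_dollars, seconds_each):
    """Labor cost of a single solved CAPTCHA, in cents."""
    return wage_dollars * 100 / captchas_per_hour(seconds_each)


def captchas_for_budget(budget_dollars, wage_dollars, seconds_each):
    """How many solved CAPTCHAs a given budget buys."""
    hours = budget_dollars / wage_dollars
    return int(hours * captchas_per_hour(seconds_each))


if __name__ == "__main__":
    print(captchas_per_hour(SECONDS_PER_CAPTCHA))
    print(round(cost_per_captcha_cents(HOURLY_WAGE_DOLLARS, SECONDS_PER_CAPTCHA), 2))
    print(captchas_for_budget(20, HOURLY_WAGE_DOLLARS, SECONDS_PER_CAPTCHA))
```

Changing the wage or the seconds per CAPTCHA shows how quickly the per-message cost moves.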

The concept is incredibly simple: it took me about an hour to write. Try it out here:

http://www.brandonchecketts.com/capdef/

This raises some interesting concerns and questions:

  • The “appeal” of spam is that it has virtually no cost. Since hiring a human introduces a cost, does that mean it won’t get used?
  • Many CAPTCHAs are supposedly easily defeated by computer programs anyway.

Some ideas on how to “really” make sure a human is hitting your site:

  • Introduce a minimum time delay between pages (i.e., a human couldn’t fill out this form in 1 second like a script does).
  • Have some page element (that doesn’t look like it does anything) that “validates” that it has been downloaded. Since scripts will usually just fetch the HTML content, and not the graphics, make one graphic on your page that is really a script (that returns an image). If that image hasn’t been downloaded, then it’s not likely a human visiting.
  • Load your CAPTCHA graphic with JavaScript. Many spammers’ scripts aren’t able to successfully run JavaScript.
  • If you use a common CAPTCHA-generation program, change the default file name or form field name.
  • Spammers’ scripts are written to affect the most sites possible. If you make some change on your site so that it’s not the same as everybody else’s, then automated scripts are less likely to work on your site.
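The first idea in the list (a minimum time delay) could be sketched roughly like this in Python. This is not code from any site of mine, just an illustration: the page embeds a signed timestamp in a hidden form field, and the server rejects the post if the token is missing, tampered with, or comes back too quickly. The secret key, the 5-second threshold, and the function names are all assumptions.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"   # hypothetical server-side secret
MIN_SECONDS = 5             # assumed minimum time a human needs to fill out the form


def _sign(text):
    """HMAC-SHA256 signature of the timestamp text, as a hex string."""
    return hmac.new(SECRET_KEY, text.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Create the value for a hidden form field when the page is rendered."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued)}"


def looks_automated(token, now=None):
    """Return True if the token is invalid or the form came back too quickly."""
    try:
        issued_text, signature = token.split(":", 1)
        issued = int(issued_text)
    except ValueError:
        return True  # missing or malformed token
    if not hmac.compare_digest(signature.encode(), _sign(issued_text).encode()):
        return True  # timestamp was altered or forged
    current = time.time() if now is None else now
    return (current - issued) < MIN_SECONDS


if __name__ == "__main__":
    token = issue_token()
    print(looks_automated(token))  # submitted immediately, so this is flagged
```

Signing the timestamp matters because otherwise a script could simply send an old timestamp of its own choosing. A patient script can still wait out the delay, so this only raises the cost a little.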

So much for good database design…

I’m installing and modifying WordPress MU (Multi-user) for a client and am amazed at the poor database design. For each blog you set up, it generates 8 new tables, which are identical in design to the 8 tables for every other blog it creates. This is extremely poor design by any database design standard. Despite the poor design, they do have a good reason for doing it. Quoted from http://mu.wordpress.org/faq/:

Does it scale? (Also: The way you do your databases and tables doesn’t scale!)

WordPress MU creates tables for each blog, which is the system we found worked best for plugin compatibility and scaling after lots of testing and trial and error. This takes advantage of existing OS-level and MySQL query caches and also makes it infinitely easier to segment user data, which is what all services that grow beyond a single box eventually have to do. We’re practical folks, so we’ll use whatever works best, and for the 400k and counting on WordPress.com, MU has been a champ.

The main reason for doing this is that it makes compatibility with existing WordPress plugins much easier. I guess the real source of the problem was poor planning and foresight during the development of the original WordPress application. They claim it works well, so even though I cringe every time I see it, I guess I’ll just have to live with it and complain.