Web Programming, Linux System Administration, and Entrepreneurship in Athens, Georgia


When Random Isn’t Very Random

I have a PHP function that I wrote a long time ago to generate a random string of characters:

function randomString($size = 25)
{
    $charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    $string = '';
    for ($i = 0; $i < $size; $i++) {
        $string .= $charset[mt_rand(0, strlen($charset) - 1)];
    }
    return $string;
}

I have been using this to generate a random ID, which is then inserted into a database column as a unique key. Theoretically, it has 62^25 possible unique values (about 6.45 * 10^44) if case sensitive, or 36^25 (about 8.1 * 10^38) when used in a case-insensitive application. That is a lot of possible combinations, and I figured it would be rare that I’d ever get two that were the same.

I was wrong.

I must be running into some kind of issue where the pseudo-randomness is not as random as I thought. I’ve been inserting somewhere in the neighborhood of 25k rows a day into this table for the past couple of weeks, and have had over 20 errors where the database complained that the unique key already existed. I investigated a couple and found that the complaint was, indeed, correct. My database is not case sensitive, so it would have complained even if the two strings had the same characters in different cases. But I was pretty surprised to find that in each case the 25-character strings were exactly the same, down to the case of every letter.
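One way duplicates like this can happen (my guess; the post never pins down the cause) is seed collisions: two PHP processes that happen to seed the Mersenne Twister with the same value will generate identical sequences, and therefore identical “random” strings. A quick demonstration:

```php
<?php
// Seeding mt_rand() twice with the same value reproduces the exact
// same sequence -- two Apache children that picked the same seed
// would emit identical "random" strings.
mt_srand(12345);
$first = array();
for ($i = 0; $i < 25; $i++) {
    $first[] = mt_rand(0, 61);
}

mt_srand(12345); // same seed, as if in a different process
$second = array();
for ($i = 0; $i < 25; $i++) {
    $second[] = mt_rand(0, 61);
}

var_dump($first === $second); // bool(true)
```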

So I’ve had to revert to another method that I like a little better anyway.
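The post doesn’t say which method that is. For reference, here is one modern sketch (PHP 7+, an assumption since the original predates it) that draws from the operating system’s CSPRNG via random_int() instead of mt_rand():

```php
<?php
// randomString() rebuilt on random_int(), which is uniformly
// distributed and cryptographically seeded (PHP 7+).
function randomString($size = 25)
{
    $charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    $max = strlen($charset) - 1;
    $string = '';
    for ($i = 0; $i < $size; $i++) {
        $string .= $charset[random_int(0, $max)];
    }
    return $string;
}

echo randomString() . "\n";
```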

Google Spam Filtering Sounds Great but I Can’t Sign Up

Yesterday, Google announced new services and pricing based on their Postini message filtering service. The service sounds great, and I’ve been looking at moving away from my current mail filtering service for a couple of months now. Pricing starts at only $3.00 per user per year. I did a little checking around and verified that I can add domain aliases and user aliases, and that it looks like they can all be tied to a single $3.00 account.

That is exactly what I need. I have a bunch of domains and use several email addresses at each one, all forwarding to a single inbox. For $3.00 a year, it sounds like a great savings over my alternate plan, which was building my own MailScanner box. Plus, with Google, I won’t have to worry about redundancy or keeping my own filtering up to date.

Perfect, so I went to sign up.  I put in my domain name, agreed to the TOS, then put in my credit card information and hit submit:

Google won't let me sign up for Postini

Oops, looks like something went wrong there. That’s not the best way to instill confidence in your new customers.

Amazon ECS 3.0 is shutting down

Amazon’s long-lived ECS 3.0 will be shutting down soon. This was an early version of their API that allowed third-party applications to access Amazon’s vast database of books and products. It was very widely used, and it will be interesting to see what kind of impact the shutdown has when they turn it off for good on March 31st. I’m sure there are plenty of small sites that will break in some way when they turn it off.

From what I’ve seen, they have been pretty good about notifying customers who are still using it. I’ve gotten several emails about it in the past couple of weeks. Despite all of their efforts, though, there are bound to be all kinds of small sites that were written at one point and haven’t been touched since.

The easiest way to tell if you have a site that uses it is to grep through all of your code for ‘AssociateTag’ or ‘SubscriptionId’. Those are the authentication parameters used by the 3.0 version that is shutting down. The newer version of ECS uses an ‘AWSAccessKeyId’ parameter instead.
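For example, a recursive grep will list any affected files (/var/www is just an example path; point it at your own document root):

```shell
# List files that still contain the ECS 3.0 authentication parameters.
# /var/www is a placeholder for your web root.
grep -rl -e 'AssociateTag' -e 'SubscriptionId' /var/www
```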

If you have a PHP or Perl-based website that needs to be upgraded to the new version, you can hire me to fix it.

GoDaddy’s DNS Doesn’t Update SOA Serial

I recently moved one of my blogs to its own IP address, but strangely, Google’s feed readers are still picking up the site at the original IP address. It has been several days now, and Google is still requesting the site at the old IP address. I did some digging and found that even though I changed the IP address, the SOA serial didn’t get incremented. As a result, Google’s DNS servers are using cached records and not requesting new ones, because the serial hasn’t changed.

This seems like a pretty serious problem for GoDaddy. I double-checked everything again tonight by creating some new records. The new records resolve to the IPs that I specified, but the serial remains unchanged.

I tried various things that definitely should have caused the serial number to be incremented:
– Adding a new A record
– Modifying an A record
– Deleting an A record

None of these updated the serial as it should have.
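You can watch the serial yourself with dig (example.com and ns1.example.com are placeholders for your domain and one of its authoritative nameservers):

```shell
# Query the SOA record directly from the authoritative server.
# A typical short-form answer:
#   ns1.example.com. hostmaster.example.com. 2007112301 28800 7200 604800 3600
# The third field is the serial; extract it with awk:
dig +short SOA example.com @ns1.example.com | awk '{print $3}'
```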

Finally, I noticed on the main page for my domain (the one that lists the name servers, registrant info, etc.) that next to “Name Servers:” it said “Last Update: 11/23/2007”, which coincided with the date in the serial. I was finally able to update the serial by acting like I was changing my name servers, then just submitting the page without making any changes.

This seems like a fundamentally broken DNS system, though. Frankly, I’m pretty surprised to see something like that from GoDaddy, which must do DNS for hundreds of thousands of domains. While troubleshooting, I emailed GoDaddy’s support a couple of times, and they were less than helpful. Their basic response was:

We are unable to update the SOA serial on demand. This information is updated periodically, and is the way our systems currently process. We apologize for any inconvenience this may cause.

Converting mboxes to maildir format

There is a handy utility for converting mbox style mailboxes into maildir format at https://batleth.sapienti-sat.org/projects/mb2md/

To convert all of the mailboxes on your server:

Edit /etc/sudoers and comment out the env_keep section. Those settings make sudo preserve some of the caller’s environment variables, which causes the conversion script to put things in the wrong directory.

Download mb2md, unzip it, and copy the script to /bin (where all users can access it):

# wget https://batleth.sapienti-sat.org/projects/mb2md/mb2md-3.20.pl.gz
# gunzip mb2md-3.20.pl.gz
# cp mb2md-3.20.pl /bin/mb2md
# chmod a+rx /bin/mb2md

Then run this command to convert all of the mailboxes into maildir format.

cd /var/spool/mail

for username in `ls`; do echo $username; sudo -u $username /bin/mb2md -m -d Maildir; done

That will create a directory called Maildir in each user’s home directory. Then just configure your MTA to deliver mail there, and your IMAP server to pick it up there.

In Postfix, add this to /etc/postfix/main.cf:

home_mailbox = Maildir/

And in Dovecot, change this in /etc/dovecot.conf:

mail_location = maildir:~/Maildir

Now you can edit /etc/sudoers and uncomment the env_keep section.

Configuring Postfix SASL to authenticate against Courier Authlib

I ran across a system today that was using the VHCS control panel. It looks like the system wasn’t correctly configured to allow SMTP authentication. It uses Postfix as the MTA and Courier-IMAP for the IMAP/POP3 server. It was populating the Courier authentication database with email addresses and passwords for logging into the incoming mail server, but Postfix wasn’t configured to use the same database for authenticating users on the outgoing mail server.

This is what I had to do to get it working.

Edit your system’s smtpd.conf file (/var/lib/sasl2/smtpd.conf for Red Hat and derivatives; /etc/postfix/sasl/smtpd.conf for Debian and Ubuntu derivatives). A default install looks like this:

pwcheck_method: saslauthd
mech_list: PLAIN LOGIN

So change it to this:

pwcheck_method: authdaemond
mech_list: PLAIN LOGIN
authdaemond_path: /var/run/courier/authdaemon/socket

Of course, make sure that the authdaemond_path is correct for your system, and change as needed.

Then restart Postfix and see if that works. You can use my SMTP Authentication String tool to get your encoded password and try it through telnet. Tail your mail log to see if it gets any errors.
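If you’d rather not use a web tool, you can build the AUTH PLAIN string by hand: it is the base64 encoding of a NUL byte, the username, another NUL byte, and the password (‘user’ and ‘pass’ below are placeholders):

```shell
# AUTH PLAIN expects base64("\0username\0password").
# 'user' and 'pass' are placeholders -- substitute real credentials.
printf '\0user\0pass' | base64
```

Paste the resulting string after ‘AUTH PLAIN ’ in your telnet session to port 25.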

On the system I was working on, Postfix was configured to chroot the smtpd processes (in /etc/postfix/master.cf). I got errors in the mail log that looked like this:

Jan 24 19:52:46 host postfix/smtpd[14528]: warning: SASL authentication failure: cannot connect to Courier authdaemond: No such file or directory
Jan 24 19:52:46 host postfix/smtpd[14528]: warning: SASL authentication failure: Password verification failed
Jan 24 19:52:46 host postfix/smtpd[14528]: warning: host.local[127.0.0.1]: SASL plain authentication failed: generic failure

So, in that case, I simply hard-linked the Courier authdaemon socket file inside of the chroot (/var/spool/postfix):

cd /var/spool/postfix
ln /var/run/courier/authdaemon/socket courier-authdaemon-socket

Then change the authdaemond_path to just ‘courier-authdaemon-socket’. Restart Postfix, and it should work.

Getting a MySQL last insert_id from an ADOdb connection

The PHP ADOdb library is a database abstraction layer that hides database-specific commands from the programmer, so that code can be portable between backend database engines. Since not all databases provide an insert ID, ADOdb provides a wrapper for it in the form of its Insert_ID() function.

It implements it in a really ugly way, though. Whenever you use its pseudo insert-ID functions, it creates a _seq table with a single column and a single row. For example, if you are inserting something into a table named ‘users’, it will create a table named ‘users_seq’ with a single ‘id’ column. It generates one row in that column with an insert ID that it calculates and increments on its own.

First off, that is really ugly. I hate having a whole bunch of extra tables in my database, and it makes it even worse that they only hold a single value each. I wish they had implemented it differently and made a single ‘_sequences’ table with two columns (table and id). At least that would keep the tables to a minimum and centralize where all of the insert IDs live.

The other bad part is that if you access the database with anything other than the ADOdb application, it is difficult to honor this required structure. In most cases, things break, I get duplicate key constraint errors, and it is just generally a pain.

So I’ve decided not to use it ever again. It’s not likely that I’ll ever change the database anyway, so I might as well take advantage of the handy insert-ID functionality already provided by MySQL. Just do your queries as you would normally, including the ‘INSERTID’, and then you can retrieve the insert ID as in this example:

CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(80) NOT NULL,
  `email` varchar(80) NOT NULL,
  PRIMARY KEY (`id`)
);
// I assume you know how to create an ADOdb object
$db->query("
    INSERT INTO users (id, name, email) VALUES ('INSERTID', 'Joe User', '[email protected]')
");
$user_id = $db->_connectionID->insert_id;

Performing post-output script processing in PHP

After several hours of researching and experimenting, I think I finally came up with a way for a PHP script to display a page, close the connection to the browser, and then continue processing. The idea is that I can add some potentially lengthy processing to the script by executing it after the browser has closed the connection, while to a visitor the page appears to load quickly.

I experimented with PHP’s register_shutdown_function, but that doesn’t really do what I need (unless running something older than PHP 4.0.3). Evidently, PHP doesn’t have any way to close STDOUT like other languages do.

The trick is in sending a Connection: close and a Content-Length header. Once the client has received the specified number of bytes, it will close the connection, even though the script may continue running. Unfortunately, that means you need to know the length of the page before displaying it. That can be handled with output buffering, but it does make the solution less than ideal.

Here is an example that works for me using PHP 5.1.6.

<?php

$start_time = microtime(true);
function bclog($message)
{
    global $start_time;
    $fh = fopen('/tmp/logfile', 'a');
    $elapsed = microtime(true) - $start_time;
    fwrite($fh, "$elapsed - $message\n");
    fclose($fh);
}

header('Content-type: text/plain');
header('Connection: close');
ob_start();

for ($i = 0; $i < 1024; $i++ ) {
    echo "#";
}
bclog("I'm done outputting my normal content");

// Figure the size of our content
$size = ob_get_length();
// And send the content-length header
header("Content-Length: $size");

// Now flush all of our output buffers
ob_end_flush();
ob_flush();
flush();

sleep(5);
bclog("Now I'm done with all of my post-processing - FYI, content length was $size");
?>

If you hit that page in a browser, you will notice that the browser displays the content and is done right away. However, you can tail that logfile, and see something like this:

0.0002360343933 - I'm done outputting my normal content
5.0019490718842 - Now I'm done with all of my post-processing - FYI, content length was 1024

It is not an ideal solution, but I think that is about as good as it is going to get.

The new wave of HTTP referrer spam

I’ve noticed an increase in HTTP Referrer spam on my own web site and in some websites that I manage. See Wikipedia’s articles on the HTTP Referrer and Referrer spam for a definition of what exactly referrer spam is.

Wikipedia, and some other pages I found describing referrer spam, say that the spammer’s intent is to end up on published web stats pages in order to create links to their site. I don’t think that is the case any longer, if it ever was.

I would argue that the real intent of these spammers is to get the website owner who is looking at the stats to click on their links. Most users who have a blog or small website check their statistics often and are really interested when they find a new site that appears to be linking to theirs. It is very likely that they will intentionally look at any new incoming links.

As evidence, I just noticed that I got 4 hits on one of my sites with the following referrer:

https://www.amazon.com/s/ref=sr_pg_4&tag=somespamer_20

I’m familiar with Amazon’s link structure and immediately noticed that it was an affiliate URL. If you hit that URL, Amazon will attribute your click as coming from the spammer. Amazon will set a cookie that contains the spammer’s affiliate ID, and any purchase that you make at Amazon in the next 30 days will be credited to the spammer. They will then get a 4% commission on your purchases.

Obviously, not everybody buys something from Amazon once a month, but I’d bet that enough people do to make it worth the risk. Fortunately, it looks like Amazon has already caught on to this one, and that particular link just goes to an error page now.

That is a pretty deceitful and probably successful tactic for the spammer. Creating referrer spam is incredibly easy. I don’t think there is any great way to detect it either. I’ve seen some WordPress plugins and such that attempt to deal with it, but I don’t think there is much going on in this area yet.

My first thought would be to request the referring page and look for links to your site. That has some potential problems working reliably on a large scale, though. Also, it might enable a sort of distributed denial-of-service-by-proxy attack.
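A minimal sketch of that first idea (the function name and regex are my own; a real version would fetch the referring page with an HTTP client and handle relative links):

```php
<?php
// Given the HTML of a referring page, check whether it actually
// links to our site. The HTML is passed in directly; fetching it
// over HTTP is left to the caller.
function referrerLinksToUs($html, $ourHost)
{
    // Look for an href whose URL contains our hostname.
    $pattern = '#<a[^>]+href=["\'][^"\']*' . preg_quote($ourHost, '#') . '#i';
    return preg_match($pattern, $html) === 1;
}

$html = '<p>See <a href="http://www.example.com/post">this post</a></p>';
var_dump(referrerLinksToUs($html, 'www.example.com')); // bool(true)
var_dump(referrerLinksToUs($html, 'spammer.example')); // bool(false)
```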

Another possible way to fight referrer spam would involve a blacklist. It could contain both the IP addresses of known spammers and the links that they are spamming. I found one called ReferrerCop that looks like it is owned by Google now, so that may show some promise, although it doesn’t look like it has been updated recently.

Regular Expression matching with newlines

I ran across a regular expression modifier today that I had not used before. When matching text that spans multiple lines, you can use the ‘s’ modifier at the end of the regular expression to treat the string as a single line, so that the dot also matches newlines.

For example, I was trying to match some HTML that spanned multiple lines, like this:

<td class='something'>  This is the text I want to match
</td>

This expression didn’t match:

preg_match_all("#<td class='something'>(.+?)</td>#", $source_string, $matches);

But after simply adding the ‘s’ flag after the closing #, it worked as desired:

preg_match_all("#<td class='something'>(.+?)</td>#s", $source_string, $matches);

© 2025 Brandon Checketts
