When Random Isn’t Very Random

I have a PHP function that I wrote a long time ago to generate a random string of characters:

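function randomString($size = 25)
{
    // 62 symbols: a-z, A-Z, 0-9 (the lowercase run previously skipped 'x')
    $charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    $max = strlen($charset) - 1;   // highest valid index into $charset
    $string = '';
    for ($i = 0; $i < $size; $i++) {
        // mt_rand() is a Mersenne Twister PRNG: fast, but not cryptographically secure
        $string .= $charset[mt_rand(0, $max)];
    }
    return $string;
}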

I have been using this to generate a random ID, which is then inserted into a database column as a unique key. Theoretically, it has 62^25 possible unique values (about 6.45 × 10^44) if case sensitive, or 36^25 (about 8.08 × 10^38) when used in a case-insensitive application. That is a lot of possible combinations, and I figured it would be rare that I’d ever get two that were the same.
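To put those numbers in context, here is a small sketch that computes the size of the ID space and uses the birthday approximation, n(n-1)/2 pairs each matching with probability 1/space, to estimate how many duplicates an ideal uniform generator would produce. The helper name `expectedCollisions()` is hypothetical, and the insert count assumes roughly 25,000 IDs a day for 14 days. Under those assumptions the expected number of duplicates, even in the smaller case-insensitive space, comes out below 10^-28, which is effectively zero.

```php
<?php
// Sketch: size of the ID space and expected duplicate pairs for an IDEAL
// uniform generator. Assumption: ~25,000 IDs/day for 14 days.

function expectedCollisions(float $n, float $space): float
{
    // Birthday approximation: n(n-1)/2 pairs, each matching with probability 1/space
    return $n * ($n - 1) / (2 * $space);
}

$length = 25;
$caseSensitive   = 62 ** $length;   // a-z, A-Z, 0-9 (overflows int, becomes float)
$caseInsensitive = 36 ** $length;   // letters fold together: 26 letters + 10 digits
$n = 25000 * 14;                    // assumed total inserts so far

printf("62^25 = %.2e\n", $caseSensitive);
printf("36^25 = %.2e\n", $caseInsensitive);
printf("expected duplicates (case-insensitive): %.1e\n", expectedCollisions($n, $caseInsensitive));
```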

I was wrong.

I must be running into some kind of issue where the pseudo-randomness is not as random as I thought. I’ve been inserting somewhere in the neighborhood of 25k rows a day into this table for the past couple of weeks and have had over 20 errors where the database complained that the unique key already existed. I investigated a couple and found that the database was, indeed, correct. Because my database is not case sensitive, it would have complained even if the two values differed only in letter case. So I was pretty surprised when I looked at the errors and found that in each case the two 25-character strings were exactly the same, down to the case of every letter.
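One possible explanation, offered as an assumption rather than a diagnosis: PHP seeds the Mersenne Twister behind `mt_rand()` with a 32-bit integer, so everything generated right after a seeding is determined by one of at most 2^32 seeds. The sketch below uses a hypothetical helper, `idFromSeed()`, to show that the same seed always yields the identical string, case included. It then reuses the birthday approximation under the strong, unverified assumption that every ID is the first output after a fresh seed, again with about 25,000 IDs a day for 14 days. In that model the effective space is 2^32 rather than 62^25, and roughly 14 duplicates would be expected instead of practically none. Whether this applies to a given server depends on how and when `mt_rand()` gets seeded there.

```php
<?php
// Sketch: same seed => same "random" string. idFromSeed() is hypothetical.
// mt_srand() seeds the Mersenne Twister with a value truncated to 32 bits.

function idFromSeed(int $seed, int $size = 25): string
{
    $charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    $max = strlen($charset) - 1;
    mt_srand($seed);                 // reset the generator to a known state
    $string = '';
    for ($i = 0; $i < $size; $i++) {
        $string .= $charset[mt_rand(0, $max)];
    }
    return $string;
}

function expectedCollisions(float $n, float $space): float
{
    // Birthday approximation: n(n-1)/2 pairs, each matching with probability 1/space
    return $n * ($n - 1) / (2 * $space);
}

// Identical seeds give identical IDs, down to letter case.
var_dump(idFromSeed(12345) === idFromSeed(12345)); // bool(true)

// Assumption: each ID is the first output after a fresh 32-bit seed,
// and ~25,000 IDs/day were generated for 14 days.
$n = 25000 * 14;
printf("expected duplicates: %.1f\n", expectedCollisions($n, 2 ** 32));
```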

So I’ve had to revert to another method that I like a little better anyway.
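The post does not say which method replaced `randomString()`. As one commonly used option, and only an assumption about what a fix could look like, the sketch below keeps the same charset but swaps `mt_rand()` for `random_int()` (PHP 7+), which draws from the operating system's cryptographically secure generator instead of a 32-bit-seeded Mersenne Twister. The function name `secureRandomString()` is hypothetical.

```php
<?php
// Sketch only: not the post author's replacement method.
// secureRandomString() is a hypothetical name.

function secureRandomString(int $size = 25): string
{
    $charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    $max = strlen($charset) - 1;     // highest valid index into $charset
    $string = '';
    for ($i = 0; $i < $size; $i++) {
        // random_int() uses the OS CSPRNG and returns an unbiased index
        $string .= $charset[random_int(0, $max)];
    }
    return $string;
}

echo secureRandomString(), "\n";
```

If a fixed-format ID is acceptable, `bin2hex(random_bytes(16))` is a shorter alternative that yields 32 hexadecimal characters from the same kind of secure source.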
