Array versus String in CURLOPT_POSTFIELDS

The PHP Curl Documentation for CURLOPT_POSTFIELDS makes this note:

This can either be passed as a urlencoded string like ‘para1=val1&para2=val2&…’ or as an array with the field name as key and field data as value. If value is an array, the Content-Type header will be set to multipart/form-data.

I’ve always discounted the importance of that, and in most cases it doesn’t matter. The destination server and application likely handle multipart/form-data and application/x-www-form-urlencoded equally well. However, the two mechanisms pass the data in very different ways.

application/x-www-form-urlencoded

application/x-www-form-urlencoded is what I generally think of when doing POST requests. It is the default when you submit most forms on the web. It works by appending a blank line and then your urlencoded data to the end of the POST request. It also sets the Content-Length header to the length of your data. A request submitted with application/x-www-form-urlencoded looks like this (somewhat simplified):

POST /some-form.php HTTP/1.1
Host: www.brandonchecketts.com
Content-Length: 23
Content-Type: application/x-www-form-urlencoded

name=value&name2=value2
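Here is a minimal sketch of how that request text is put together, using only PHP's built-in http_build_query(). build_urlencoded_request() is a hypothetical helper for illustration; cURL builds the real request for you. Content-Length counts the bytes of the encoded body, which is 23 here.

```php
<?php
// Hypothetical helper: build the raw text of an
// application/x-www-form-urlencoded POST request.
function build_urlencoded_request($path, $host, $fields)
{
    // urlencode each key and value and join the pairs with '&'
    $body = http_build_query($fields, '', '&');

    $headers = array(
        "POST {$path} HTTP/1.1",
        "Host: {$host}",
        // Content-Length is the byte length of the encoded body
        'Content-Length: ' . strlen($body),
        'Content-Type: application/x-www-form-urlencoded',
    );

    // Header lines end in CRLF, and a blank line separates headers from body
    return implode("\r\n", $headers) . "\r\n\r\n" . $body;
}

echo build_urlencoded_request('/some-form.php', 'www.brandonchecketts.com',
    array('name' => 'value', 'name2' => 'value2')), "\n";
```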

multipart/form-data

multipart/form-data is much more complicated, but more flexible, and that flexibility is required when uploading files. It works much like a multipart MIME email message. The HTTP request looks like this (simplified):

POST / HTTP/1.1
Host: www.brandonchecketts.com
Content-Length: 244
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------26bea3301273

The server then responds with 100 Continue, and the client sends the actual data in subsequent packets. In my simple case with two name/value pairs, the exchange looks like this:

HTTP/1.1 100 Continue
------------------------------26bea3301273
Content-Disposition: form-data; name="name"

value
------------------------------26bea3301273
Content-Disposition: form-data; name="name2"

value2
------------------------------26bea3301273--
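To make the wire format concrete, here is a minimal PHP sketch that assembles the same body by hand. build_multipart_body() is a hypothetical helper, not part of cURL. It assumes simple name/value fields (no file uploads) and CRLF line endings, and it uses the boundary from the example request (28 dashes followed by 12 hex digits). Each delimiter line in the body is the header's boundary with two more dashes in front, and the closing delimiter also ends with two dashes.

```php
<?php
// Hypothetical helper: build a multipart/form-data body for simple
// name/value fields (no files), using CRLF line endings.
function build_multipart_body($fields, $boundary)
{
    $body = '';
    foreach ($fields as $name => $value) {
        // Each part starts with "--" plus the boundary from the Content-Type header
        $body .= "--{$boundary}\r\n";
        $body .= "Content-Disposition: form-data; name=\"{$name}\"\r\n";
        // A blank line separates the part's headers from its data
        $body .= "\r\n";
        $body .= "{$value}\r\n";
    }
    // The closing delimiter has an extra "--" after the boundary
    $body .= "--{$boundary}--\r\n";
    return $body;
}

// Boundary from the example request: 28 dashes and 12 hex digits
$boundary = str_repeat('-', 28) . '26bea3301273';
$body = build_multipart_body(array('name' => 'value', 'name2' => 'value2'), $boundary);
echo strlen($body), "\n";
```

With the two example fields this prints 244, matching the Content-Length header in the request above. In practice cURL generates the boundary and the body for you.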

CURL usage

So, when sending POST requests with PHP/cURL, it is important to urlencode the data into a string first if you want the application/x-www-form-urlencoded format.

This will generate the multipart/form-data version:

$data = array('name' => 'value', 'name2' => 'value2');
curl_setopt($curl_object, CURLOPT_POSTFIELDS, $data);

And this simple change will ensure that it uses the application/x-www-form-urlencoded version:

$data = array('name' => 'value', 'name2' => 'value2');
$encoded = '';
foreach($data as $name => $value){
    $encoded .= urlencode($name).'='.urlencode($value).'&';
}
// chop off the last ampersand
$encoded = substr($encoded, 0, strlen($encoded)-1);
curl_setopt($curl_object, CURLOPT_POSTFIELDS, $encoded);
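PHP's built-in http_build_query() performs the same urlencode()-and-join as the loop above, so the loop and the trailing-ampersand cleanup can be replaced with a single call. In this sketch encode_post_fields() is just a hypothetical wrapper name. Unlike the loop, http_build_query() also handles nested arrays, encoding them as key[sub]=value.

```php
<?php
// Encode POST fields as application/x-www-form-urlencoded.
// http_build_query() urlencode()s each key and value and joins them with '&'.
function encode_post_fields($data)
{
    // Pass '&' explicitly so the arg_separator.output ini setting can't change it
    return http_build_query($data, '', '&');
}

$data = array('name' => 'value', 'name2' => 'value2');
$encoded = encode_post_fields($data);
echo $encoded, "\n";

// With the cURL extension loaded, pass the string (not the array) to get
// application/x-www-form-urlencoded instead of multipart/form-data:
// curl_setopt($curl_object, CURLOPT_POSTFIELDS, $encoded);
```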

KnitMeter is now a Facebook App

KnitMeter.com is a site that I wrote quickly for my wife to keep track of how much she has knit. It generates a little ‘widget’ image that can be placed on blogs, forums, etc., and says how many miles of yarn you have knit in some period. The site has been live for about a year and a half now and has a couple thousand registered users.

I have been receiving an increasing number of requests for a way to add a KnitMeter to Facebook. I’ve experimented with a couple of other ideas on Facebook and found that it was pretty straightforward to write an app. KnitMeter seems like a decent candidate for a social app, so I started working on it about a week ago. I’m happy to say that I just made the application live late last night. It is available at http://apps.facebook.com/knitmeter/.

Features include:

  • Ability to add projects and add knitted lengths to a project (or not)
  • Settings for inputting lengths in feet, yards, or meters
  • Display how much you’ve knit in feet, yards, meters, kilometers, or miles
  • When entering a new length, you can choose to have it publish a ‘story’ on your profile page
  • You can add a tab on your profile page that shows each of your projects as well as a total
  • You can add a KnitMeter ‘box’ to the side of your profile page, or on your ‘boxes’ tab.

I recreated the database from scratch and defined it a little better, so I have a little bit of work to do in migrating the existing site and database over to the new structure. Once that is done, users will be able to import their data from the existing KnitMeter.com by providing their email/password.

What is in a gclid?

When you use auto-tagging with your Adwords campaign, every request generated by a Google Adwords click contains a gclid parameter in the URL (?gclid=). Adwords uses this to pass some information to Analytics for traffic analysis.

I was curious about what data the gclid parameter contained. My guess was that it held some encoded or encrypted information about the origin of the click, so I did some analysis on the clicks that I received. Some discussion about it was available on this post.

I ended up writing a quick PHP script that parses through an Apache log file. It finds requests that contain a gclid and then produces a report of which letters occur in which positions of the gclid.

The script is available for download here, and it generates a report like this:

Found 32507 appropriate lines
Character  1 [ 1] C
Character  2 [ 8] IJKLMNOP
Character  3 [32] -CDGHKLOPSTWX_abefijmnqruvyz2367
Character  4 [64] -CDEFG0ABHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character  5 [32] -_0ghijklmnopqrstuvwxyz123456789
Character  6 [32] -IJKLMNOPYZ_abcdefopqrstuv456789
Character  7 [32] -CDGHKLOPSTWX_abefijmnqruvyz2367
Character  8 [64] -ABCDEFG0HIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character  9 [32] 0-_ghijklmnopqrstuvwxyz123456789
Character 10 [ 4] JZp5
Character 11 [ 8] IMQUYcgk
Character 12 [ 1] C
Character 13 [ 1] F
Character 14 [10] QRSUWYZcde
Character 15 [61] -ABCEFGHIJKLMNOPQRSTUVWXYZ_ab0cdefghiklmnopqrstuvwxy123456789
Character 16 [63] -ABCDEFGHIJKLMNOQRSTUVWXYZ_abcde0fghijklmnopqrstuvwxyz123456789
Character 17 [17] DFGHIQabgiknrsx57
Character 18 [ 4] AQgw
Character 19 [ 1] o
Character 20 [ 1] d
Character 21 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwx0yz123456789
Character 22 [32] ABCDEFGHQRSTUVWXghijklmnwyz0x123
Character 23 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuv0wxyz123456789
Character 24 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrs0tuvwxyz123456789
Character 25 [62] 0-ABCDEFHIJKLMOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character 26 [ 4] AQgw
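For anyone who wants to reproduce the per-position tally described above without the download, here is a minimal sketch. It is a hypothetical re-creation, not the original script. It assumes gclids contain only letters, digits, '-' and '_' (the characters seen in the report), and it sorts each position's characters by byte value, so the ordering differs from the report above.

```php
<?php
// Hypothetical re-creation of the per-position gclid report.
function gclid_position_report($log_lines)
{
    $seen = array();   // 1-based position => set of characters seen there
    $found = 0;
    foreach ($log_lines as $line) {
        if (!preg_match('/[?&]gclid=([A-Za-z0-9_-]+)/', $line, $m)) {
            continue;   // not an Adwords click
        }
        $found++;
        $gclid = $m[1];
        for ($i = 0, $len = strlen($gclid); $i < $len; $i++) {
            $seen[$i + 1][$gclid[$i]] = true;
        }
    }
    ksort($seen);

    $report = array();
    foreach ($seen as $pos => $set) {
        // Digit keys such as '8' become integers, so cast them back to strings
        $chars = array_map('strval', array_keys($set));
        sort($chars, SORT_STRING);
        $report[$pos] = implode('', $chars);
    }
    return array('found' => $found, 'positions' => $report);
}

// Three of the sample clicks listed below, plus a line without a gclid
$lines = array(
    '"GET /?gclid=CNHz5eD_8pkCFRCdnAodzniYQg HTTP/1.1" 200',
    '"GET /?gclid=CIX_u-X_8pkCFQKenAodlWprSg HTTP/1.1" 200',
    '"GET /?gclid=CMyI_4OA85kCFRIhnAodc2_oRg HTTP/1.1" 200',
    '"GET /favicon.ico HTTP/1.1" 200',
);
$result = gclid_position_report($lines);
printf("Found %d appropriate lines\n", $result['found']);
foreach ($result['positions'] as $pos => $chars) {
    printf("Character %2d [%2d] %s\n", $pos, strlen($chars), $chars);
}
```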

This makes it clear that the parameter has some structure, but I’m still no closer to determining what it contains. Counting up the unique values, it would seem that they have about 95 bits of information available, which might be enough room to store everything it would need to know about the search that created it. Based on the reporting details in Analytics, I would presume that it somehow contains at least the following information:

  • Campaign (id)
  • Keyword (id)
  • Ad Variation (id)
  • Position
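The bit estimate can be checked with a little arithmetic. If each position is treated as independent, a position with n observed characters can carry at most log2(n) bits, so the capacity is the sum of log2 over the counts in the report. Here is a minimal PHP sketch, with the counts copied from the report above (capacity_bits() is a hypothetical name):

```php
<?php
// Upper bound on the information in a gclid:
// sum over positions of log2(number of distinct characters seen there).
function capacity_bits($distinct_counts)
{
    $bits = 0.0;
    foreach ($distinct_counts as $count) {
        $bits += log($count, 2);
    }
    return $bits;
}

// Distinct-character counts for positions 1..26, from the report above
$counts = array(1, 8, 32, 64, 32, 32, 32, 64, 32, 4, 8, 1, 1,
                10, 61, 63, 17, 4, 1, 1, 64, 32, 64, 64, 62, 4);
printf("%.1f bits\n", capacity_bits($counts));
```

This prints 97.3 bits, which is in the same range as the rough estimate. It is only an upper bound, because it assumes every observed character is equally likely and that the positions are independent.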

I did some research by clicking an ad multiple times and examining the gclids for those:

        12345678901234567890123456
/?gclid=CNHz5eD_8pkCFRCdnAodzniYQg
/?gclid=CIX_u-X_8pkCFQKenAodlWprSg
/?gclid=CMyI_4OA85kCFRIhnAodc2_oRg
/?gclid=CO_0pYyA85kCFQghnAodDDpaRQ
/?gclid=CIXo9JeA85kCFRIhnAodc2_oRg
/?gclid=CLitgp2A85kCFQubnAod1nx7Qg
/?gclid=CN3_1aOA85kCFQghnAodDDpaRQ
/?gclid=CPyi1quA85kCFRabnAodWnZbRQ 
/?gclid=COq-67OA85kCFRMhnAodyQvSRg
/?gclid=COOplrmA85kCFRCdnAodzniYQg

I noticed that most of the positions that use 32 to 64 different characters vary quite a bit. The exceptions were character #9, which was always an ‘8’, and character #10, which was a ‘p’ for the first two clicks and then a ‘5’ for all subsequent clicks. That likely has some significance, but I’m out of time for playing with it for now.

Hopefully the script and this basic analysis will be useful to somebody else who wants to dig into it further.

One other thought that I had is that the data (or each field) is somehow encrypted and when you ‘link’ your Analytics account to your Adwords account it shares the decryption key so that it can get at the detail.

Announcing WebPasswd

Do you have users who need access to web-based applications on multiple servers? Managing those users can be a pain when dealing with normal htpasswd-based permissions. Adding or removing users means editing each htpasswd file and remembering where all of them are.

Mod_auth_mysql is a good way to centralize that user database so that you can avoid having all of the separate htpasswd files. The apache module is available from any modern Linux distribution, so installing and configuring it takes less than 5 minutes. I started using it almost 2 years ago, and over that time have made a simple web application for managing the users and granting them permission to each application.

I’ve released the program as WebPasswd for anybody else who wants to use it. Now adding users and granting them access to applications can be done with just a few clicks. Granting and revoking access to an application takes just seconds and is applied immediately. Configuring a new application takes a couple of clicks, and then you just copy/paste the Apache configuration into the appropriate place on your web server. Try it out with this demo.

I think this will be useful to people. I have not seen another application that does something similar. Let me know if it works for you.

PHP 5.1 Doesn’t have timezone_identifiers_list() by default

According to the PHP documentation for timezone_identifiers_list(), that function should be included in PHP 5.1.x. The note on DateTime installation mentions, however, that it was only experimental support, and had to be compiled specifically to support it.

The fix, then, is to recompile PHP 5.1.x with

CFLAGS=-DEXPERIMENTAL_DATE_SUPPORT=1

or to upgrade to PHP 5.2 where it is enabled by default.

My particular problem surfaced with some Drupal code that required the function.
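If code has to run on a PHP 5.1 build that may lack the function, guarding the call with function_exists() avoids the fatal "Call to undefined function" error. This is a minimal sketch. available_timezones() and its one-entry fallback list are hypothetical, and the syntax avoids short arrays and type declarations so it still parses on PHP 5.1.

```php
<?php
// Guard against PHP builds (such as 5.1.x without experimental date
// support) where timezone_identifiers_list() is not defined.
function available_timezones()
{
    if (function_exists('timezone_identifiers_list')) {
        return timezone_identifiers_list();
    }
    // Hypothetical fallback: a minimal list so callers still get an array
    return array('UTC');
}

$zones = available_timezones();
echo count($zones), " time zones available\n";
```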

Save Internet Audio Streams to MP3s

I’ve got a couple of radio programs that I like to listen to. The only problem is that I am rarely able to listen to them live. I was wishing that somebody made a good DVR-like device for the radio, but after some thought I figured out a way to do it on my own using the internet audio streams that most radio stations now make available.

Googling for instructions on how to save Internet audio streams will return a lot of semi-workable but mostly garbage instructions. The best set of instructions I found was at Instructables.com where the basic concept is to use the command-line mplayer to save the stream as a wave file, then use lame to convert it to an MP3.

The Instructables tutorial had several drawbacks, though. First, it can't stop mplayer on its own, so it uses a second cron job to kill the original mplayer command, which is a little too crude for my taste. Second, and more importantly, you have to know the exact stream URL, which is not easy to identify from most internet radio websites. They tend to hide the actual stream source behind layers of JavaScript so that their web-based players can synchronize ads and such while listening to the streams.

I created some PHP code that automates this process and makes it pretty simple. The basic streamsave class provides functions for downloading the wave file and converting it to an MP3. I then extend that class for specific radio stations that I want to save. The extended class provides functions that run through all of the javascript garbage to get to the actual stream source.

Using those classes, this simple script now saves my stream to an MP3 file and emails me the location when it is done:

<?php
require_once dirname(__FILE__).'/ss_640wgst.class.php';

$streamsave = new ss_640wgst();

$streamsave->stream_url = $streamsave->getStreamURL();
$streamsave->seconds = 60 * 60; // One hour

// This saves the stream to a temporary wav file
$streamsave->save_stream();

// Now encode it to an mp3
$output_file = "/tmp/some_directory/some_program_".date('Y-m-d-His').'.mp3';
$streamsave->encode_to_mp3($output_file);

// Delete the large wav file
unlink($streamsave->wavfile);

// And tell me that the file was saved
echo "File saved (if all went okay) to {$streamsave->mp3file}\n\n";
mail('you@yourdomain.com', 'Audio File Saved', "File saved to {$streamsave->mp3file}");

// It would be cool to create a podcast XML file here that contains your new file

?>

Downloads

The abstract class file: streamsave.class.php
The extended class specifically for 640 WGST in Atlanta: ss_640wgst.class.php

I’ve created the extended classes for stations that are useful for me. If there seems to be any interest, I can work on developing that a bit more to make it more generalized and work for more radio stations.
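For anyone who wants to experiment without the class files, the two underlying steps (mplayer writes the stream to a WAV file, then lame converts it to an MP3) can be sketched as command builders. This is not the streamsave class. The function names are hypothetical and the stream URL is a placeholder. Stopping after a fixed duration relies on mplayer's -endpos option instead of a second cron job; treat that as an assumption and check it against your mplayer version.

```php
<?php
// Hypothetical sketch: build the shell commands a stream-saving class might run.
function build_save_command($stream_url, $wav_file, $seconds)
{
    // -vc null / -vo null: no video; -ao pcm: write decoded audio to a WAV file
    // -endpos (assumed): stop after the given number of seconds
    return 'mplayer -really-quiet -vc null -vo null'
        . ' -ao ' . escapeshellarg('pcm:file=' . $wav_file)
        . ' -endpos ' . (int) $seconds
        . ' ' . escapeshellarg($stream_url);
}

function build_encode_command($wav_file, $mp3_file)
{
    // lame reads the WAV file and writes an MP3 with its default settings
    return 'lame ' . escapeshellarg($wav_file) . ' ' . escapeshellarg($mp3_file);
}

$save = build_save_command('http://example.com/stream', '/tmp/stream.wav', 60 * 60);
$encode = build_encode_command('/tmp/stream.wav', '/tmp/stream.mp3');
echo $save, "\n", $encode, "\n";
// To actually record and encode: passthru($save); passthru($encode);
```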

Preparing WordPress for a Large Traffic Spike

The Hallmark Hall of Fame Movie ‘Front of the Class’ premiered this past weekend with an expected 12-15 million viewers.  We have been preparing the website (ClassPerformance.com) for the event. We expected a significant number of visitors to the website in the 24-48 hours after the movie aired, so I did a number of things to ensure that the site would be able to run without incident during this critical time.

  1. Move temporarily to a higher-powered server.

    The site is normally hosted on an inexpensive shared-hosting plan. I’ve run some shared-hosting servers before and don’t have much faith that they would handle any significant amount of load. They also usually don’t allow you to configure some of the Apache settings that I was planning on using below.

  2. Serve images and other static content from an alternate location.

    I set up a domain alias of ‘static.classperformance.com’ pointed to the same DocumentRoot as the main site. Then I edited the template files to serve most of the background, header, and footer images from that location. For normal usage, serving them from the same server works fine, but this allows the flexibility to move that static content to a separate server if/when it is needed.

    I also copied the entire website to a second server and had it configured so that at any time I could change DNS to point ‘static.classperformance.com’ to the second server in order to reduce the bandwidth from the primary server.

  3. Generate static pages wherever possible.

    I used wget to download everything, and then deleted the pages that needed to be parsed through PHP (i.e., contact forms, etc.). Most of the pages don’t change from visitor to visitor, so this can be done for the home page, all of the blog posts, and any other pages. This significantly reduces the overhead of database queries, and of running PHP and including multiple files.

    I then added this to my Apache configuration to tell the web server to use the static content if it exists:

        ## Serve static content for files that exist
        RewriteCond /home/classperformance.com/www/rendered/%{REQUEST_URI} -f
        RewriteRule (.*) /rendered/$1 [L]

        ## For requests without an extension, wget has saved those files as 'index.html'
        ## so the rewrite rule needs to reflect that:
        RewriteCond /home/classperformance.com/www/rendered/%{REQUEST_URI} -d
        RewriteRule (.*) /rendered/$1/index.html [L]

    I did some performance tests with ApacheBench, and serving the static content had a dramatic effect on the speed and on the number of concurrent users. There is probably a more elegant way to configure mod_cache to do a similar thing in a more automated fashion, but this was quick and easy, and I didn’t have to worry about checking the various HTTP headers. In my opinion, this was the single most effective thing to do. By serving static content, Apache also correctly handles many of the HTTP headers that enable effective caching (ETags, Expires, Last-Modified, etc.).

  4. Install a PHP accelerator.

    I’ve previously written about how easy eAccelerator is to install and how effective it is. There are very few scenarios where it doesn't help. Again, ApacheBench tests easily showed a huge increase in the number of concurrent requests when eAccelerator was enabled.

  5. Check Apache settings.

    On a vanilla CentOS install, Apache has the ServerLimit set to 256. By serving primarily static content, you will likely reduce the amount of memory that each Apache child requires, and have memory for more children. I did some quick math and figured that I could have around 800 children before memory became a concern. I also enabled KeepAlives with a very short (1 second) KeepAliveTimeout so that sequential requests from the same user don’t have to recreate TCP sessions.

    Also, because the static pages bypass WordPress, the 301 redirect from the non-www version of the site to the correct URL, which WordPress had been handling, no longer happened. I moved it into Apache with these directives. The two conditions are ANDed, so the redirect applies only when the host is neither www nor static:

        ## Rewrite to the desired domain name
        RewriteCond %{HTTP_HOST} !^www\.classperformance\.com [NC]
        RewriteCond %{HTTP_HOST} !^static\.classperformance\.com [NC]
        RewriteRule ^/(.*) http://www.classperformance.com/$1 [L,R=301]

  6. Enable server-side compression.

    The default Apache install doesn’t compress any content. I configured mod_deflate to compress the static content and thus reduce the bandwidth usage. Compression should easily cut the size of HTML and CSS files in half (and sometimes to as little as one tenth). This not only reduces your bandwidth bill, but since the 100Mbps switch port is a potential bottleneck, it also allows more concurrent users if traffic approaches that limit (and it might have if I hadn’t enabled compression).

  7. Set up some monitoring.

    I installed MRTG with some basic graphs. I also configured Apache so that I could view the server-status page, and installed iftop to get a real-time view of the bandwidth usage.

With all of these changes, I’m very happy that we had tens of thousands of visitors during and shortly after the show, and everything ran perfectly. I had the static content running on a separate server for the busiest time and combined bandwidth usage peaked at around 90 Mbps shortly after the end of the show.
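The quick math for Apache children is just available memory divided by memory per child. Here is a minimal PHP sketch. The function name and the numbers are hypothetical, not the real server's; per-child memory would come from something like the resident size shown by ps or top. With the prefork MPM, MaxClients (MaxRequestWorkers in Apache 2.4) cannot exceed ServerLimit, so going past 256 children means raising both.

```php
<?php
// Rough estimate of how many Apache (prefork) children fit in memory:
// children = floor((total RAM - RAM reserved for everything else) / RAM per child)
function max_apache_children($total_mb, $reserved_mb, $per_child_mb)
{
    $available_mb = max(0, $total_mb - $reserved_mb);
    return (int) floor($available_mb / $per_child_mb);
}

// Hypothetical numbers for illustration only
echo max_apache_children(4096, 96, 5), "\n";
```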

Don’t Use Integers as Values in an Enum Field

I just got through fixing a messy problem where a database had a table defined with a couple of columns that were ENUMs with integer values. This leads to a great deal of confusion, because in queries it is ambiguous whether an integer is supposed to be treated as the enumerated value or as its index.

Imagine a table with a column defined as ENUM(‘0’, ‘1’, ‘2’, ‘3’). When doing queries, if you try to do anything with that column, it is unclear whether you mean to use the actual value you pass in, or the position. For example, if I were to say ‘WHERE confusing_column = 2’, it could be interpreted as meaning either the value ‘2’, or the item in the second position (i.e., ‘1’). It is even hard to explain because it is so confusing.
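Here is a small PHP model of the rule MySQL applies when an ENUM column is used in a numeric context. The number is treated as a 1-based index into the list of members, and index 0 is reserved for the empty-string error value. enum_value_at_index() is a hypothetical helper for illustration, not anything MySQL provides.

```php
<?php
// Model how MySQL maps a number to an ENUM member in a numeric context:
// 1 is the first member, 0 is the special empty-string error value.
function enum_value_at_index($enum_values, $index)
{
    if ($index === 0) {
        return '';
    }
    // Out-of-range indexes have no member
    return isset($enum_values[$index - 1]) ? $enum_values[$index - 1] : null;
}

$values = array('0', '1', '2', '3');   // ENUM('0', '1', '2', '3')
// WHERE confusing_column = 2 compares against index 2, i.e. the value '1'
var_dump(enum_value_at_index($values, 2));
```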

The MySQL Documentation does a decent job of explaining it.   I agree with their recommendation:

For these reasons, it is not advisable to define an ENUM column with enumeration values that look like numbers, because this can easily become confusing.

I ended up converting everything to TINYINTs. That avoids the confusion, and it shouldn’t even cost extra space: a TINYINT takes one byte per row, the same as an ENUM with up to 255 values.

Checking MySQL Replication

MySQL replication is pretty easy to set up, but it needs a few extra things to make it more reliable. I wrote this quick PHP script to alert me when replication has failed or is more than 5 minutes behind the master.

<?php

$user = 'username';
$pass = 'password';
$host = 'localhost';
// Grant this user permission to check the status with this mysql statement
// GRANT REPLICATION CLIENT on *.* TO 'user'@'host' IDENTIFIED BY 'password';

$threshold = 300; // seconds (5 minutes)

// Report errors through return values (PHP 8.1+ otherwise throws exceptions)
mysqli_report(MYSQLI_REPORT_OFF);

$db = mysqli_connect($host, $user, $pass);
if (!$db) {
    echo "Error: could not connect to MySQL\n";
    echo mysqli_connect_error()."\n";
    exit(1);
}

$result = mysqli_query($db, 'SHOW SLAVE STATUS');
if (!$result) {
    // Make sure that your user has the 'REPLICATION CLIENT' privilege
    echo "Error 'SHOW SLAVE STATUS' command failed\n";
    echo mysqli_error($db)."\n";
    exit(1);
}

// No row comes back if this server is not configured as a slave
$status = mysqli_fetch_assoc($result);

// Seconds_Behind_Master is NULL when the replication threads are not running
if (!isset($status['Seconds_Behind_Master'])) {
    echo "Error: Seconds_Behind_Master is missing or NULL (is replication running?)\n";
    print_r($status);
    exit(2);
}

if ($status['Seconds_Behind_Master'] > $threshold) {
    $minutes = floor($status['Seconds_Behind_Master'] / 60);
    echo "Error: Slave is $minutes minutes behind the master server\n";
    exit(3);
}

exit(0);
?>

This script is intended to be run periodically from cron. It doesn’t generate any output unless something is wrong. The behavior of cron is that when a script generates output, it will email the output to the user, so make sure that you have mail on your system configured to send you the cron output correctly. The script also exits with a non-zero status on each error, so you might include this in a more complicated script that attempts to do something else based on the status.

I use something like this in a non-privileged user’s crontab:

*/15 * * * * /usr/bin/php /path/to/check_replication.php
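Because the script exits with a distinct status for each failure, another PHP script can run it and react to the result. Here is a minimal sketch using exec(). run_replication_check() and describe_replication_status() are hypothetical names, and the script path is the placeholder from the crontab line above.

```php
<?php
// Map the check script's exit status to a human-readable description.
function describe_replication_status($code)
{
    switch ($code) {
        case 0:
            return 'OK';
        case 1:
            return 'Could not run SHOW SLAVE STATUS (connection or privileges)';
        case 2:
            return 'No Seconds_Behind_Master value (replication stopped, or not a slave)';
        case 3:
            return 'Slave is lagging behind the master';
        default:
            return "Unexpected exit status $code";
    }
}

// Run the check script and return its exit status and output lines.
function run_replication_check($script)
{
    $output = array();
    $code = 0;
    // exec() fills $output with the script's lines and $code with its exit status
    exec('/usr/bin/php ' . escapeshellarg($script), $output, $code);
    return array($code, $output);
}

// Usage:
// list($code, $output) = run_replication_check('/path/to/check_replication.php');
// if ($code !== 0) { echo describe_replication_status($code), "\n"; }
echo describe_replication_status(3), "\n";
```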

bcSpamblock Updated to Version 1.3

Thanks to jontiw for pointing out a potential problem in my bcSpamblock code. He noted that the PHP crypt() function returns the salt along with the encrypted value. My code was passing that salt to the visitor, so an attacker could potentially learn the salt value that a website was using and create valid responses.

I modified the code to strip out the salt before passing the value to the user. I also changed the data used to create the salt, so that the new salt differs from the one the previous, vulnerable version used for the same site. The WordPress plugin has been updated as well.
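Here is a minimal sketch of stripping the salt from crypt() output. This is not the bcSpamblock code: strip_crypt_salt() and the SHA-512 salt are hypothetical. It assumes either a modular format such as $1$salt$hash or $6$salt$hash, where the salt ends at the last '$', or traditional DES crypt, where the salt is the first two characters. Extended DES hashes starting with '_' are not handled.

```php
<?php
// Drop the salt that crypt() prepends, so only the hash portion is public.
function strip_crypt_salt($hash)
{
    if ($hash !== '' && $hash[0] === '$') {
        // Modular format: algorithm id and salt end at the last '$'
        return substr($hash, strrpos($hash, '$') + 1);
    }
    // Traditional DES crypt: the first two characters are the salt
    return substr($hash, 2);
}

$secret_salt = '$6$abcdefgh$';   // hypothetical site-specific SHA-512 salt
$full = crypt('some answer', $secret_salt);
$public = strip_crypt_salt($full);

// To verify, recompute server-side with the secret salt and compare only the hash part
$ok = hash_equals($public, strip_crypt_salt(crypt('some answer', $secret_salt)));
var_dump($ok);
```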

I was happy to see other people looking through my code and pointing this type of issue out.