Array versus String in CURLOPT_POSTFIELDS

The PHP Curl Documentation for CURLOPT_POSTFIELDS makes this note:

This can either be passed as a urlencoded string like ‘para1=val1&para2=val2&…’ or as an array with the field name as key and field data as value. If value is an array, the Content-Type header will be set to multipart/form-data.

I’ve always discounted the importance of that, and in most cases it doesn’t matter. The destination server and application likely know how to deal with both multipart/form-data and application/x-www-form-urlencoded equally well. However, the two mechanisms pass the data in very different ways.

application/x-www-form-urlencoded

application/x-www-form-urlencoded is what I generally think of when doing POST requests. It is the default when you submit most forms on the web. It works by appending a blank line and then your urlencoded data to the end of the POST request. It also sets the Content-Length header to the length of your data. A request submitted with application/x-www-form-urlencoded looks like this (somewhat simplified):

POST /some-form.php HTTP/1.1
Host: www.brandonchecketts.com
Content-Length: 23
Content-Type: application/x-www-form-urlencoded

name=value&name2=value2

multipart/form-data

multipart/form-data is much more complicated, but more flexible. Its flexibility is required when uploading files. It works in a manner similar to MIME multipart messages. The HTTP request looks like this (simplified):

POST / HTTP/1.1
Host: www.brandonchecketts.com
Content-Length: 244
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------26bea3301273

The server answers with 100 Continue, and then subsequent packets are sent containing the actual data. In my simple case with two name/value pairs, the exchange looks like this:

HTTP/1.1 100 Continue
------------------------------26bea3301273
Content-Disposition: form-data; name="name"

value
------------------------------26bea3301273
Content-Disposition: form-data; name="name2"

value2
------------------------------26bea3301273--
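To make the body format concrete, here is a minimal sketch that assembles the same kind of body by hand. The function name build_multipart_body is hypothetical, and it only handles plain text fields (no file uploads, no escaping of quotes in field names); cURL does all of this for you when you pass an array.

```php
<?php
// Hypothetical helper: build a multipart/form-data body for simple text fields.
// $boundary is the value from the Content-Type header, without the leading "--".
function build_multipart_body(array $fields, string $boundary): string
{
    $body = '';
    foreach ($fields as $name => $value) {
        // Each part starts with "--" followed by the boundary
        $body .= "--{$boundary}\r\n";
        $body .= "Content-Disposition: form-data; name=\"{$name}\"\r\n\r\n";
        $body .= "{$value}\r\n";
    }
    // The final boundary has an extra "--" appended
    return $body . "--{$boundary}--\r\n";
}

$boundary = str_repeat('-', 28) . '26bea3301273';
$body = build_multipart_body(array('name' => 'value', 'name2' => 'value2'), $boundary);
echo strlen($body), "\n";
```

Assuming the boundary in the capture above is 28 dashes followed by the 12 hex digits, the body built this way is 244 bytes long, which lines up with the Content-Length header in the request.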

CURL usage

So, when sending POST requests in PHP/cURL and you want the application/x-www-form-urlencoded version, it is important to urlencode the data as a string first.

This will generate the multipart/form-data version:

$data = array('name' => 'value', 'name2' => 'value2');
curl_setopt($curl_object, CURLOPT_POSTFIELDS, $data);

And this simple change will ensure that it uses the application/x-www-form-urlencoded version:

$data = array('name' => 'value', 'name2' => 'value2');
$encoded = '';
foreach($data as $name => $value){
    $encoded .= urlencode($name).'='.urlencode($value).'&';
}
// chop off the last ampersand
$encoded = rtrim($encoded, '&');
curl_setopt($curl_object, CURLOPT_POSTFIELDS, $encoded);
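As a shorter alternative to the loop above, PHP's built-in http_build_query() produces the same kind of urlencoded string. This is a sketch rather than a drop-in replacement; $curl_object is assumed to be a handle you created earlier with curl_init(), so the call is guarded here to keep the snippet runnable on its own.

```php
<?php
$data = array('name' => 'value', 'name2' => 'value2');

// http_build_query() urlencodes each key and value and joins the pairs
// with the separator passed as the third argument.
$encoded = http_build_query($data, '', '&');

// $curl_object is assumed to exist already (created with curl_init()).
if (isset($curl_object)) {
    curl_setopt($curl_object, CURLOPT_POSTFIELDS, $encoded);
}
```

Passing the separator explicitly avoids surprises if the arg_separator.output ini setting has been changed on the server.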

KnitMeter is now a Facebook App

KnitMeter.com is a site that I wrote quickly for my wife to keep track of how much she has knit. It generates a little ‘widget’ image that can be placed on blogs, forums, etc. and says how many miles of yarn you have knit in some period. The site has been live for about a year and a half now and has a couple thousand registered users.

I have been receiving an increasing number of requests for a way to add a KnitMeter to Facebook. I’ve experimented with a couple of other ideas on Facebook and found that it was pretty straightforward to write an app. KnitMeter seems like a decent candidate for a social app, so I started working on it about a week ago. I’m happy to say that I just made the application live late last night. It is available at http://apps.facebook.com/knitmeter/.

Features include:

  • Ability to add projects and add knitted lengths to a project (or not)
  • Settings for inputting lengths in feet, yards, or meters
  • Display how much you’ve knit in feet, yards, meters, kilometers, or miles
  • When entering a new length, you can choose to have it publish a ‘story’ on your profile page
  • You can add a tab on your profile page that shows each of your projects as well as a total
  • You can add a KnitMeter ‘box’ to the side of your profile page, or on your ‘boxes’ tab.

I recreated the database from scratch and defined it a little better, so I have a little bit of work to do in migrating the existing site and database over to the new structure. Once that is done users will be able to import their data from the existing KnitMeter.com by providing their email/password.

Synchronize Remote Memcached Clusters with memcache_sync

The problem: Servers in two separate geographic locations each have their own memcached cluster. However, there doesn’t currently exist (that I know of) a good way to copy data from one cluster to the other cluster.

One possible solution is to configure the application to perform all write operations in both places. However, each operation requires a round-trip response. If the servers are separated by 50ms or more, doing several write operations causes a noticeable delay.

The solution that I’ve come up with is a perl program that I’m calling memcache_sync. It acts a bit like a proxy that asynchronously performs write operations on a remote cluster. Each geographic location runs an instance of memcache_sync that emulates a memcached server. You configure your application to write to the local memcache cluster, and also to the memcache_sync instance. memcache_sync queues the request and immediately returns a SUCCESS message so that your application can continue doing its thing. A separate thread then writes those queued operations to the remote cluster.

The result is two memcache clusters that are synchronized in near-real time, without any noticeable delay in the application.
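The queue-then-forward idea can be illustrated with a small in-process sketch. This is not the Perl program itself: the class name WriteBehindQueue is made up, a plain PHP array stands in for the remote cluster, and there is no networking or threading. It only shows that accepting a write and applying it remotely are two separate steps.

```php
<?php
// Hypothetical illustration of a write-behind queue.
class WriteBehindQueue
{
    private $pending;

    public function __construct()
    {
        $this->pending = new SplQueue();
    }

    // Accept a write and return right away; nothing reaches the remote side yet.
    public function enqueue(string $op, string $key, $value = null): bool
    {
        $this->pending->enqueue(array($op, $key, $value));
        return true;
    }

    // Replay the queued operations, in order, against the remote store.
    // Here $remote is just an array standing in for the remote cluster.
    public function drain(array &$remote): int
    {
        $applied = 0;
        while (!$this->pending->isEmpty()) {
            list($op, $key, $value) = $this->pending->dequeue();
            if ($op === 'set') {
                $remote[$key] = $value;
            } elseif ($op === 'delete') {
                unset($remote[$key]);
            }
            $applied++;
        }
        return $applied;
    }
}

$remote = array();
$queue = new WriteBehindQueue();
$queue->enqueue('set', 'user:1', 'alice');
$queue->enqueue('delete', 'user:1');
$queue->enqueue('set', 'user:2', 'bob');
$applied = $queue->drain($remote);
```

In the real setup the drain step would run continuously in its own thread and talk to the remote memcached servers, so the application never waits on the long round trip.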

I’ve implemented ‘set’ and ‘delete’ operations thus far, since that is all that my application uses. I’ve just started using this on a production environment and am watching to see how it holds up. So far, it is behaving well.

The script is available here. I’m interested to see how much need there is for such a program. I’d be happy to have input from others, and help in developing this into a more robust solution that works outside of my somewhat limited environment.

What is in a gclid?

When you use auto-tagging with your Adwords campaign, all requests that are generated by Google Adwords contain a gclid parameter in the query string. Adwords uses this to pass some information to Analytics for traffic analysis.

I was curious about what data the gclid parameter contained. My guess was that it contained some encoded or encrypted information regarding the origin of the click, so I did some analysis on the clicks that I received. Some discussion about it was available on this post.

I ended up writing a quick PHP script that parses through an Apache log file. It finds requests that contain a gclid and then produces a report of which letters occur in which positions of the gclid.
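The core of that idea fits in a few lines. What follows is a rough sketch of the approach and not the downloadable script; the function names and the regular expression for spotting a gclid in a log line are my own assumptions.

```php
<?php
// Pull gclid values out of raw log lines (assumed pattern: gclid=<token>).
function extract_gclids(array $lines): array
{
    $found = array();
    foreach ($lines as $line) {
        if (preg_match('/[?&]gclid=([A-Za-z0-9_-]+)/', $line, $m)) {
            $found[] = $m[1];
        }
    }
    return $found;
}

// For each 1-based position, collect the distinct characters seen there.
function tally_positions(array $gclids): array
{
    $seen = array();
    foreach ($gclids as $gclid) {
        foreach (str_split($gclid) as $i => $char) {
            $seen[$i + 1][$char] = true;
        }
    }
    ksort($seen);
    // Turn each set of characters into a string, keeping the position keys
    return array_map(function ($chars) {
        return implode('', array_keys($chars));
    }, $seen);
}

$lines = array(
    'GET /?gclid=CNHz5eD_8pkCFRCdnAodzniYQg HTTP/1.1',
    'GET /?gclid=CIX_u-X_8pkCFQKenAodlWprSg HTTP/1.1',
    'GET /index.html HTTP/1.1',
);
$report = tally_positions(extract_gclids($lines));
foreach ($report as $position => $chars) {
    printf("Character %2d [%2d] %s\n", $position, strlen($chars), $chars);
}
```

To run it against a real log you would feed it the lines of the file, for example with file($path, FILE_IGNORE_NEW_LINES).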

The script is available for download here, and it generates a report like this:

Found 32507 appropriate lines
Character  1 [ 1] C
Character  2 [ 8] IJKLMNOP
Character  3 [32] -CDGHKLOPSTWX_abefijmnqruvyz2367
Character  4 [64] -CDEFG0ABHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character  5 [32] -_0ghijklmnopqrstuvwxyz123456789
Character  6 [32] -IJKLMNOPYZ_abcdefopqrstuv456789
Character  7 [32] -CDGHKLOPSTWX_abefijmnqruvyz2367
Character  8 [64] -ABCDEFG0HIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character  9 [32] 0-_ghijklmnopqrstuvwxyz123456789
Character 10 [ 4] JZp5
Character 11 [ 8] IMQUYcgk
Character 12 [ 1] C
Character 13 [ 1] F
Character 14 [10] QRSUWYZcde
Character 15 [61] -ABCEFGHIJKLMNOPQRSTUVWXYZ_ab0cdefghiklmnopqrstuvwxy123456789
Character 16 [63] -ABCDEFGHIJKLMNOQRSTUVWXYZ_abcde0fghijklmnopqrstuvwxyz123456789
Character 17 [17] DFGHIQabgiknrsx57
Character 18 [ 4] AQgw
Character 19 [ 1] o
Character 20 [ 1] d
Character 21 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwx0yz123456789
Character 22 [32] ABCDEFGHQRSTUVWXghijklmnwyz0x123
Character 23 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuv0wxyz123456789
Character 24 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrs0tuvwxyz123456789
Character 25 [62] 0-ABCDEFHIJKLMOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character 26 [ 4] AQgw

This makes it clear that the parameter has some structure, but I’m still no closer to determining what it contains. Counting up the unique values, it would seem that they have about 95 bits of information available, which might be enough room to store everything it would need to know about the search that created it. Based on the reporting details in Analytics, I would presume that it somehow contains at least the following information:

  • Campaign (id)
  • Keyword (id)
  • Ad Variation (id)
  • Position
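The bit estimate above can be reproduced by summing log2 of the number of distinct characters seen at each position. This is only an upper bound, and it assumes the positions vary independently of each other, which they may well not. The counts are copied from the bracketed numbers in the report.

```php
<?php
// Distinct-character counts for positions 1 through 26, taken from the report.
$counts = array(1, 8, 32, 64, 32, 32, 32, 64, 32, 4, 8, 1, 1,
                10, 61, 63, 17, 4, 1, 1, 64, 32, 64, 64, 62, 4);

// A position with n possible characters can carry at most log2(n) bits.
function bits_upper_bound(array $counts): float
{
    return array_sum(array_map(function ($n) {
        return log($n, 2);
    }, $counts));
}

$bits = bits_upper_bound($counts);
printf("Upper bound: %.1f bits\n", $bits);
```

Computed this way the total comes out a little over 97 bits, which is in the same range as the rough figure quoted above.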

I did some research by clicking an ad multiple times and examining the gclids for those:

        12345678901234567890123456
/?gclid=CNHz5eD_8pkCFRCdnAodzniYQg
/?gclid=CIX_u-X_8pkCFQKenAodlWprSg
/?gclid=CMyI_4OA85kCFRIhnAodc2_oRg
/?gclid=CO_0pYyA85kCFQghnAodDDpaRQ
/?gclid=CIXo9JeA85kCFRIhnAodc2_oRg
/?gclid=CLitgp2A85kCFQubnAod1nx7Qg
/?gclid=CN3_1aOA85kCFQghnAodDDpaRQ
/?gclid=CPyi1quA85kCFRabnAodWnZbRQ 
/?gclid=COq-67OA85kCFRMhnAodyQvSRg
/?gclid=COOplrmA85kCFRCdnAodzniYQg

I noticed that most of the positions that can take 32 to 64 different characters vary quite a bit, except for character #9, which was always an ‘8’, and character #10, which was a ‘p’ for the first two clicks and then a ‘5’ for all subsequent clicks. That likely has some significance, but I’m out of time for playing with it for now.

Hopefully the script and this basic analysis will be of use to somebody else who wants to dig into it further.

One other thought that I had is that the data (or each field) is somehow encrypted and when you ‘link’ your Analytics account to your Adwords account it shares the decryption key so that it can get at the detail.

Announcing WebPasswd

Do you have users who need access to web-based applications on multiple servers? Managing those users can be a pain when dealing with normal htpasswd-based permissions. Adding or removing users means editing each htpasswd file and remembering where all of them are.

Mod_auth_mysql is a good way to centralize that user database so that you can avoid having all of the separate htpasswd files. The Apache module is available from any modern Linux distribution, so installing and configuring it takes less than 5 minutes. I started using it almost 2 years ago, and over that time have made a simple web application for managing the users and granting them permission to each application.

I’ve released the program as WebPasswd for anybody else who wants to use it. Now adding users and granting them access to applications can be done with just a few clicks. Granting and revoking access to an application takes just seconds and is applied immediately. Configuring a new application takes a couple of clicks, and then you just copy/paste the Apache configuration into the appropriate place on your web server. Try it out with this demo.

I think this will be useful to people. I have not seen another application that does something similar. Let me know if it works for you.

Save Internet Audio Streams to MP3s

I’ve got a couple of radio programs that I like to listen to. The only problem is that I rarely am able to listen to them live. I was wishing that somebody made a good DVR-like device for the radio, but after some thought figured out a way to do it on my own using internet audio streams that most radio stations now have available.

Googling for instructions on how to save Internet audio streams will return a lot of semi-workable but mostly garbage instructions. The best set of instructions I found was at Instructables.com where the basic concept is to use the command-line mplayer to save the stream as a wave file, then use lame to convert it to an MP3.

The Instructables tutorial had several shortcomings though. First, it is not able to stop mplayer on its own, so it uses a second cron job to kill the original mplayer command, which is a little too crude for my taste. Secondly, and more importantly, you have to know the exact stream URL, which is not easy to identify from most internet radio websites. They tend to hide the actual stream source behind layers of javascript so that their web-based players can synchronize ads and such while listening to the streams.

I created some PHP code that automates this process and makes it pretty simple. The basic streamsave class provides functions for downloading the wave file and converting it to an MP3. I then extend that class for specific radio stations that I want to save. The extended class provides functions that run through all of the javascript garbage to get to the actual stream source.
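For a sense of what the two underlying steps look like, here is a sketch that only builds the shell commands. It is not the code from the streamsave class: the function names are hypothetical, and using the coreutils timeout command to stop mplayer after a fixed number of seconds is my own assumption about one way to avoid the second cron job.

```php
<?php
// Hypothetical helper: command that records a stream to a wav file and
// stops after $seconds. Assumes mplayer and coreutils' timeout are installed.
function build_capture_command(string $url, string $wavFile, int $seconds): string
{
    return 'timeout ' . $seconds
        . ' mplayer -really-quiet -vo null -ao '
        . escapeshellarg('pcm:file=' . $wavFile)
        . ' ' . escapeshellarg($url);
}

// Hypothetical helper: command that converts the wav file to an MP3 with lame.
function build_encode_command(string $wavFile, string $mp3File): string
{
    return 'lame --quiet ' . escapeshellarg($wavFile) . ' ' . escapeshellarg($mp3File);
}

$capture = build_capture_command('http://example.com/stream', '/tmp/show.wav', 60 * 60);
$encode  = build_encode_command('/tmp/show.wav', '/tmp/show.mp3');
echo $capture, "\n", $encode, "\n";
```

Each string could then be passed to exec() or shell_exec(). Everything that comes from outside goes through escapeshellarg() so that odd characters in a URL or path cannot break the command line.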

Using those classes, this simple script now saves my stream to an MP3 file and emails me the location when it is done:

<?php
require_once dirname(__FILE__).'/ss_640wgst.class.php';

$streamsave = new ss_640wgst();

$streamsave->stream_url = $streamsave->getStreamURL();
$streamsave->seconds = 60 * 60; // One hour

// This saves the stream to a temporary wav file
$streamsave->save_stream();

// Now encode it to an mp3
$output_file = "/tmp/some_directory/some_program_".date('Y-m-d-His').'.mp3';
$streamsave->encode_to_mp3($output_file);

// Delete the large wav file
unlink($streamsave->wavfile);

// And tell me that the file was saved
echo "File saved (if all went okay) to {$streamsave->mp3file}\n\n";
mail('you@yourdomain.com', 'Audio File Saved', "File saved to {$streamsave->mp3file}");

// It would be cool to create a podcast XML file here that contains your new file

?>

Downloads

The abstract class file: streamsave.class.php
The extended class specifically for 640 WGST in Atlanta: ss_640wgst.class.php

I’ve created the extended classes for stations that are useful for me. If there seems to be any interest, I can work on developing that a bit more to make it more generalized and work for more radio stations.

‘Maintenance’ Pages via Apache mod_rewrite

Occasionally, I’ve found it useful to put up a maintenance page while performing some work on a website. It may be useful if you are debugging and want to ensure that regular visitors don’t see any application generated error messages or blank pages or anything.

This method uses mod_rewrite to redirect all requests to a maintenance page that you create.

First create maint.html with some message that you want to display to your users. Then add this to your Apache configuration to redirect users to that page. Obviously, you’ll need to substitute your own IP address. You can add multiple lines to include multiple users if necessary. The configuration essentially says requests not from your IP (notice the exclamation point) will be redirected to /maint.html and that is the last Rewrite rule that should be followed.

  ##### Maintenance section
  ## Uncomment and add your IP address for performing maintenance
  ## Add multiple addresses on multiple lines if necessary
  RewriteCond %{REMOTE_ADDR} !^11\.22\.33\.44$
  RewriteCond %{REMOTE_ADDR} !^1\.1\.1\.1$
  RewriteRule . /maint.html [L]
  ##### End Maintenance section

Don’t Use Integers as Values in an Enum Field

I just got through fixing a messy problem where a database had a table defined with a couple of columns that were ENUMs with integer values. This leads to extreme amounts of confusion, because there is a lot of ambiguity when doing queries as to whether the integer is supposed to be treated as the enumerated value, or as the key.

Imagine a table with a column defined as ENUM(‘0’, ‘1’, ‘2’, ‘3’). When doing queries, if you try to do anything with that column, it is unclear whether you mean to use the actual value you pass in, or the position. For example, if I were to say ‘WHERE confusing_column = 2’, it could be interpreted as either meaning the value ‘2’, or the item in the second position (i.e. ‘1’). As I understand it, MySQL treats an unquoted number as the position and a quoted one as the value, which is easy to get wrong. It is even hard to explain because it is so confusing.

The MySQL Documentation does a decent job of explaining it.   I agree with their recommendation:

For these reasons, it is not advisable to define an ENUM column with enumeration values that look like numbers, because this can easily become confusing.

I ended up converting everything to TINYINTs. For a small set of values the storage is about the same, and it is worth it in my opinion to avoid the confusion.

MyTop Stops and Beeps When a Query Contains Binary Data

MyTop is a handy utility for watching the queries being executed on a MySQL server from a terminal window. It is written in Perl, and is pretty straightforward. It just does a ‘SHOW FULL PROCESSLIST’ on the database, and then displays the currently running queries. You can sort by various columns, and in general it is just tons easier than running SHOW PROCESSLIST from the MySQL command prompt.

My database does some inserts that contain binary data.  I noticed that when running mytop, and one of those queries came up, the terminal would beep and it would stop and prompt me to enter something.

To resolve, I added this to about line 970 so that it filters out most non-displayable characters.   Feel free to let me know a better regex to use.  This one is pretty ugly, but works for now. (Also, wordpress might have mangled some of the slashes)

## Try to filter out binary information and still provide all of the necessary detail
$thread->{Info} =~ s/[^\w\d\s\(\)\[\]\-\;\:\'\"\,\.\<\>\?\/\\\*\~\!\@\#\$\%\^\&\*\-_\+\=\` ]//g;

Poor Performance After Enabling Replication Due to sync_binlog

I was pretty happy with myself for setting up some fairly complicated MySQL circular replication the other night. I did it far after peak hours so as not to disturb any visitors if it caused any problems. Everything appeared to be working great until I started watching things the next morning.

I started to notice that the main MySQL server seemed to be running really slowly. One process that usually completes in a couple of hours ended up taking well over 16 hours to complete. I spent the whole day troubleshooting it, which got me familiar with all sorts of handy tools. ‘mytop‘ is a handy version of ‘top’ for MySQL queries. I got familiar with iostat for watching disk I/O performance.

In the end, after a whole day of troubleshooting, it came down to the ‘sync_binlog‘ setting that I had enabled because I read some howto that mentioned it was useful for the replication master. My understanding now of the setting is that it causes MySQL to sync the binary log to disk after each write to it (every UPDATE, INSERT, or DELETE). The idea is that when the data is sync’d, the drive physically writes it out instead of keeping it in a cache. My application does a ton of inserts, so it was killing performance.