When you use auto-tagging with your Adwords campaign, every request generated by a click on a Google Adwords ad contains a gclid parameter in its query string (?gclid=…). Adwords uses this parameter to pass information to Analytics for traffic analysis.
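Pulling the parameter out of a landing-page URL takes only a few lines. This is a minimal Python sketch using the standard library's urllib.parse. The function name extract_gclid is hypothetical, and the second URL is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def extract_gclid(url):
    """Return the gclid value from a request URL's query string, or None if it has none."""
    values = parse_qs(urlsplit(url).query).get("gclid")
    return values[0] if values else None


print(extract_gclid("/?gclid=CNHz5eD_8pkCFRCdnAodzniYQg"))  # CNHz5eD_8pkCFRCdnAodzniYQg
print(extract_gclid("/landing?utm_source=newsletter"))      # None
```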
I was curious about what data the gclid parameter contained. My guess was that it held some encoded or encrypted information about the origin of the click, so I did some analysis on the clicks that I received. Some discussion about it is available in this post.
I ended up writing a quick PHP script that parses through an Apache log file. It finds requests that contain a gclid and then produces a report of which characters occur at each position of the gclid.
The script is available for download here, and it generates a report like this:
Found 32507 appropriate lines
Character 1 [ 1] C
Character 2 [ 8] IJKLMNOP
Character 3 [32] -CDGHKLOPSTWX_abefijmnqruvyz2367
Character 4 [64] -CDEFG0ABHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character 5 [32] -_0ghijklmnopqrstuvwxyz123456789
Character 6 [32] -IJKLMNOPYZ_abcdefopqrstuv456789
Character 7 [32] -CDGHKLOPSTWX_abefijmnqruvyz2367
Character 8 [64] -ABCDEFG0HIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character 9 [32] 0-_ghijklmnopqrstuvwxyz123456789
Character 10 [ 4] JZp5
Character 11 [ 8] IMQUYcgk
Character 12 [ 1] C
Character 13 [ 1] F
Character 14 [10] QRSUWYZcde
Character 15 [61] -ABCEFGHIJKLMNOPQRSTUVWXYZ_ab0cdefghiklmnopqrstuvwxy123456789
Character 16 [63] -ABCDEFGHIJKLMNOQRSTUVWXYZ_abcde0fghijklmnopqrstuvwxyz123456789
Character 17 [17] DFGHIQabgiknrsx57
Character 18 [ 4] AQgw
Character 19 [ 1] o
Character 20 [ 1] d
Character 21 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwx0yz123456789
Character 22 [32] ABCDEFGHQRSTUVWXghijklmnwyz0x123
Character 23 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuv0wxyz123456789
Character 24 [64] -ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrs0tuvwxyz123456789
Character 25 [62] 0-ABCDEFHIJKLMOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz123456789
Character 26 [ 4] AQgw
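The PHP script itself isn't reproduced here. As a rough illustration of the same idea, here is a Python sketch. It assumes the gclid appears in the request line of each log entry and uses only URL-safe characters (A–Z, a–z, 0–9, - and _, which is the set the report shows). The function names are hypothetical, and characters are sorted in ASCII order, so the ordering within each line will differ from the report above.

```python
import re
import sys
from collections import defaultdict

# Assumption: the gclid value uses only URL-safe characters (A-Z, a-z, 0-9, '-', '_').
GCLID_RE = re.compile(r"[?&]gclid=([A-Za-z0-9_-]+)")


def position_report(lines):
    """Count lines containing a gclid and collect the distinct characters seen at each position."""
    found = 0
    positions = defaultdict(set)  # 1-based position -> set of characters seen there
    for line in lines:
        match = GCLID_RE.search(line)
        if match is None:
            continue
        found += 1
        for index, char in enumerate(match.group(1), start=1):
            positions[index].add(char)
    return found, positions


def format_report(found, positions):
    """Render the results in roughly the layout of the report above (ASCII-sorted characters)."""
    out = [f"Found {found} appropriate lines"]
    for index in sorted(positions):
        chars = "".join(sorted(positions[index]))
        out.append(f"Character {index} [{len(chars):2d}] {chars}")
    return "\n".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gclid_report.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        print(format_report(*position_report(log)))
```

Using a set per position keeps memory flat no matter how large the log is, since each position can hold at most 64 distinct characters.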
This makes it clear that the parameter has some structure, but I’m still no closer to determining what it contains. Counting up the unique values at each position, there seem to be about 95 bits of information available, which might be enough room to store everything it would need to know about the search that created it. Based on the reporting details in Analytics, I would presume that it somehow contains at least the following information:
- Campaign (id)
- Keyword (id)
- Ad Variation (id)
- Position
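The "about 95 bits" figure can be checked by treating each position as an independent symbol and summing log2 of its distinct-character count from the report. This is only a capacity estimate under that independence assumption; the constant positions show there is real structure, so the actual information content is lower. The helper name capacity_bits is hypothetical.

```python
import math

# Distinct-character counts per position, copied from the report above (positions 1-26).
COUNTS = [1, 8, 32, 64, 32, 32, 32, 64, 32, 4, 8, 1, 1,
          10, 61, 63, 17, 4, 1, 1, 64, 32, 64, 64, 62, 4]


def capacity_bits(counts, whole_bits=False):
    """Sum log2(count) over all positions; optionally round each position down to whole bits."""
    if whole_bits:
        # bit_length() - 1 is an exact floor(log2(n)) for positive integers
        return sum(c.bit_length() - 1 for c in counts)
    return sum(math.log2(c) for c in counts)


print(f"{capacity_bits(COUNTS):.1f}")           # 97.3
print(capacity_bits(COUNTS, whole_bits=True))   # 94
```

Rounding each position down gives 94 bits and fractional bits give about 97, so the estimate lands in the mid-90s either way.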
I did some research by clicking an ad multiple times and examining the gclids for those clicks:
        12345678901234567890123456
/?gclid=CNHz5eD_8pkCFRCdnAodzniYQg
/?gclid=CIX_u-X_8pkCFQKenAodlWprSg
/?gclid=CMyI_4OA85kCFRIhnAodc2_oRg
/?gclid=CO_0pYyA85kCFQghnAodDDpaRQ
/?gclid=CIXo9JeA85kCFRIhnAodc2_oRg
/?gclid=CLitgp2A85kCFQubnAod1nx7Qg
/?gclid=CN3_1aOA85kCFQghnAodDDpaRQ
/?gclid=CPyi1quA85kCFRabnAodWnZbRQ
/?gclid=COq-67OA85kCFRMhnAodyQvSRg
/?gclid=COOplrmA85kCFRCdnAodzniYQg
I noticed that most of the positions that can hold 32 to 64 different characters vary quite a bit, except for character #9, which was always an ‘8’, and character #10, which was a ‘p’ for the first two clicks and a ‘5’ for all subsequent clicks. That likely has some significance, but I’m out of time to play with it for now.
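Comparing the ten clicks column by column makes the constant and slowly changing positions easy to spot. This Python sketch hard-codes the gclids listed above; the function names are hypothetical, and it assumes all gclids have the same length (zip stops at the shortest one).

```python
# The ten gclids from the repeated-click test above.
GCLIDS = [
    "CNHz5eD_8pkCFRCdnAodzniYQg",
    "CIX_u-X_8pkCFQKenAodlWprSg",
    "CMyI_4OA85kCFRIhnAodc2_oRg",
    "CO_0pYyA85kCFQghnAodDDpaRQ",
    "CIXo9JeA85kCFRIhnAodc2_oRg",
    "CLitgp2A85kCFQubnAod1nx7Qg",
    "CN3_1aOA85kCFQghnAodDDpaRQ",
    "CPyi1quA85kCFRabnAodWnZbRQ",
    "COq-67OA85kCFRMhnAodyQvSRg",
    "COOplrmA85kCFRCdnAodzniYQg",
]


def column_summary(gclids):
    """Map each 1-based position to that position's characters, in click order."""
    return {pos: "".join(column) for pos, column in enumerate(zip(*gclids), start=1)}


def constant_positions(gclids):
    """Return the 1-based positions where every gclid has the same character."""
    return [pos for pos, column in column_summary(gclids).items() if len(set(column)) == 1]


for pos, column in column_summary(GCLIDS).items():
    distinct = len(set(column))
    label = "constant" if distinct == 1 else f"{distinct} distinct"
    print(f"{pos:2d} {column}  {label}")
```

On these ten clicks it reports positions 1, 9, 11, 12, 13 and 17 through 20 as constant, and column 10 reads pp55555555. In the full-log report above, only positions 1, 12, 13, 19 and 20 have a single value, so the other constant columns are a property of this small sample rather than of gclids in general.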
Hopefully the script and this basic analysis will be of use to somebody else digging into it further.
One other thought I had is that the data (or each field) is somehow encrypted, and when you ‘link’ your Analytics account to your Adwords account, it shares the decryption key so that Analytics can get at the details.
A 5 in the 10th column probably means “Click Fraud detected” 🙂
A very long article about it:
http://blog.merjis.com/2007/07/16/click-fraud-google-adwords-and-gclid/
“The “(stuff)” that is added appears to be unique for each advert impression, and appears to be unique in a clever way… The first part of the ID varies rapidly and the last part varies slowly. This is clever because when you are looking for string matches, you get an early failure in the string match, helping to speed the search up – an indication that some smart people may have been working on this.”
“I’ll guess that the last part of the gclid value encodes, or more likely references in some way, the advertiser ID, the keyword, adgroup, campaign and account ID’s. The first part, that changes rapidly, is probably some combination of timestamp and instance ID or advertising channel (where the advert was published). I suspect that the account and keyword part is a database ID that delivers a row with the account ID, campaign and so on – rather than being an encoding. I suspect that the first part is a timestamp and instance ID, which will also be recorded on Google servers and will tell them when the advert impression was delivered, on which site and how long it was between that impression and the click.”
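Both quoted hypotheses can be loosely checked against the ten clicks listed earlier in this post. This sketch measures the longest prefix any two of those gclids share (a short shared prefix means a string comparison fails early) and groups the clicks by their tail. Splitting the tail off at position 12 is an arbitrary assumption, chosen only because positions 12 and 13 are constant in the full report; it is not a known field boundary. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# The ten gclids from the repeated-click test earlier in this post.
GCLIDS = [
    "CNHz5eD_8pkCFRCdnAodzniYQg",
    "CIX_u-X_8pkCFQKenAodlWprSg",
    "CMyI_4OA85kCFRIhnAodc2_oRg",
    "CO_0pYyA85kCFQghnAodDDpaRQ",
    "CIXo9JeA85kCFRIhnAodc2_oRg",
    "CLitgp2A85kCFQubnAod1nx7Qg",
    "CN3_1aOA85kCFQghnAodDDpaRQ",
    "CPyi1quA85kCFRabnAodWnZbRQ",
    "COq-67OA85kCFRMhnAodyQvSRg",
    "COOplrmA85kCFRCdnAodzniYQg",
]

# Hypothetical split point: treat 1-based positions 12-26 as the "slowly varying" tail.
TAIL_START = 12


def shared_prefix_length(a, b):
    """Number of leading characters two strings have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def longest_shared_prefix(gclids):
    """Longest common prefix found between any pair of gclids."""
    return max(shared_prefix_length(a, b) for a, b in combinations(gclids, 2))


def tail_groups(gclids, start=TAIL_START):
    """Count how many clicks share each tail (from 1-based position `start` onward)."""
    return Counter(g[start - 1:] for g in gclids)


print(longest_shared_prefix(GCLIDS))
for tail, clicks in tail_groups(GCLIDS).most_common():
    print(tail, clicks)
```

On those ten clicks no two gclids share more than their first three characters, while the 15-character tails collapse into 7 distinct values, three of which appear twice. That fits the "varies rapidly / varies slowly" description, but a sample this small says nothing about what the tail actually encodes.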
And then there’s Matt Cutts’s blog, with some good links:
http://www.mattcutts.com/blog/better-click-tracking-with-auto-tagging/
Yeah, I read all of those and then found that I had wasted a couple of hours without really accomplishing anything. I’m pretty certain that there is some interesting information in there, but since this is their main method of generating revenue, it is likely very well thought out and encoded well enough that I will never be able to extract any useful information from it (although that click-fraud detector idea might be useful if it pans out).