Compression for MySQL Replication

I have a MySQL database that does a fair number of updates and inserts. The server is replicated to an off-site server located across the country. With MySQL replication, any Insert, Update, or Delete statements are written to the binary log, then sent from the master server in San Jose to the slave in New York.

I noticed today that the slave server was falling behind the master and having trouble keeping up. There was a sizable amount of traffic between the two servers, and after investigating for a little while, I determined that the available bandwidth wasn’t sufficient to keep up with the replication.

We have applications running on the server in New York that were significantly behind or slow as a result. After a bit of research, I found the slave_compressed_protocol setting in MySQL, which allows the master and slave to compress the replication data sent between them. After enabling that, the slave caught up within a matter of minutes and has stayed caught up just fine. The bandwidth usage dropped from a consistent 600 kb/s to around 20 kb/s.
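To get a feel for why compression helps this much, here is a rough Python sketch that runs zlib (which, as far as I know, is what MySQL's compressed protocol uses) over a batch of near-identical statements. The table and column names are made up for illustration, and this is not a measurement of my actual replication traffic.

```python
import zlib


def compression_ratio(statements):
    """Return (raw_bytes, compressed_bytes) for a batch of SQL statements."""
    raw = "\n".join(statements).encode("utf-8")
    compressed = zlib.compress(raw)
    return len(raw), len(compressed)


# Hypothetical workload: many statements that differ only in a few values,
# which is typical of what a busy binary log carries.
statements = [
    f"REPLACE INTO cache (cache_key, cache_value) VALUES ('user:{i}', 'payload-{i}')"
    for i in range(1000)
]

raw_size, compressed_size = compression_ratio(statements)
print(f"raw: {raw_size} bytes, compressed: {compressed_size} bytes")
```

Replication traffic is mostly repetitive text like this, so it compresses very well. Binary-heavy or already-compressed data would see much less benefit.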

While looking into MySQL replication, I also experimented with SSH compression, since the replication goes through an SSH tunnel. I had similar success with that as well.

Don’t Use Integers as Values in an Enum Field

I just got through fixing a messy problem where a database had a table defined with a couple of columns that were ENUMs with integer values.  This leads to extreme amounts of confusion, because in queries it is ambiguous whether an integer is supposed to be treated as the enumerated value or as its index.

Imagine a table with a column defined as ENUM('0', '1', '2', '3').  When you query that column, it is unclear whether you mean the actual value you pass in or its position in the list.  For example, if I were to say WHERE confusing_column = 2, it could be read as either the value '2' or the item in the second position (i.e., '1').  MySQL treats an unquoted number as the position, so that query matches rows holding '1', while WHERE confusing_column = '2' matches the value '2'.  It is even hard to explain because it is so confusing.
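Since this is hard to describe in words, here is a small Python model of the comparison rule as I understand it: a numeric literal is compared against the 1-based index, and a string literal is compared against the value. The function name is my own, and this is only a sketch of the behavior, not anything taken from MySQL itself.

```python
def enum_matches(members, stored, literal):
    """Model how an ENUM column compares to a literal in a WHERE clause.

    members: the ENUM's declared values, in order.
    stored:  the string value held in the row.
    literal: int -> compared against the 1-based index of the stored value;
             str -> compared against the stored value itself.
    """
    if isinstance(literal, int):
        return members.index(stored) + 1 == literal
    return stored == literal


members = ["0", "1", "2", "3"]

# WHERE confusing_column = 2   -> index 2, which is the value '1'
print(enum_matches(members, "1", 2))    # True
print(enum_matches(members, "2", 2))    # False

# WHERE confusing_column = '2' -> the value '2'
print(enum_matches(members, "2", "2"))  # True
```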

The MySQL Documentation does a decent job of explaining it.   I agree with their recommendation:

For these reasons, it is not advisable to define an ENUM column with enumeration values that look like numbers, because this can easily become confusing.

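I ended up converting everything to TINYINT columns. As far as I can tell the storage cost is about the same (an ENUM with this few values and a TINYINT each take one byte per row), and it is worth it in my opinion to avoid the confusion.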

SSL Certificate Notes

Whenever I create an SSL Certificate, I find myself going back and forth between several pages of notes.  I’m about to do this with a half-dozen certs that were generated on a Debian box with weak keys.  Here are the OpenSSL commands I find most useful:

Create a new key:

openssl genrsa -out MYDOMAIN.COM.key 2048

Remove the Pass Phrase from an existing key:

openssl rsa -in MYDOMAIN.COM.key.withpassword -out MYDOMAIN.COM.key

Create a Certificate Signing Request (CSR):

openssl req -new -key MYDOMAIN.COM.key -out MYDOMAIN.COM.csr

Inspect your CSR (or a previous one to copy values out of):

openssl req -noout -text -in MYDOMAIN.COM.csr

Self-sign a certificate:

openssl x509 -req -days 3650 -in MYDOMAIN.COM.csr \
  -signkey MYDOMAIN.COM.key \
  -out MYDOMAIN.COM.crt

Inspect a certificate:

openssl x509 -in MYDOMAIN.COM.crt -text | head -n 12

Hibernate Your Windows Machine With an Icon or a Script

This command can be used to hibernate a Windows machine from some kind of script:

rundll32 powrprof.dll,SetSuspendState

I have found this useful in a couple of situations. One is where I have a computer that I like to have on during the day, but not at night. I configured the BIOS to turn the computer on automatically each day at a specified time, and then used the Windows Task Scheduler to run the hibernate command at a specified time at night.

The other situation is a problem on one of my machines where it takes literally three or four minutes to bring up the shutdown box after clicking Start => Shut Down. Instead of waiting for that, I created a shortcut on my desktop that runs the hibernate command, and it hibernates immediately.

MyTop Stops and Beeps When a Query Contains Binary Data

MyTop is a handy utility for watching the queries being executed on a MySQL server from a terminal window.  It is written in Perl, and is pretty straightforward.  It just runs SHOW FULL PROCESSLIST on the database, and then displays the currently running queries.  You can sort by various columns, and in general it is much easier than running SHOW PROCESSLIST from the MySQL command prompt.

My database does some inserts that contain binary data.  I noticed that whenever one of those queries came up while mytop was running, the terminal would beep, and mytop would stop and prompt me to enter something.

To resolve it, I added this at about line 970 so that it filters out most non-displayable characters.  Feel free to let me know a better regex to use.  This one is pretty ugly, but works for now.

## Try to filter out binary information and still provide all of the necessary detail
$thread->{Info} =~ s/[^\w\d\s\(\)\[\]\-\;\:\'\"\,\.\<\>\?\/\\\*\~\!\@\#\$\%\^\&\*\-_\+\=\` ]//g;

Poor Performance After Enabling Replication Due to sync_binlog

I was pretty happy with myself with setting up some fairly complicated MySQL circular replication the other night.  I did it far after peak hours so as not to disturb any visitors if it caused any problems.   Everything appeared to be working great until I started watching things the next morning.

I started to notice that the main MySQL server seemed to be running really slowly.  One process that usually completes in a couple of hours ended up taking well over 16 hours.  I spent the whole day troubleshooting it, which got me familiar with all sorts of handy tools.  mytop is a handy version of top for MySQL queries, and I got familiar with iostat for watching disk I/O performance.

In the end, after a whole day of troubleshooting, it came down to the sync_binlog setting, which I had enabled because some howto mentioned it was useful on the replication master.  My understanding of the setting now is that it makes MySQL ask the operating system to sync the binary log to disk after each write to it (every UPDATE, INSERT, or DELETE).  The idea is that once the data is synced, the drive has physically written it instead of keeping it in a cache.  My application does a ton of inserts, so it was killing performance.
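To illustrate the difference, here is a small Python sketch of a log writer that calls fsync after every N writes, which is roughly how I understand sync_binlog=N to behave (with 0 meaning MySQL never forces a sync itself). The function and file names are made up, and the events are placeholders rather than real binary log records.

```python
import os
import tempfile


def write_log(path, events, sync_every):
    """Append events to a file, calling fsync after every `sync_every` writes.

    sync_every=0 never calls fsync and leaves flushing to the operating system.
    Returns the number of fsync calls made.
    """
    fsync_calls = 0
    with open(path, "ab") as log:
        for count, event in enumerate(events, start=1):
            log.write(event)
            if sync_every and count % sync_every == 0:
                log.flush()             # push Python's buffer to the OS
                os.fsync(log.fileno())  # ask the OS to push it to the disk
                fsync_calls += 1
    return fsync_calls


events = [b"INSERT ...\n"] * 200  # placeholder events

with tempfile.TemporaryDirectory() as tmp:
    every_write = write_log(os.path.join(tmp, "a.log"), events, sync_every=1)
    never = write_log(os.path.join(tmp, "b.log"), events, sync_every=0)

print(every_write, never)  # 200 0
```

Each of those fsync calls has to wait for the disk, so an insert-heavy workload pays that cost on every single statement.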

MySQL Multi-Master (Circular) Replication

With a little bit of work tonight, I now have a multi-master MySQL database configuration running.  The setup is pretty slick, and surprisingly simple to set up.  I have been playing with simple Master-Slave configurations for a while now, and recently came across this OnLamp article about Circular Replication.

The basic concept is pretty straightforward – each server writes its own binary log as normal.   You also enable log-slave-updates which writes updates received from the master server to its own binary log as well, so that those are passed on to the next server in the  circle.

I actually used a slightly modified version of the circle.   I have several servers at one location.  The ‘main’ master MySQL instance is on one of those, and the others all are slaves to it.   We recently added a server at a remote location and made it a simple slave as well.   All of the servers sent updates to the single master.

The problem I had was that the server at the remote location is about 70-80 ms away, which works fine for some applications, but I’m adding a feature that will potentially do 30 or so updates on each request.  Multiply 80 ms by 30 updates and you get about 2.4 seconds per request, which is a noticeable problem.

My new setup has updates being written locally on the remote server, and then using MySQL replication to get that back to the old master server, which then relays it to the rest of the servers in that location.  The remote server still receives updates from the main master server.  I can now update either server, and updates propagate to all servers in very near real-time.  It is very slick.

Note that this might not be suitable for all sorts of data and all situations.  My data is an extremely simple key => value caching system.  I use REPLACE INTO statements so that it doesn’t matter if a given key already exists or not.  Also, if my data becomes inconsistent, it also is not a big deal, as the caching system will just fail, and the application will proceed without using the cached results.

Previously, the major obstacle to this setup was automatically incrementing IDs, since it was possible for two master servers to hand out the same ID.  MySQL 5 introduced the auto_increment_increment and auto_increment_offset settings, which make it possible to guarantee that will never happen.  There are still some potential problems where statements that are not executed in the same order on all nodes may result in inconsistent data.  So use with caution and make sure you understand the potential issues.
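Here is a quick Python sketch of how those two settings keep IDs from colliding, assuming two masters that share auto_increment_increment = 2 and use auto_increment_offset values of 1 and 2. The function name is just for illustration.

```python
from itertools import count, islice


def auto_increment_ids(offset, increment):
    """Yield the IDs one server hands out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


# Two masters: same increment, different offsets.
server_a = list(islice(auto_increment_ids(offset=1, increment=2), 5))
server_b = list(islice(auto_increment_ids(offset=2, increment=2), 5))

print(server_a)                       # [1, 3, 5, 7, 9]
print(server_b)                       # [2, 4, 6, 8, 10]
print(set(server_a) & set(server_b))  # set()
```

With more masters, the increment would be set to at least the number of masters, with each one given its own offset.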

Sessions Don’t Work When Proxying Through Apache

This particular problem makes it look like your application’s sessions aren’t working at all. A common use for Apache is to serve as a reverse proxy for many applications. This is particularly common for serving dynamic Java content, and also for Ruby on Rails applications. A pretty typical configuration is to have Apache serve static content, but proxy any requests for dynamic content to Tomcat. A sample Apache configuration might look like this:

RewriteEngine On
RewriteRule ^/(.+\.jsp)$ ajp://localhost:8009/myapp/$1 [P]
ProxyPassReverse / ajp://localhost:8009/myapp/

When Apache serves as a reverse proxy, it just passes requests directly to the backend server, and returns the results directly as received. Java applications are typically installed in an application directory, and they specify that directory in the Set-Cookie header. Here is a sample Set-Cookie header from an HTTP response:

Set-Cookie: JSESSIONID=E1576192767FB8D998137B52461C023D; Path=/myapp

With the default behavior, Apache passes that Set-Cookie header unmodified to the client. The browser receives the cookie, but will only send it back for requests under the /myapp directory. Since the public URLs don’t include /myapp, the session cookie is never returned and every request looks like a new session. The solution is the ProxyPassReverseCookiePath directive, introduced in Apache 2.2, which tells Apache to rewrite the Path parameter according to the rules that you define. To use it, simply add this line to your Apache config:

ProxyPassReverseCookiePath  /myapp  /

This tells Apache to replace the ‘Path=/myapp’ in the Set-Cookie header with ‘Path=/’. That gives your browser a cookie path that matches the public URLs, and lets your sessions work correctly.
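To make the effect concrete, here is a small Python sketch of the rewrite and of a simplified version of the browser's path check. Both functions are my own illustration of the idea, not how Apache or any browser actually implements it.

```python
def rewrite_cookie_path(set_cookie, internal_path, public_path):
    """Rewrite the Path attribute of a Set-Cookie header value."""
    parts = [part.strip() for part in set_cookie.split(";")]
    rewritten = []
    for part in parts:
        name, _, value = part.partition("=")
        if name.lower() == "path" and value == internal_path:
            part = f"{name}={public_path}"
        rewritten.append(part)
    return "; ".join(rewritten)


def browser_sends_cookie(cookie_path, request_path):
    """Simplified path match: the cookie path must be a prefix of the request path."""
    if cookie_path == "/":
        return True
    prefix = cookie_path.rstrip("/") + "/"
    return request_path == cookie_path or request_path.startswith(prefix)


header = "JSESSIONID=E1576192767FB8D998137B52461C023D; Path=/myapp"
print(rewrite_cookie_path(header, "/myapp", "/"))
# JSESSIONID=E1576192767FB8D998137B52461C023D; Path=/

print(browser_sends_cookie("/myapp", "/index.jsp"))  # False
print(browser_sends_cookie("/", "/index.jsp"))       # True
```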

A Case for Choosing Good Server Names

This morning, I had a client call me bright and early, frantic about some mail problems they were having.  All of their mail servers had stopped accepting incoming SMTP connections for some reason, and they couldn’t figure out why.

After a little bit of investigation, I found that they were using Postfix with MySQL-based virtual domains.  The MySQL authentication was failing, which meant that Postfix was unable to look up any valid recipient names.  That, in turn, was causing tons of retried connections, until they hit the maximum number of connections and Postfix refused any more.

The problem is that these mail servers were initially set up with some dumb names for some reason.    A new administrator noticed the silly names in their Reverse DNS entries and changed them to some more sensible names.  The MySQL permissions were based off of the hostnames, so when the names in Reverse DNS changed, it broke the permissions, and the clients were unable to connect.

Solving the problem was simple enough – I just corrected the MySQL permissions, and then had to deal with some huge mail queues for a little while as all of the messages waiting to come in were finally allowed all at once.

The moral of the story is to use sensible names to start out with.  These names were chosen to be sort of funny, I guess, but it didn’t end up being so amusing in the midst of all of the problems it caused.  As a side note, I usually base MySQL permissions on IP address as well, which further reduces this kind of problem.