MySQL Statistics for Updates/Inserts per-table

Posted on September 16th, 2016 in General by Brandon

For a long time, I’ve never been able to answer some basic questions that I thought fundamental to optimizing server performance. MySQL gives you some server-wide metrics about activity, but none of it is broken down per-table so that an application developer could look into where to reduce the number of writes, or generally where to focus their attention in order to improve the server performance.

I finally got ambitious enough to tackle this problem and asked a question on StackOverflow at http://stackoverflow.com/questions/39459185/mysql-how-to-count-the-number-of-inserts-updates-to-a-table

A commenter named barat pointed me to this post which had the insightful idea of parsing the binary log for analysis.
Since my servers are generally hosted on AWS, I don’t have direct access to the binary log, so I had to investigate how to retrieve those. The MySQL documentation for the mysqlbinlog command briefly mentions how to read the binary log from a remote server. It took some experimentation to get the right command and output options with all of the data I wanted. Specifically, the `–base64-output=DECODE-ROWS –verbose` options which translate some of the row-based logging into MySQL commands that can be parsed.

The first step is to create a user that has access to the binary logs. I used the main ‘admin’ user that RDS creates because it was convenient. If creating a new user, you probably need to grant the REPLICATION_SLAVE privilege.

You can see which binary logs are available on the server with the SHOW BINARY LOGS; command:

mysql> show binary logs;
+----------------------------+-----------+
| Log_name                   | File_size |
+----------------------------+-----------+
| mysql-bin-changelog.232522 | 16943219  |
| mysql-bin-changelog.232523 | 32300889  |
| mysql-bin-changelog.232524 | 15470603  |
+----------------------------+-----------+

Then you can actually retrieve the log and print to STDOUT using this command:

14:01 $ mysqlbinlog --read-from-remote-server \
  --host myhost.somerandomchars.us-east-1.rds.amazonaws.com \
  --user admin \
  --password="mypassword"
  mysql-bin-changelog.232522

Note that if you get the error below, you need to make sure that your MySQL client and server tools are using the same version. I originally attempted to use MySQL 5.5 tools with a MySQL 5.6 server.

ERROR: Got error reading packet from server: Slave can not handle replication events with
the checksum that master is configured to log; the first event 'mysql-bin-changelog.232519'
at 4, the last event read from '/rdsdbdata/log/binlog/mysql-bin-changelog.232519' at 120,
the last byte read from '/rdsdbdata/log/binlog/mysql-bin-changelog.232519' at 120.

After that, it was just a matter of parsing the file for the relevant commands. I’ve put all of that logic now into a quick PHP script that I can reuse anywhere. Now, I can go through a bunch of binary logs on a server and see which tables are updated the most frequently with output like this:

Parsed 1,096,063 lines spanning 300 seconds between 2016-09-13 03:05:00 and 2016-09-13 03:10:00
master                         metrics                        update          = 43570
master                         metrics                        insert into     = 9
DEFAULT                        accounts                       update          = 501
DEFAULT                        users                          update          = 5
DEFAULT                        logins                         insert into     = 1
mysql                          rds_heartbeat2                 insert into     = 1

I’ve committed this project to Github at https://github.com/sellerlabs/mysql-writes-per-table for others to use.