For a long time, I’ve never been able to answer some basic questions that I thought fundamental to optimizing server performance. MySQL gives you some server-wide metrics about activity, but none of it is broken down per-table so that an application developer could look into where to reduce the number of writes, or generally where to focus their attention in order to improve the server performance.
I finally got ambitious enough to tackle this problem and asked a question on StackOverflow at http://stackoverflow.com/questions/39459185/mysql-how-to-count-the-number-of-inserts-updates-to-a-table
A commenter named barat pointed me to this post which had the insightful idea of parsing the binary log for analysis.
Since my servers are generally hosted on AWS, I don’t have direct access to the binary log, so I had to retrieve those. The MySQL documentation for the mysqlbinlog command briefly mentions how to read the binary log from a remote server. It took some experimentation to get the right command and output options with all of the data I wanted. Specifically, the `–base64-output=DECODE-ROWS –verbose` options which translate some of the row-based logging into MySQL commands that can be parsed.
The first step is to create a user that has access to the binary logs. I used the main ‘admin’ user that RDS creates because it was convenient. If creating a new user, you probably need to grant the REPLICATION_SLAVE privilege.
You can see which binary logs are available on the server with the SHOW BINARY LOGS; command:
mysql> show binary logs; +----------------------------+-----------+ | Log_name | File_size | +----------------------------+-----------+ | mysql-bin-changelog.232522 | 16943219 | | mysql-bin-changelog.232523 | 32300889 | | mysql-bin-changelog.232524 | 15470603 | +----------------------------+-----------+
Then you can actually retrieve the log and print to STDOUT using this command:
14:01 $ mysqlbinlog --read-from-remote-server \ --host myhost.somerandomchars.us-east-1.rds.amazonaws.com \ --user admin \ --password="mypassword" mysql-bin-changelog.232522
Note that if you get the error below, you need to make sure that your MySQL client and server tools are using the same version. I originally attempted to use MySQL 5.5 tools with a MySQL 5.6 server.
ERROR: Got error reading packet from server: Slave can not handle replication events with the checksum that master is configured to log; the first event 'mysql-bin-changelog.232519' at 4, the last event read from '/rdsdbdata/log/binlog/mysql-bin-changelog.232519' at 120, the last byte read from '/rdsdbdata/log/binlog/mysql-bin-changelog.232519' at 120.
After that, it was just a matter of parsing the file for the relevant commands. I’ve put all of that logic now into a quick PHP script that I can reuse anywhere. Now, I can go through a bunch of binary logs on a server and see which tables are updated the most frequently with output like this:
Parsed 1,096,063 lines spanning 300 seconds between 2016-09-13 03:05:00 and 2016-09-13 03:10:00 master metrics update = 43570 master metrics insert into = 9 DEFAULT accounts update = 501 DEFAULT users update = 5 DEFAULT logins insert into = 1 mysql rds_heartbeat2 insert into = 1
I’ve committed this project to Github at https://github.com/sellerlabs/mysql-writes-per-table for others to use.