Migrating 1.2 TB Database From Aurora to MySQL

We have one database server running an old version of Aurora based on MySQL 5.6. AWS is deprecating that version soon and it needs to be upgraded, so I have been working on replacing it. Upgrading the existing 5.6 server in place to 5.7 and then 8.0 isn’t an option due to an impossibly long InnoDB transaction history list that will never clear itself. Plus, I want to improve a couple of other things along the way.

I made several attempts at migrating from Aurora 5.6 to Aurora 8.0, but during that process I grew tired of Aurora quirks and costs. Here are some of my raw notes from what turned out to be an embarrassingly long migration of a database server from Aurora to MySQL. Going from MySQL to Aurora took just a couple of clicks. Converting from Aurora back to MySQL took months and a lot of headaches.

TLDR: Along the way, I tried using Amazon’s Database Migration Service, but eventually gave up in favor of a good old, closely monitored mysqldump and custom scripts.

I had a few goals/requirements:

  • Get rid of our soon-to-be-deprecated Aurora instance based on MySQL 5.6
  • Stop paying for storage IOPS (often over $100/day)
  • Convert tables from utf8mb3 to utf8mb4
  • Minimal downtime or customer disruption. Some disruption during low-usage times is okay.

A new MySQL 8 instance with a gp3 storage volume and the recently announced RDS Optimized Writes should be able to handle the workload with no problem. It also gets this server back into the MySQL realm, where all of our other servers are and where we are more comfortable.

Attempts at using AWS Database Migration Service (DMS)

This service looked promising, but has a learning curve. I eventually gave up using it because of repeated problems that would have taken too much effort to try and resolve.

First attempts:
On the surface, it seems like you configure a source, configure a destination, and then tell DMS to sync one to the other and keep them in sync. It does this in two phases: the full load and Change Data Capture (CDC). I learned the hard way that the full load doesn’t include any indexes on the tables! This is done to make it as fast as possible. The second phase, CDC, just executes statements from the binary log, so without indexes on a 400+ GB table those statements take forever and this will never work.

I also concluded that one of our 300+ GB tables could actually be loaded in a separate process, after the rest of the data. It contains historical information that will make some things in the application look incomplete until it is loaded, but the application will work with the table empty.

Second attempts:
Used DMS for the full dump, then configured it to stop after the full dump, before starting the CDC process. While it was stopped, I added the database indexes and foreign keys. I tried this several times with varying degrees of success, trying to minimize the amount of time it took to add the indexes. Some tables were done instantly, some took a couple of hours, and some took 12+ hours. At one point I figured it would take about 62 hours to add the indexes. I got that down to about 39 hours by increasing the IOPS, running some of the ALTER TABLE statements in parallel, etc.
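
While the task was stopped between phases, the index re-adds were just ALTER TABLE statements run in parallel sessions, one per table. A rough sketch of the pattern, with placeholder table, index, and column names (not the real schema):

-- Placeholder names for illustration. Secondary indexes can usually be
-- added in place without blocking writes:
ALTER TABLE transactions
    ADD INDEX `order_id` (`order_id`),
    ALGORITHM=INPLACE, LOCK=NONE;

-- Adding a foreign key with foreign_key_checks enabled forces a table
-- rebuild (COPY algorithm), which is part of what makes these slow:
ALTER TABLE transactions
    ADD CONSTRAINT fkTransactionsBaseEvent
    FOREIGN KEY (baseEventId) REFERENCES baseEvent (id);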

After the indexes were added, I started the second phase of DMS. The Change Data Capture is supposed to pick up at the point in time when the full dump was taken and then apply all of the changes from the binary logs to the new server. That process didn’t go smoothly. Again, the first attempts looked promising, but then the binary logs on the source server were purged, so it couldn’t continue. I increased the number of days that binary logs were kept and made more attempts, but they ran into problems with foreign key and unique constraints on tables.

The biggest problem with these attempts was that it took about 24 hours for the data migration and about 48 hours to add indexes, so each attempt was several days’ effort.

Third and last attempts at using DMS:
After getting pretty familiar with DMS, I ended up creating the schema via `mysqldump --no-data`, then manually editing the file to exclude indexes on some of the biggest tables that would make the import slow. I also excluded the one large, historical table. My overall process looked like this:

  • mysqldump --defaults-group-suffix=dumpschema --no-data thedatabase | sed "s/utf8 /utf8mb4 /" | sed "s/utf8_/utf8mb4_/" > /tmp/schema-limited-indexes.sql
  • Edit /tmp/schema-limited-indexes.sql and remove foreign keys and indexes on large tables
  • cat /tmp/schema-limited-indexes.sql | mysql --defaults-group-suffix=newserver thedatabase
  • On the new server, run ALTER TABLE the_historic_table ENGINE=blackhole;
  • Start DMS process, make sure to have it stop between Full Load and CDC.
  • Wait ~24+ hours for Full load to complete
  • Add indexes back that were removed from the schema. I had a list of ALTER TABLE statements to run, with an estimated time for each. The total was estimated at about 39 hours
  • Start second Phase (CDC) of the DMS Task
  • Wait for CDC to complete (time estimate unknown; the faster the above steps went, the less it had to replay)

Unfortunately, a couple of attempts at this still had the CDC phase fail with foreign key constraint errors. I tried several times and don’t know why this happened. Finding the offending rows took many hours since the queries didn’t have indexes and had to do full table scans. In some cases, there were just a few to a few dozen rows that existed in one table without the corresponding row in the foreign table. It’s as if the binary log position taken when the snapshot was started was off by a few seconds, and the dumps of different tables were started at slightly different positions.

After several attempts (taking a couple weeks), I finally gave up on the DMS approach.

Using MySQL Dump

Using mysqldump to move data from one database server to another is a process I have done thousands of times and written many scripts around. It is pretty well understood and predictable. I did a few trial runs to put together this process:

Temporarily stop all processes on the master server

  • Stop all background processes that write to the server
  • Change the password so that no processes can write to the master
  • Execute SHOW BINARY LOGS on the master and note the last binary log file and position. Do this a few times to make sure that it does not change. (Note that this would be easier if RDS allowed FLUSH TABLES WITH READ LOCK, but since it doesn’t, this process should work; see the example below.)
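
A minimal version of that check, assuming the account has the REPLICATION CLIENT privilege:

-- Run on the Aurora master and repeat a few times; the last file name and
-- position should stop advancing once all writers are locked out.
SHOW BINARY LOGS;
SHOW MASTER STATUS;  -- also reports the current binary log file and position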

Dump the schema to the new server

This has the sed commands in the middle to convert the old "utf8" collations to the desired "utf8mb4" versions. When dumping 1 TB+ of data, I found it helped performance a bit to do the schema changes with the sed commands first, so that the bulk of the data doesn’t have to pass through those two extra commands.

  • mysqldump --defaults-group-suffix=dumpschema --no-data thedatabase |sed "s/utf8 /utf8mb4 /" | sed "s/utf8_/utf8mb4_/" | mysql thedatabase
  • .my.cnf contains this section with the relevant parameters for the dump
    [clientdumpschema]
    host=thehostname.cluster-czizrrfoedlm.us-east-1.rds.amazonaws.com
    port=3306
    user=dumper
    password=thepassword
    ssl-cipher=AES256-SHA:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA
    quick
    compress
    set-gtid-purged=OFF
    max_allowed_packet=1024M
    single-transaction=TRUE
    column_statistics=0
    net_buffer_length=256k
    

Move the data

To move the data, I ran the command below. Note that it starts with time so that I could see how long it takes, and it pipes the output through pv so that I could monitor progress (more on that below).

time mysqldump --defaults-group-suffix=dumpdata --no-create-info thedatabase | pv |mysql thedatabase

My .my.cnf contains this section for the dump side of that command

[clientdumpdata]
host=thehostname.cluster-czizrrfoedlm.us-east-1.rds.amazonaws.com
port=3306
user=dumper
password=thepassword
ssl-cipher=AES256-SHA:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA
quick
ignore-table=thedatabase.the_big_table
compress
set-gtid-purged=OFF
max_allowed_packet=1024M
single-transaction=TRUE
column_statistics=0
net_buffer_length=256k

Note that the above command includes the Linux pv utility in the middle, which is a nice way to monitor progress. It writes a simple status line to stderr showing the total amount transferred, the elapsed time, and the current speed.

266.5GiB 57:16:47 [ 100KiB/s] [             <=>         ]

I experimented with several values for the NET_BUFFER_LENGTH parameter by dumping the same multi-GB table over and over with different values. This value determines how many rows are packed into each INSERT INTO statement that mysqldump generates. I was hoping that a larger value would improve performance, but I found that larger values actually slowed things down. The best value in my tests was 256k; a sketch of the timing test follows the table below.

NET_BUFFER_LENGTH value    Elapsed Time
64k                        13m 44s
256k                        8m 27s
256k                        7m 20s
1M                         10m 23s
16M                        11m 32s
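
For reference, each timing run was essentially just a dump of that one table; a sketch of the kind of command involved (the table name and destination are placeholders):

# Time a single-table dump with a given buffer size; repeat with different
# --net-buffer-length values and compare the elapsed times.
time mysqldump --defaults-group-suffix=dumpdata --no-create-info \
    --net-buffer-length=256k thedatabase some_multi_gb_table \
  | mysql --defaults-group-suffix=newserver thedatabase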

After Migration is Started

After the mysqldump had been started, I re-enabled traffic to the master server by setting the password back to the original. I kept all background jobs disabled to minimize the amount of data that had to be copied over afterwards.
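
The password flip in both directions was plain SQL on the master; a sketch with placeholder account names and passwords (on MySQL/Aurora 5.6 the SET PASSWORD ... = PASSWORD() form applies):

-- Lock the application out before recording the binary log position:
SET PASSWORD FOR 'app_user'@'%' = PASSWORD('temporary-lockout-password');

-- ... note SHOW BINARY LOGS output, start the mysqldump ...

-- Then restore the original password to let traffic back onto the master:
SET PASSWORD FOR 'app_user'@'%' = PASSWORD('the-original-password');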

Final attempt to use DMS

After the mysqldump was finished, I attempted to use the DMS Change Data Capture process to copy over the data that had changed on the master in the meantime. In theory, you can start a Database Migration Task that begins at a specific binary log position. I tried it, but it failed pretty quickly with a duplicate key constraint error. I gave up on DMS for good and figured I would just move over any data that was needed manually via custom scripts.
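
One way to script that kind of catch-up is to re-dump only the recently modified rows from the old master and replay them on the new server. A hypothetical example for a single table (the table name, timestamp column, and cutoff are placeholders):

# --replace writes REPLACE statements, so rows that already exist on the
# new server are overwritten instead of causing duplicate-key errors.
mysqldump --defaults-group-suffix=dumpdata --no-create-info --replace \
    --where="modified >= '2023-04-01 03:00:00'" \
    thedatabase orders \
  | mysql --defaults-group-suffix=newserver thedatabase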

Other findings

In attempting to maximize the speed of the transfer, I increased the IOPS on the gp3 volume from its base level of 12,000 to 32,000. Initially that helped, but for some reason I still don’t understand, the throughput was then limited very strictly to 6,000 IOPS. As seen in the chart below, it burst above that for short periods, but it was pretty strictly constrained most of the time. I think this has to do with how RDS stripes the data across multiple volumes. I suspect that each volume has a 6,000 IOPS capacity and that all of my data was going to a single volume.

[Chart: RDS IOPS maxed out at 6,000]

That concludes the notes that I wanted to take. Hopefully somebody else finds these learnings or settings useful. If this has been helpful, or if you have any comments on some of the problems that I experienced, please let me know in the comments below.

Find MySQL indexes that can be removed to free up disk space and improve performance

I wrote this handy query to find indexes that can be deleted because they are not being used. It queries the performance_schema database for usage counters on each index and joins to INFORMATION_SCHEMA.TABLES to get the index size.

Indexes that have zero reads and writes are obvious candidates for removal. They add write overhead to keep them updated, and you can improve performance on a busy server by removing them. You also free up some disk space without them. The size column below helps show where you have the most opportunity for saving disk usage.

mysql>
SELECT  OBJECT_NAME,
        index_name,
        SUM(INDEX_LENGTH) AS size,
        SUM(count_star) AS count_star,
        SUM(count_read) AS count_read,
        SUM(count_write) AS count_write
FROM  performance_schema.table_io_waits_summary_by_index_usage
JOIN information_schema.TABLES
    ON table_io_waits_summary_by_index_usage.OBJECT_SCHEMA = TABLES.TABLE_SCHEMA
   AND table_io_waits_summary_by_index_usage.OBJECT_NAME = TABLES.TABLE_NAME
WHERE OBJECT_SCHEMA LIKE 'mydatabase%'
GROUP BY object_name, index_name
ORDER BY count_star ASC, size DESC
LIMIT 20;

+------------------------------+---------------------------------+-------------+------------+------------+-------------+
| OBJECT_NAME                  | index_name                      | size        | count_star | count_read | count_write |
+------------------------------+---------------------------------+-------------+------------+------------+-------------+
| transactions                 | order_id                        | 42406641664 |          0 |          0 |           0 |
| transactions                 | msku-timestamp                  | 42406641664 |          0 |          0 |           0 |
| transactions                 | fkTransactionsBaseEvent         | 42406641664 |          0 |          0 |           0 |
| baseEvent                    | PRIMARY                         | 33601945600 |          0 |          0 |           0 |
| baseEvent                    | eventTypeId                     | 33601945600 |          0 |          0 |           0 |
| orders                       | modified                        | 20579876864 |          0 |          0 |           0 |
| orders                       | buyerId-timestamp               | 20579876864 |          0 |          0 |           0 |
| productReports               | productAd-date-venue            |  8135458816 |          0 |          0 |           0 |
| shipmentEvent                | id                              |  7831928832 |          0 |          0 |           0 |
| shipmentEvent                | eventTypeId                     |  7831928832 |          0 |          0 |           0 |
| historyEvents                | timestamp_venue_entity          |  4567531520 |          0 |          0 |           0 |
| targetReports                | venueId-date-targetId           |  3069771776 |          0 |          0 |           0 |
| productAds                   | venue-productAd                 |  1530888192 |          0 |          0 |           0 |
| keywords                     | venue-keyword                   |   895598592 |          0 |          0 |           0 |
| targetingExpressions         | venue-target                    |   215269376 |          0 |          0 |           0 |
| targetingExpressions         | rType-rValue                    |   215269376 |          0 |          0 |           0 |
| serviceFeeEvent              | PRIMARY                         |    48234496 |          0 |          0 |           0 |
| serviceFeeEvent              | id                              |    48234496 |          0 |          0 |           0 |
| serviceFeeEvent              | eventTypeId                     |    48234496 |          0 |          0 |           0 |
| adGroups                     | venue-adGroup                   |    42336256 |          0 |          0 |           0 |
+------------------------------+---------------------------------+-------------+------------+------------+-------------+
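
Once a candidate is confirmed as unused (worth double-checking on replicas and after a reasonable uptime, since these counters reset when the server restarts), dropping it is a single statement. A sketch using one of the entries above:

-- Dropping an unused secondary index removes its write overhead and frees
-- space inside the tablespace; this is an in-place operation on InnoDB.
ALTER TABLE transactions DROP INDEX `order_id`, ALGORITHM=INPLACE, LOCK=NONE;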

PHP Sessions with Redis Cluster (using AWS Elasticache)

I’ve recently been moving some of our projects from a single Redis server (or a server with a replica) to the more modern Redis Cluster configuration. However, when trying to set up PHP sessions to use the cluster, I found there wasn’t a lot of documentation or examples. This serves as a walk-through for setting up PHP sessions to use a Redis Cluster, specifically with ElastiCache on AWS.

First, create your ElastiCache Redis instance as shown below. Note that “Cluster Mode Enabled” is what causes Redis to operate in cluster mode.

[Screenshot: AWS ElastiCache Redis creation, with Cluster Mode Enabled checked]

Once the servers are launched, make note of the Configuration Endpoint, which should look something like: my-redis-server.dltwen.clustercfg.usw1.cache.amazonaws.com:6379

Finally, use these settings in your php.ini file. The exact location of this file will depend on your OS, but on modern Ubuntu instances you can place it in /etc/php/7.0/apache2/conf.d/30-redis-sessions.ini

Note the special syntax for the save_path, where it has seed[]=. You only need to put the main cluster configuration endpoint here, not all of the individual instances as other examples online appear to use.


session.save_handler = rediscluster
session.save_path = "seed[]=my-redis-server.dltwen.clustercfg.usw1.cache.amazonaws.com:6379"
session.gc_maxlifetime = 1296000

That’s it. Restart your webserver and sessions should now get saved to your Redis cluster.
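
A quick way to confirm the handler is really talking to the cluster is a tiny test page (this assumes the phpredis extension with cluster support is installed; the file name and counter key are arbitrary):

<?php
// session-test.php: reload the page a few times and the counter should
// increase, which confirms session data is round-tripping through Redis.
session_start();
$_SESSION['hits'] = isset($_SESSION['hits']) ? $_SESSION['hits'] + 1 : 1;
echo 'This session has loaded the page ' . $_SESSION['hits'] . " time(s)\n";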

In the event that something goes wrong, you might see something like this in your web server log files:


PHP Warning: Unknown: Failed to write session data (redis). Please verify that the current setting of session.save_path is correct (tcp://my-redis-server.dltwen.clustercfg.use1.cache.amazonaws.com:6379) in Unknown on line 0

APISigning Now Works with Amazon Simple Email Service (SES)

APISigning.com has been signing Amazon Product Advertising requests for a couple of years now. Amazon recently announced their Simple Email Service that makes it easy to send emails via an API. The SES API requires that requests be authenticated using some cryptographic functions that are not easily available on all platforms or programming languages. In those cases, developers can use the APISigning SES Service to calculate the correct signature and perform the request on their behalf.

APISigning has free accounts that effectively allow 10k signing requests each month. Users who require additional requests can subscribe to a paid account with higher limits.

KnitMeter.com Has Been Upgraded

[Image: KnitMeter logo]

KnitMeter.com was originally started over four years ago in December of 2007 as a small project that my wife thought would be useful. Since then, the site hasn’t changed much, but it has managed to grow to thousands of users who have knit nearly 20 thousand miles of yarn. I’ve received numerous requests and have finally gotten a chance to implement what many of you have been requesting for a while now. New features on the site include:

  • Users can now add entries for knitting, crocheting, and spinning
  • Completely new and modernized design and logo
  • You can customize your widgets directly on KnitMeter.com rather than editing the code for the widget on your website
  • The website and the KnitMeter Facebook Application are now completely integrated. Entries added in one will be displayed and counted in the other
  • The Facebook application can (again) publish your entries to your news feed, but only when you tell it to
  • You can choose to make your profile public, which will display some of your most recent entries on the KnitMeter home page with a link to your website
  • Added several new timeframes, including specific calendar years (e.g. “I knit 4.3 miles in 2010”)
  • Numerous technical changes that should make the site faster to use and make it easier to make future changes

These new features have been rolled out over the past couple of weeks. I appreciate the patience of those who have dealt with a few bugs over that time, and I believe that everything should be pretty bug-free now. I encourage you to check out the new site and to start adding up the mileage for your own projects. The next major milestone will be when we have gone through enough yarn to go around the earth (about 24,901 miles). At the present rate, we should hit that figure in about 3-5 months.

Happy Knitting, Crocheting, and Spinning,
Brandon Checketts
KnitMeter.com

Website Performance: Tables Versus CSS

Most website designers have been using CSS for page layout for several years now, but I occasionally see websites that still use HTML tables for layout. As I’ve been focusing on website performance lately, I’ve found some references saying that modern browsers render sites that use tables for layout more slowly than sites that use CSS. I decided to investigate and confirmed that there are many situations where sites using large tables will appear to load much more slowly than those using CSS. I put together two pages to confirm this:

This page uses <div> elements for layout
and
This page uses a large table for layout

On both pages I’ve added a 5-second sleep near the end of the page to simulate what might happen if the server were slow, if there were network problems, or if any number of other things went wrong.

Notice that the page created using a table changes a lot after the delay. I tried it in Firefox 3, which extends the main (yellow) content section all the way to the right until it receives the rest of the document, at which point it has to shrink that section to make room for the column on the right. Internet Explorer behaves even worse: it leaves a blank white page until after the delay, at which point it draws the whole table.

By contrast, the page created with CSS positioning shows all of the content above the delay and has it in the correct position. When the rest of the document is sent it just fills in the appropriate content, but doesn’t have to re-arrange anything on the page.

Enabling HTTP Page Caching with PHP

I’ve been doing a lot of work on BookScouter.com lately to reduce page load time and generally increase the performance of the website for both users and bots. One of the tips that the load time analyzer points out is to enable an expiration time for static content. That is easy enough for images and such by using an Apache directive such as:

    ExpiresActive On
    ExpiresByType image/gif A2592000
    ExpiresByType image/jpg A2592000
    ExpiresByType image/png A2592000

But pages generated with PHP have the Pragma: no-cache header set by default, so that users’ browsers do not cache the content at all. In most cases, even hitting the back button will generate another request to the server that must be completely processed by the script. You may be able to cache some of the most intensive operations inside your script, but this solution eliminates that request completely. Simply add this code to the top of any page that contains semi-static content. It effectively sets the page expiration time to one hour in the future, so if a visitor hits the same URL within that hour, the page is served locally from their browser cache instead of making a trip to the server. It also sends an HTTP 304 (Not Modified) response if the user requests to reload the page within the specified time. That may or may not be desired, depending on your site.

$expire_time = 60*60; // One Hour
header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + $expire_time));
header("Cache-Control: max-age={$expire_time}");
header('Last-Modified: '.gmdate('D, d M Y H:i:s \G\M\T', time()));
header('Pragma: public');

if ((!empty($_SERVER['HTTP_IF_MODIFIED_SINCE'])) && (time() - strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) <= $expire_time)) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}   
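
To confirm the headers are actually being sent, a quick check from the command line works (the URL is a placeholder):

# The response should now include Expires, Cache-Control, and Last-Modified
# instead of Pragma: no-cache.
curl -sI https://www.example.com/some-page.php | grep -iE 'expires|cache-control|last-modified|pragma'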

Skipping the DROP TABLE, CREATE TABLE statements in a large mysqldump file.

I have a large table of test data that I’m copying into some development environments. I exported the table with mysqldump, which puts DROP TABLE and CREATE TABLE statements at the top:

DROP TABLE IF EXISTS `mytable`;
CREATE TABLE `mytable` (
  `somecol` varchar(10) NOT NULL default '',
   ... other columns ...
  PRIMARY KEY  (`somecol`),
  KEY `isbn10` (`somecol`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The problem is that the developer has altered the table, and re-importing the test data would undo those changes. Editing the text file is impractical because of its size (500 MB gzipped). So I came up with this workaround, which slightly alters the SQL using sed so that it doesn’t try to drop or recreate the table. It comments out the DROP TABLE line and creates the new table in the test database instead of the real database.

zcat bigfile.sql.gz |sed "s/DROP/-- DROP/"|sed "s/CREATE TABLE /CREATE TABLE test./"|mysql databasename
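
Before piping the result into mysql, it is easy to sanity-check what the sed replacements actually produce (a quick preview, not part of the workflow above):

# Show the first rewritten lines without importing anything:
zcat bigfile.sql.gz | sed "s/DROP/-- DROP/" | sed "s/CREATE TABLE /CREATE TABLE test./" | head -n 40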

Installing SVN and Trac on a CentOS 5 server

Make sure that you have the RPMForge repository enabled. Install Subversion, mod_dav_svn, mod_python, and Trac. This will install a few required dependencies (e.g. neon and some Python utilities).

# yum install subversion mod_dav_svn mod_python trac

Create a directory for your repositories, and an initial repository for testing, and create your htpasswd file. Then create a trac environment and set it up.

# mkdir /home/svn/
# svnadmin create /home/svn/testrepo
# chown -R apache:apache /home/svn/*
# htpasswd -c  /home/svn/.htpasswd brandon

# mkdir /home/trac/
# trac-admin /home/trac/ initenv
    ... answer questions as appropriate ...
# chown apache:apache /home/trac/*

Add this to your Apache configuration in the relevant place (I like to put it under an SSL VirtualHost)

    <Location /svn>
        DAV svn
        SVNParentPath /home/svn/
        #SVNListParentPath on
        # Authentication
        AuthType Basic
        AuthName "RoundSphere SVN Repository"
        AuthUserFile /home/svn/.htpasswd
        Order deny,allow
        Require valid-user
    </Location>
    <Location /trac>
        SetHandler mod_python
        PythonHandler trac.web.modpython_frontend
        PythonOption TracEnv /home/trac
        PythonOption TracUriRoot /trac
        # Authentication
        AuthType Basic
        AuthName "MyCompany Trac Environment"
        AuthUserFile /home/svn/.htpasswd
        Require valid-user
    </Location>

Now test to make sure that you can view your test repository in a browser and that it prompts for a username and password as desired:

https://your-hostname/svn/testrepo/

You should see a plain-looking page that mentions the name of your repository and shows that it is at Revision 0.
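
The repository can also be checked from the command line (the hostname is the same placeholder used above):

# Check out over HTTPS; it should prompt for the htpasswd credentials and
# produce an empty working copy at revision 0.
svn checkout https://your-hostname/svn/testrepo/ testrepo --username brandon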

You should also be able to access your trac installation at

https://your-hostname/trac/

Customize your logo, change the home page, start making some tickets, use the wiki, and get to work.

PHP Wrapper Class for a Read-only database

This is a pretty special case of a database wrapper class where I wanted to discard any updates to the database, but wanted SELECT queries to run against an alternative read-only database. In this instance, I have a planned outage of a primary database server, but would like the public-facing websites and web services to remain as accessible as possible.

I wrote this quick database wrapper class that will pass all SELECT queries on to a local replica of the database and silently discard any updates. With it in place, almost all of the functionality on this site still works, but it obviously isn’t saving any new information while the primary database is unavailable.

Here is my class. It is intended as a wrapper around an ADOdb connection, but it is generic enough that I think it would work with many other database abstraction layers as well.

class db_unavailable {
    var $readonly_db;

    function __construct($readonly_db)
    {
        $this->readonly_db = $readonly_db;
    }

    function query($sql)
    {
        $args = func_get_args();
        if (preg_match("#(INSERT INTO|REPLACE INTO|UPDATE|DELETE)#i", $args[0])) {
            // echo "Unable to do insert/replace/update/delete query: $sql\n";
            return true;
        } else {
            return call_user_func_array(array($this->readonly_db, 'query'), $args);
        }
    }

    function __call($function, $args)
    {
        return call_user_func_array(array($this->readonly_db, $function), $args);
    }
}

I simply create my $query_db object that points to the read-only database, then create my main $db object as a new db_unavailable($query_db). Any SELECT queries against $db behave as they normally do, and data-modifying queries are silently discarded.
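
For reference, wiring it up looks roughly like this. connect_to_readonly_replica() is a placeholder for however the application normally builds its database object; the only requirement is that the object exposes a query() method:

<?php
// Sketch of swapping the wrapper in while the primary database is down.
$query_db = connect_to_readonly_replica();   // placeholder for the usual connection setup
$db = new db_unavailable($query_db);

// SELECTs pass straight through to the read-only replica:
$books = $db->query("SELECT * FROM books WHERE isbn10 = '0596101015'");

// Writes are silently discarded until the primary comes back:
$db->query("UPDATE books SET price = 9.99 WHERE isbn10 = '0596101015'");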