Fixing a Corrupt MySQL Relay Log

As an extension of my post yesterday about skipping corrupt queries in the relay log, I found out that my problem is due to some network problems between the servers which triggers a MySQL bug.

The connection and replication errors in my MySQL log looks like this:

080930 12:26:52 [ERROR] Error reading packet from server: Lost connection to MySQL server 
  during query ( server_errno=2013)
080930 12:26:52 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, 
  log 'mysql-bin.000249' position 747239037
080930 12:26:53 [Note] Slave: connected to master 'replicate@mysqltunnel:13306',replication 
  resumed in log 'mysql-bin.000249' at position 747239037
080930 13:18:49 [ERROR] Error reading packet from server: Lost connection to MySQL server during
   query ( server_errno=2013)
080930 13:18:49 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 
  'mysql-bin.000249' position 783782486
080930 13:18:49 [ERROR] Slave: Error 'You have an error in your SQL syntax; check the manual 
  that corresponds to your MySQL server version for the right syntax to use near '!' at line 6' 
  on query. Default database: 'database'. Query: 'INSERT INTO `sometable`
            SET   somecol         = 3,
                    comeothercol  = 8,
                    othervalue      = NULL!', Error_code: 1064
080930 13:18:49 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and 
  restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000249' 
  position 783781942
080930 13:18:50 [Note] Slave: connected to master 'replicate@mysqltunnel:13306',
  replication resumed in log 'mysql-bin.000249' at position 783782486

When there are network problems between the server, there was some issue where the master didn’t properly detect and notify the slave of the failure. This resulted in parts of queries missing, duplicated, or replaced by random bits in the relay log on the slave. When the slave tries to execute the corrupt query, it will likely generate an error that begins with:

Error You have an error in your SQL syntax; check the manual that corresponds to 
  your MySQL server version for the right syntax to use near . .

This bug has been fixed in MySQL releases since February 2008, but still hasn’t made its way into the CentOS 5 repositories. Until then, that bug report contained a work-around which forces the slave to re-request the binary log from the master. Run ‘SHOW SLAVE STATUS’ and make note of the Master_Log_File and Exec_Master_Log_Pos columns. Then run ‘STOP SLAVE’ to suspend replication, and run this SQL:

CHANGE MASTER TO master_log_file='<Value from Relay_Master_Log_File>',
  master_log_pos=<Value from Exec_master_log_pos>;

After that, simply run ‘START SLAVE’ to have replication pick up from there again. That evidently has the slave re-request the rest of the master’s binary log, which it should (hopefully) get without becoming corrupt, and replication will continue as normal.

I guess the network connection between my servers is problematic lately. I’ve had to fix this several times in the past couple days. If that keeps up, I may add this fix to my Replication checking-script until I’m able to upgrade to a version of MySQL that contains this fix.

3 Comments

jeric

April 22, 2009 at 7:26 am

(I know it’s an old post, but your post turns up at the top of google results when searching for relay log corruption .. so maybe someone else can benefit from this)

One correction to the solution: be sure to use the Relay_Master_Log_File value instead of the Master_Log_File (for an explanation see: https://www.mysqlperformanceblog.com/2008/07/07/how-show-slave-status-relates-to-change-master-to/)

In our case the relay log was corrupted, but reading from the binary log of the master continued. As we noticed this problem only a few days after the corruption occured the Relay_Master_Log_File and Master_Log_File where completely different.

Chinaren Wei

April 6, 2010 at 9:54 pm

This solved my problem, Thank you.

pumpal

November 28, 2013 at 3:16 am

That fixed it. Thank you sir!

jeric
April 22, 2009 at 7:26 am

(I know it’s an old post, but your post turns up at the top of google results when searching for relay log corruption .. so maybe someone else can benefit from this)

One correction to the solution: be sure to use the Relay_Master_Log_File value instead of the Master_Log_File (for an explanation see: https://www.mysqlperformanceblog.com/2008/07/07/how-show-slave-status-relates-to-change-master-to/)

In our case the relay log was corrupted, but reading from the binary log of the master continued. As we noticed this problem only a few days after the corruption occured the Relay_Master_Log_File and Master_Log_File where completely different.
Chinaren Wei
April 6, 2010 at 9:54 pm

This solved my problem, Thank you.
pumpal
November 28, 2013 at 3:16 am

That fixed it. Thank you sir!

1 Pingback

Replication stop - Relay log file corrupt | am3n portfolio

Fixing a Corrupt MySQL Relay Log

3 Comments

1 Pingback

Leave a Reply

Categories

Archives

My Popular Tools

Recent Posts

categories

Archives