Every MySQL DBA should deal with the situation, when there were an accidental write on one of the slaves. Changing replication to GTID will change the way how we should deal with that problem.

Let’s check out!

So I assume you have two servers, a master and a slave.

root@master vagrant]# cat /etc/my.cnf [mysql]  prompt = "master > "  [mysqld] log-bin server-id = 1  
root@slave vagrant]# cat /etc/my.cnf [mysql]  prompt = "slave > "  [mysqld] log-bin server-id = 2  [root@master vagrant]# mysql Welcome to the MySQL monitor.  Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.6.15-56-log Percona Server (GPL), Release rel63.0, Revision 519  Copyright (c) 2009-2013 Percona LLC and/or its affiliates Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.  Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.  master > SHOW MASTER STATUS\G *************************** 1. row ***************************              File: master-bin.000001          Position: 120      Binlog_Do_DB:  Binlog_Ignore_DB: Executed_Gtid_Set: 1 row in set (0.00 sec)  master > GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'repl'; Query OK, 0 rows affected (0.00 sec)  master > FLUSH PRIVILEGES; Query OK, 0 rows affected (0.00 sec)  slave > START SLAVE USER='repl' PASSWORD='repl'; Query OK, 0 rows affected, 1 warning (0.00 sec)  slave > SHOW WARNINGS; +-------+------+------------------------------------------------------------------------+ | Level | Code | Message                                                                | +-------+------+------------------------------------------------------------------------+ | Note  | 1759 | Sending passwords in plain text without SSL/TLS is extremely insecure. | +-------+------+------------------------------------------------------------------------+ 1 row in set (0.00 sec)  slave > SHOW SLAVE STATUS\G *************************** 1. row ***************************                Slave_IO_State: Waiting for master to send event                   Master_Host: 192.168.50.50                   Master_User: repl                   Master_Port: 3306                 Connect_Retry: 60               Master_Log_File: master-bin.000001           Read_Master_Log_Pos: 397                Relay_Log_File: slave-relay-bin.000002                 Relay_Log_Pos: 561         Relay_Master_Log_File: master-bin.000001              Slave_IO_Running: Yes             Slave_SQL_Running: Yes               Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 0                    Last_Error:                  Skip_Counter: 0           Exec_Master_Log_Pos: 397               Relay_Log_Space: 734               Until_Condition: None                Until_Log_File:                 Until_Log_Pos: 0            Master_SSL_Allowed: No            Master_SSL_CA_File:            Master_SSL_CA_Path:               Master_SSL_Cert:             Master_SSL_Cipher:                Master_SSL_Key:         Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No                 Last_IO_Errno: 0                 Last_IO_Error:                Last_SQL_Errno: 0                Last_SQL_Error:   Replicate_Ignore_Server_Ids:              Master_Server_Id: 1                   Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d              Master_Info_File: /var/lib/mysql/master.info                     SQL_Delay: 0           SQL_Remaining_Delay: NULL       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it            Master_Retry_Count: 86400                   Master_Bind:       Last_IO_Error_Timestamp:      Last_SQL_Error_Timestamp:                Master_SSL_Crl:            Master_SSL_Crlpath:            Retrieved_Gtid_Set:             Executed_Gtid_Set:                 Auto_Position: 0 1 row in set (0.00 sec)

So we have set up a basic replication cluster with one master, and one slave server. We also got a warning about we should use SSL/TLS but that is an another story.

Let’s populate the databases with some test data. First we create a table with an auto increment field, and inserting some data there. {% highlight bash %} master > USE test Database changed master > CREATE TABLE testdata ( -> id bigint(20) NOT NULL AUTO_INCREMENT, -> text varchar(45) NOT NULL, -> date timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, -> PRIMARY KEY (id) -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.04 sec) master > INSERT INTO testdata (text) VALUES (‘test’); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES (‘test’); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES (‘test’); Query OK, 1 row affected (0.00 sec) master > SELECT COUNT() FROM testdata; +———-+ | COUNT() | +———-+ | 3 | +———-+ 1 row in set (0.00 sec) slave > USE test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed slave > SELECT COUNT() FROM testdata; +———-+ | COUNT() | +———-+ | 3 | +———-+ 1 row in set (0.00 sec) {% endhighlight %} OK, we have a table, we inserted some data into that, and it was replicated to the slave.

Let’s break this. The most easy way to achieve this, is to insert some data directly on the slave, and insert data again on the master. What we expect is that the replication will break, because there will be a duplicate key on the slave servers table when it tries to replicate:

slave > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.01 sec)  master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec)  master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec)  master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec)  master > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ |        6 | +----------+ 1 row in set (0.00 sec)  slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ |        4 | +----------+ 1 row in set (0.00 sec)  slave > SHOW SLAVE STATUS\G *************************** 1. row ***************************                Slave_IO_State: Waiting for master to send event                   Master_Host: 192.168.50.50                   Master_User: repl                   Master_Port: 3306                 Connect_Retry: 60               Master_Log_File: master-bin.000001           Read_Master_Log_Pos: 1751                Relay_Log_File: slave-relay-bin.000002                 Relay_Log_Pos: 1656         Relay_Master_Log_File: master-bin.000001              Slave_IO_Running: Yes             Slave_SQL_Running: No               Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 1062                    Last_Error: Error 'Duplicate entry '4' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testdata (text) VALUES ('test')'                  Skip_Counter: 0           Exec_Master_Log_Pos: 1492               Relay_Log_Space: 2088               Until_Condition: None                Until_Log_File:                 Until_Log_Pos: 0            Master_SSL_Allowed: No            Master_SSL_CA_File:            Master_SSL_CA_Path:               Master_SSL_Cert:             Master_SSL_Cipher:                Master_SSL_Key:         Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No                 Last_IO_Errno: 0                 Last_IO_Error:                Last_SQL_Errno: 1062                Last_SQL_Error: Error 'Duplicate entry '4' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testdata (text) VALUES ('test')'   Replicate_Ignore_Server_Ids:              Master_Server_Id: 1                   Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d              Master_Info_File: /var/lib/mysql/master.info                     SQL_Delay: 0           SQL_Remaining_Delay: NULL       Slave_SQL_Running_State:            Master_Retry_Count: 86400                   Master_Bind:       Last_IO_Error_Timestamp:      Last_SQL_Error_Timestamp: 140228 15:09:33                Master_SSL_Crl:            Master_SSL_Crlpath:            Retrieved_Gtid_Set:             Executed_Gtid_Set:                 Auto_Position: 0 1 row in set (0.00 sec)

Exactly! That was what we expected.

So let’s fix this. There are several ways, you can rebuild the slave, you can delete the duplicate key on the slave, or you can order the slave to simply skip that binlog entry.

We will do this, with using the SLAVE_SKIP_COUNTER variable. This variable is used to skip certain logentries, for us this is 1 at the moment.

slave > STOP SLAVE; Query OK, 0 rows affected (0.00 sec)  slave > SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; Query OK, 0 rows affected (0.00 sec)  slave > START SLAVE USER='repl' PASSWORD='repl'; Query OK, 0 rows affected (0.00 sec)  slave > SHOW SLAVE STATUS\G *************************** 1. row ***************************                Slave_IO_State: Waiting for master to send event                   Master_Host: 192.168.50.50                   Master_User: repl                   Master_Port: 3306                 Connect_Retry: 60               Master_Log_File: master-bin.000001           Read_Master_Log_Pos: 2269                Relay_Log_File: slave-relay-bin.000003                 Relay_Log_Pos: 284         Relay_Master_Log_File: master-bin.000001              Slave_IO_Running: Yes             Slave_SQL_Running: Yes               Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 0                    Last_Error:                  Skip_Counter: 0           Exec_Master_Log_Pos: 2269               Relay_Log_Space: 2770               Until_Condition: None                Until_Log_File:                 Until_Log_Pos: 0            Master_SSL_Allowed: No            Master_SSL_CA_File:            Master_SSL_CA_Path:               Master_SSL_Cert:             Master_SSL_Cipher:                Master_SSL_Key:         Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No                 Last_IO_Errno: 0                 Last_IO_Error:                Last_SQL_Errno: 0                Last_SQL_Error:   Replicate_Ignore_Server_Ids:              Master_Server_Id: 1                   Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d              Master_Info_File: /var/lib/mysql/master.info                     SQL_Delay: 0           SQL_Remaining_Delay: NULL       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it            Master_Retry_Count: 86400                   Master_Bind:       Last_IO_Error_Timestamp:      Last_SQL_Error_Timestamp:                Master_SSL_Crl:            Master_SSL_Crlpath:            Retrieved_Gtid_Set:             Executed_Gtid_Set:                 Auto_Position: 0 1 row in set (0.00 sec)  slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ |        6 | +----------+ 1 row in set (0.00 sec)

YAY!

So far so good.

I just realised that this is not a good example because of the ‘date’ column,

 `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

because the timestamps differs. But in fact I just write this to show the differences between the standard and GTID mode, so forgive me this. At this scenario I had to remove the row, and restart replication, but please pretend this isn’t happened. I always just use this example table, so get over it please :).

Let’s switch to GTID based replication!

[root@master vagrant]# vi /etc/my.cnf [root@master vagrant]# service mysql restart Shutting down MySQL (Percona Server).. SUCCESS! Starting MySQL (Percona Server). SUCCESS! [root@master vagrant]# cat /etc/my.cnf [mysql]  prompt = "master > "  [mysqld] log-bin server-id = 1 gtid-mode = ON log-slave-updates enforce-gitd-consistency  [root@slave vagrant]# vi /etc/my.cnf [root@slave vagrant]# service mysql restart Shutting down MySQL (Percona Server).. SUCCESS! Starting MySQL (Percona Server). SUCCESS! [root@slave vagrant]# cat /etc/my.cnf [mysql]  prompt = "slave > "  [mysqld] log-bin server-id = 2 gtid-mode = ON log-slave-updates enforce-gitd-consistency  slave > STOP SLAVE; Query OK, 0 rows affected (0.01 sec)  slave > CHANGE MASTER TO MASTER_HOST='192.168.50.50', MASTER_AUTO_POSITION=1; Query OK, 0 rows affected (0.02 sec)  slave > START SLAVE USER='repl' PASSWORD='repl'; Query OK, 0 rows affected, 1 warning (0.01 sec)  slave > SHOW SLAVE STATUS\G *************************** 1. row ***************************                Slave_IO_State: Waiting for master to send event                   Master_Host: 192.168.50.50                   Master_User: repl                   Master_Port: 3306                 Connect_Retry: 60               Master_Log_File: master-bin.000002           Read_Master_Log_Pos: 151                Relay_Log_File: slave-relay-bin.000002                 Relay_Log_Pos: 363         Relay_Master_Log_File: master-bin.000002              Slave_IO_Running: Yes             Slave_SQL_Running: Yes               Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 0                    Last_Error:                  Skip_Counter: 0           Exec_Master_Log_Pos: 151               Relay_Log_Space: 567               Until_Condition: None                Until_Log_File:                 Until_Log_Pos: 0            Master_SSL_Allowed: No            Master_SSL_CA_File:            Master_SSL_CA_Path:               Master_SSL_Cert:             Master_SSL_Cipher:                Master_SSL_Key:         Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No                 Last_IO_Errno: 0                 Last_IO_Error:                Last_SQL_Errno: 0                Last_SQL_Error:   Replicate_Ignore_Server_Ids:              Master_Server_Id: 1                   Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d              Master_Info_File: /var/lib/mysql/master.info                     SQL_Delay: 0           SQL_Remaining_Delay: NULL       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it            Master_Retry_Count: 86400                   Master_Bind:       Last_IO_Error_Timestamp:      Last_SQL_Error_Timestamp:                Master_SSL_Crl:            Master_SSL_Crlpath:            Retrieved_Gtid_Set:             Executed_Gtid_Set:                 Auto_Position: 1 1 row in set (0.00 sec)  master > USE test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A  Database changed master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.01 sec)  master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec)  master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec)  master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.01 sec)  master > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ |       10 | +----------+ 1 row in set (0.00 sec)  slave > use test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A  Database changed slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ |       10 | +----------+ 1 row in set (0.00 sec)

We can break again the replication, as in the past, and try to solve that in the same way:

slave > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.01 sec)  master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec)  master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec)  master > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ |       12 | +----------+ 1 row in set (0.00 sec)  slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ |       11 | +----------+ 1 row in set (0.00 sec)

Yeah, it was break again. Let’s try to fix this in the same way as before.

slave > STOP SLAVE; Query OK, 0 rows affected (0.00 sec)  slave > SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction

That’s it! We can’t solve this problem in the same way, but MySQL is kind to tell what have we do, we have to insert an empty transaction, with the same TID as the bad transaction, and when we restart the slave. First, check the slave status:

slave > SHOW SLAVE STATUS\G *************************** 1. row ***************************                Slave_IO_State:                   Master_Host: 192.168.50.50                   Master_User:                   Master_Port: 3306                 Connect_Retry: 60               Master_Log_File: master-bin.000002           Read_Master_Log_Pos: 1993                Relay_Log_File: slave-relay-bin.000002                 Relay_Log_Pos: 1591         Relay_Master_Log_File: master-bin.000002              Slave_IO_Running: No             Slave_SQL_Running: No               Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 1062                    Last_Error: Error 'Duplicate entry '11' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testdata (text) VALUES ('test')'                  Skip_Counter: 0           Exec_Master_Log_Pos: 1379               Relay_Log_Space: 2409               Until_Condition: None                Until_Log_File:                 Until_Log_Pos: 0            Master_SSL_Allowed: No            Master_SSL_CA_File:            Master_SSL_CA_Path:               Master_SSL_Cert:             Master_SSL_Cipher:                Master_SSL_Key:         Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No                 Last_IO_Errno: 0                 Last_IO_Error:                Last_SQL_Errno: 1062                Last_SQL_Error: Error 'Duplicate entry '11' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testdata (text) VALUES ('test')'   Replicate_Ignore_Server_Ids:              Master_Server_Id: 1                   Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d              Master_Info_File: /var/lib/mysql/master.info                     SQL_Delay: 0           SQL_Remaining_Delay: NULL       Slave_SQL_Running_State:            Master_Retry_Count: 86400                   Master_Bind:       Last_IO_Error_Timestamp:      Last_SQL_Error_Timestamp: 140228 15:52:47                Master_SSL_Crl:            Master_SSL_Crlpath:            Retrieved_Gtid_Set: 59879f62-a087-11e3-99a4-080027079e3d:1-6             Executed_Gtid_Set: 59879f62-a087-11e3-99a4-080027079e3d:1-4, 6d1bb29b-a087-11e3-99a4-080027079e3d:1                 Auto_Position: 1 1 row in set (0.00 sec)

You can see that the last executed GTID set is 59879f62-a087-11e3-99a4-080027079e3d:1-4. The only thing we need to do is to insert an empty transaction with the duplicate one (59879f62-a087-11e3-99a4-080027079e3d:5 <- because this was what didn’t run) and after that restart the replication. See how:

slave > SET GTID_NEXT="59879f62-a087-11e3-99a4-080027079e3d:5"; Query OK, 0 rows affected (0.00 sec)  slave > BEGIN; Query OK, 0 rows affected (0.00 sec)  slave > COMMIT; Query OK, 0 rows affected (0.00 sec)  slave > SET GTID_NEXT="AUTOMATIC"     -> ; Query OK, 0 rows affected (0.00 sec)  slave > START SLAVE USER='repl' PASSWORD='repl'; Query OK, 0 rows affected, 1 warning (0.00 sec)  slave > pager grep Gtid PAGER set to 'grep Gtid' slave > SHOW SLAVE STATUS\G            Retrieved_Gtid_Set: 59879f62-a087-11e3-99a4-080027079e3d:1-7             Executed_Gtid_Set: 59879f62-a087-11e3-99a4-080027079e3d:1-7, 1 row in set (0.00 sec)  slave > \P Default pager wasn't set, using stdout. slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ |       12 | +----------+ 1 row in set (0.00 sec)

So the replication is fixed by inserting an empty transaction.