When you have to drop a large database, you’ll run into some problems, mainly replication lag. I’ll show you how to avoid it.

What can cause replication lag when you drop a database? First, it takes some disk I/O to unlink the files, and second, MySQL scans through the buffer pool looking for pages that belong to that database. On a huge (or even just big) database this can take seconds or minutes, which means your slaves will accumulate lag for those same seconds or (of course) minutes.
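
You can get a rough sense of how much buffer pool work the drop implies by counting the pages the database currently occupies. A minimal check, assuming MySQL 5.6+ where information_schema.INNODB_BUFFER_PAGE exists (the TABLE_NAME format varies between versions, hence the loose LIKE, and the query itself is expensive on a large buffer pool, so don’t run it casually on a busy master):

SELECT COUNT(*) AS pages_in_buffer_pool
FROM information_schema.INNODB_BUFFER_PAGE
WHERE TABLE_NAME LIKE '%chomp%';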

Every database drop should start with a backup of that database. I took mine on a slave.

[root@db-secondary.bfc /home/banyek/chomp]# mysqldump --set-gtid-purged=OFF --single-transaction chomp > chomp.sql
[root@db-secondary.bfc /home/banyek/chomp]# ls
chomp.sql
[root@db-secondary.bfc /home/banyek/chomp]# ls -alh
total 823M
drwxr-xr-x 2 root   root   4.0K Mar 24 05:31 .
drwx------ 4 banyek banyek 4.0K Mar 24 05:31 ..
-rw-r--r-- 1 root   root   823M Mar 24 05:32 chomp.sql
[root@db-secondary.bfc /home/banyek/chomp]# gzip chomp.sql
[root@db-secondary.bfc /home/banyek/chomp]# ls -lah
total 202M
drwxr-xr-x 2 root   root   4.0K Mar 24 05:33 .
drwx------ 4 banyek banyek 4.0K Mar 24 05:31 ..
-rw-r--r-- 1 root   root   202M Mar 24 05:32 chomp.sql.gz
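
Before removing anything, it’s worth a quick sanity check that the dump is actually usable (a full test restore into a scratch schema is even better, if you have the room):

gzip -t chomp.sql.gz          # exits non-zero if the archive is corrupted
zcat chomp.sql.gz | head -5   # eyeball that it starts with a real dump header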

So, your database is dumped; now you can safely remove it.

The process consists of the following steps:

  • Clean up data from tables
  • Shrink the datafiles
  • Drop tables
  • Drop database

###Clean up data from tables

First, check what we are dealing with: how many tables there are, how big they are, and so on.

[root@db-master.bfc /home/banyek]# ls -lh /var/lib/mysql/chomp/
total 2.8G
-rw-r----- 1 mysql mysql 8.5K Jan 21 00:25 apikeys.frm
-rw-r----- 1 mysql mysql 1.0M Jan 21 00:01 apikeys.ibd
-rw-r----- 1 mysql mysql   65 Jan 21 00:25 db.opt
-rw-r----- 1 mysql mysql  13K Jan 21 00:25 images.frm
-rw-r----- 1 mysql mysql 2.8G Mar  4 14:22 images.ibd
-rw-r----- 1 mysql mysql 8.4K Jan 21 00:25 keys.frm
-rw-r----- 1 mysql mysql 1.0M Jan 21 00:01 keys.ibd
-rw-r----- 1 mysql mysql 8.5K Jan 21 00:25 transform_lock.frm
-rw-r----- 1 mysql mysql 8.0M Jan 21 00:01 transform_lock.ibd
-rw-r----- 1 mysql mysql 8.5K Jan 21 00:25 videos.frm
-rw-r----- 1 mysql mysql 9.0M Mar  4 13:14 videos.ibd

Mon Mar 24 05:34:26 2014root@localhost [chomp]> select count(*) from images;
+----------+
| count(*) |
+----------+
|  8843673 |
+----------+
1 row in set (7.57 sec)

Mon Mar 24 06:31:58 2014root@localhost [chomp]> select count(*) from apikeys
    -> ;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

Mon Mar 24 06:32:16 2014root@localhost [chomp]> select count(*) from `keys`;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

Mon Mar 24 06:32:30 2014root@localhost [chomp]> select count(*) from transform_lock;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Mon Mar 24 06:32:42 2014root@localhost [chomp]> select count(*) from videos;
+----------+
| count(*) |
+----------+
|     8570 |
+----------+
1 row in set (0.02 sec)

You can see we have only one really big table: ‘images’, with about 9 million rows. The ‘videos’ table is the second biggest, at roughly 1/1000th the size of ‘images’.
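
Incidentally, you can get the same overview with one query from information_schema. Keep in mind that TABLE_ROWS is only an estimate for InnoDB, which is why I ran the exact counts above:

SELECT table_name, table_rows,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = 'chomp';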

So far, so good.

I use the pt-archiver tool to clean up tables, because it chunks up the data that needs to be deleted: it fetches rows in chunks of the given --limit and commits every --txn-size rows, so the deletes trickle through replication instead of arriving as one giant transaction. I calculated earlier that our database can delete about 2000 rows per chunk with no replication lag, so I’ll use the same chunk size for cleaning up ‘images’.
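
If you are curious what pt-archiver saves you from, a hand-rolled version of the same idea would look roughly like the sketch below. This is only a sketch: pt-archiver additionally walks the table by index, retries on errors, and can throttle itself against real slave lag via --check-slave-lag, and a bare DELETE ... LIMIT without ORDER BY is unsafe with statement-based replication anyway.

while true; do
    # delete one chunk; ROW_COUNT() reports how many rows the DELETE removed
    deleted=$(mysql -N chomp -e "DELETE FROM images LIMIT 2000; SELECT ROW_COUNT();")
    [ "$deleted" -eq 0 ] && break
    sleep 1   # give the slaves room to breathe between chunks
done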

Before I start the delete, I check how many chunks there will be, just to help myself track the progress of the delete.

Mon Mar 24 05:39:23 2014root@localhost [chomp]> select 8843673 / 2000 as chunks;
+-----------+
| chunks    |
+-----------+
| 4421.8365 |
+-----------+
1 row in set (0.00 sec)

Here is the snippet I use to track the progress of the delete:

# SELECT CONCAT(100 - ROUND(($CHUNKS - (($MAXITEMS - COUNT(*)) / $CHUNKSIZE)) / $CHUNKS * 100), '%') AS percent_done FROM $TABLE_TO_CLEAN_UP;
#
# So, wrapped into a while loop, for the current case it looks like this:

while true; do clear ; mysql chomp -e "select concat(100 - round((4421 - ((8843673 - count(*)) /2000)) / 4421 * 100),'%')  as percent_done from images;" ; sleep 30; done

# It will produce a neat table like this - refreshed every 30 seconds.
# +--------------+
# | percent_done |
# +--------------+
# | 61%          |
# +--------------+
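# (pt-archiver can also report its own progress: adding --progress 100000 to the
# command below makes it print a status line to STDERR every 100,000 rows.)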

After the progress bar is set up, we can start the table cleanup job with pt-archiver:

[root@db-master.bfc /home/banyek]# pt-archiver --statistics --no-check-charset --limit 2000 --txn-size 2000 --source D=chomp,t=images,u=user,p=password --purge --where "1=1"

Source: D=chomp,p=...,t=images,u=user
SELECT 8420052
INSERT 0
DELETE 8420052
Action        Count       Time        Pct
deleting    8420052  1981.1935      72.48
select         4212    72.4313       2.65
commit         4211     9.3239       0.34
other             0   670.3669      24.53

# You can see the 'Count' column is a bit lower than the earlier SELECT COUNT(*),
# but that is only because I ran the archiver in two passes.
#
#
# After the cleanup is done on this table, repeat it for the others too
# (a loop like the sketch below works).
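
Something like this would run the same purge over the remaining (tiny) tables in one go; the u=user,p=password values are placeholders, just as in the command above:

for t in apikeys keys transform_lock videos; do
    pt-archiver --statistics --no-check-charset --limit 2000 --txn-size 2000 \
        --source D=chomp,t=$t,u=user,p=password --purge --where "1=1"
done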

###Shrink the datafiles

After you have cleaned the data out of the tables, you’ll see that the datafiles are still at their full size. (You can read more about this in the ‘Free up space on mysql servers’ post.) So the next step is to shrink the datafiles. The easiest way to achieve this is to run a simple alter like ‘ALTER TABLE tablename ENGINE=InnoDB’, but we still care about replication lag, so run it via pt-online-schema-change.

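For comparison, the direct rebuild (fine on a server with no slaves attached) would simply be:

ALTER TABLE images ENGINE=InnoDB;

Here, though, pt-online-schema-change performs the same rebuild in small, replication-friendly chunks: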

[root@db-master.bfc /var/log/mysql]# pt-online-schema-change --execute --alter "engine=InnoDB;" h=localhost,u=user,p=password,D=chomp,t=images
Found 31 slaves:
[...]
Altering new table...
Altered `chomp`.`_images_new` OK.
2014-03-24T06:38:39 Creating triggers...
2014-03-24T06:38:39 Created triggers OK.
2014-03-24T06:38:39 Copying approximately 1 rows...
2014-03-24T06:38:39 Copied rows OK.
2014-03-24T06:38:39 Swapping tables...
2014-03-24T06:38:39 Swapped original and new tables OK.
2014-03-24T06:38:39 Dropping old table...
2014-03-24T06:38:40 Dropped old table `chomp`.`_images_old` OK.
2014-03-24T06:38:40 Dropping triggers...
2014-03-24T06:38:40 Dropped triggers OK.
Successfully altered `chomp`.`images`.
[root@db-master.bfc /var/lib/mysql/chomp]# ls -lah
total 11M
drwx------  2 mysql mysql 4.0K Mar 24 06:38 .
drwxr-xr-x 16 mysql mysql 4.0K Mar  4 04:17 ..
-rw-r-----  1 mysql mysql 8.5K Jan 21 00:25 apikeys.frm
-rw-r-----  1 mysql mysql 1.0M Jan 21 00:01 apikeys.ibd
-rw-r-----  1 mysql mysql   65 Jan 21 00:25 db.opt
-rw-rw----  1 mysql mysql  13K Mar 24 06:38 images.frm
-rw-rw----  1 mysql mysql 128K Mar 24 06:38 images.ibd
-rw-r-----  1 mysql mysql 8.4K Jan 21 00:25 keys.frm
-rw-r-----  1 mysql mysql 1.0M Jan 21 00:01 keys.ibd
-rw-r-----  1 mysql mysql 8.5K Jan 21 00:25 transform_lock.frm
-rw-r-----  1 mysql mysql 8.0M Jan 21 00:01 transform_lock.ibd
-rw-rw----  1 mysql mysql 8.5K Mar 24 06:37 videos.frm
-rw-rw----  1 mysql mysql  96K Mar 24 06:37 videos.ibd

###Drop tables

Now we can drop the tables. YAY!

We only have to be aware of two things:

  1. Use the IF EXISTS clause, because if a table is filtered out from replication, the DROP will break replication on the slaves.
  2. Be aware of reserved words: the ‘keys’ table here can cause problems, because the word is reserved. Just surround it with backticks. (And don’t let the developers create tables named after reserved words :D) A query that handles both concerns at once is sketched below.
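
If there were many tables, a handy trick is to generate the DROP statements from information_schema, which takes care of both the IF EXISTS clause and the backtick-quoting in one go (a convenience sketch, not what I ran here):

SELECT CONCAT('DROP TABLE IF EXISTS `', table_name, '`;') AS stmt
FROM information_schema.TABLES
WHERE table_schema = 'chomp';

With only five tables, I just dropped them by hand:
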
Mon Mar 24 06:42:26 2014root@localhost [chomp]> drop table if exists apikeys;
Query OK, 0 rows affected (0.01 sec)

Mon Mar 24 06:43:10 2014root@localhost [chomp]> drop table if exists images;
Query OK, 0 rows affected (0.00 sec)

Mon Mar 24 06:43:49 2014root@localhost [chomp]> drop table if exists `keys`;
Query OK, 0 rows affected (0.00 sec)

Mon Mar 24 06:43:55 2014root@localhost [chomp]> drop table if exists transform_lock;
Query OK, 0 rows affected (0.00 sec)

Mon Mar 24 06:44:06 2014root@localhost [chomp]> drop table if exists videos;
Query OK, 0 rows affected (0.00 sec)

Mon Mar 24 06:44:14 2014root@localhost [chomp]> show tables;
Empty set (0.00 sec)

###Drop database

Finally, we have reached this point. We can now drop the database and we’re done (please don’t forget the IF EXISTS clause!).

Mon Mar 24 06:44:17 2014root@localhost [chomp]> use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
Mon Mar 24 06:44:20 2014root@localhost [mysql]> drop database if exists chomp;
Query OK, 0 rows affected (0.04 sec)

###Conclusion

Dropping a database is really an easy thing: just one command and everything is done. But if your site has a lot of traffic and your users won’t tolerate weird behavior, you have to be cautious. In this scenario a plain ‘DROP DATABASE’ would have taken only a few seconds and caused only a few seconds of lag, but building up good habits is mandatory, so I recommend always cleaning up a database before dropping it.