I don’t know how common this problem is, but it is good to know from time to time how much storage space each of your tables needs over a given period. For example, you can catch a piece of software running amok and filling your database up. Or – as you will see soon – you can catch code that doesn’t work as expected.

So, let’s start at the beginning. You want to know how big your tables are, and you need to know how much data gets written to them day by day (or minute by minute, or whatever).

You can query information_schema.tables for table sizes, which is good, but you won’t be happy with those results alone, because they don’t show any changes over time, so you have to store the data.
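
For reference, a query along these lines gives you the current sizes (a minimal sketch; the conversion to megabytes and the ‘kinja’ schema name are just examples, not something the scripts below require):

SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 2) AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'kinja';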

So first, we have to create a table to store this historical data:

CREATE TABLE `test`.`table_sizes` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `tablename` varchar(50) DEFAULT NULL,
  `datasize` decimal(10,2) DEFAULT NULL,
  `indexsize` decimal(10,2) DEFAULT NULL,
  `totalsize` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`id`)
);

The next step is to get some data in there. For this I made a script called ‘monitor_space_collector.pl‘. This script can be run either manually or as a cronjob. Needless to say, your data resolution depends on how often you run it, but so does the size of your history table (and of course the number of tables you want to keep track of). On Kinja, I run this script as a cronjob every 15 minutes.
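
The heart of the collector boils down to a snapshot like the one below. This is only a sketch of the idea in plain SQL, not the actual Perl script; I’m assuming sizes are stored in megabytes and that every non-system schema gets collected:

INSERT INTO test.table_sizes (tablename, datasize, indexsize, totalsize)
SELECT CONCAT(table_schema, '.', table_name),
       ROUND(data_length / 1024 / 1024, 2),
       ROUND(index_length / 1024 / 1024, 2),
       ROUND((data_length + index_length) / 1024 / 1024, 2)
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'mysql', 'performance_schema');

Run something like that from cron at the interval of your choice and the history builds up on its own.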

Once this is set up and your data is being collected, you can query the table, for example, like this:

SELECT tablename, max(totalsize) - min(totalsize) AS r
FROM
     table_sizes
WHERE
     `date` > '2013-07-10 05:55:00'
GROUP BY
     tablename HAVING r > 0;

It will show you the results you wanted (in this case the table size changes from 2013-07-10 05:55:00 until now), and only for the tables whose size has changed.

Well, I don’t like writing the same SQL queries over and over whenever I need the same data, so I wrote a query tool named monitor_space_qt.pl instead. This query tool has two major modes: when you give the -w option you must provide a table name (even with a % wildcard), and the tool will list all the entries from your table_sizes table for the queried table(s). You can fine-tune the data you get with -f (from time) and -t (’till time). When you don’t use -w, it will print the size changes of the tables in megabytes from the start of monitoring until now (or, of course, you can fine-tune the date range again with -f and -t).

[root@sql-master2 vividcortex]# /usr/local/bin/monitor_space_qt.pl -h
Usage: ./msqt.pl [ -h ] [ -u user ] [ -p password ]
   -h                 this help screen
   -u username        username with connect to mysql. Defaults 'root'
   -p password        password with connect to mysql. If not provided,asks for it.
   -f date            from date (mysql date format)
   -t date            to date (mysql date format
   -w tablename       table to check
   -e                 don't write headers (useful for sorting, and using output in pipeline)

Let’s see an example! As I mentioned before, these days we are fighting with database sizes – the FusionIO cards we are using are pretty expensive – so I made this to figure out where I could save some space. I built this utility, left monitor_space_collector.pl collecting data for a few days, then looked at the results, and this is what I got:

[root@sql-master2]# /usr/local/bin/monitor_space_qt.pl -e | sort -n -k 2
[...]
kinja.stats_day                                       630.00
kinja.post                                           1022.00
kinja.oid                                            1856.00
kinja.sessiontoken                                   2072.02
sso.sessions2                                        4697.99

(Notice the handy -e option: no headers are printed, so the output doesn’t become a mess when I pipe it through sort. Cool, eh?)

So, I started to figure out what each of those tables is for, and whether it is okay for it to change size that much. I found the kinja.oid table, which is used for oid generation – it should only hold the current maximum oid number. In fact there was no delete job on that table, so its size shot to the moon! It is now ~9 GB of data and counting. So here is the benefit of this tool: I know that I have to delete from this table and run an OPTIMIZE TABLE on it, and I have freed up 9 GB of space.
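
The cleanup itself is nothing fancy; a sketch of it could look like this (the `oid` column name is just my assumption about how the counter is stored, so adjust it to the real schema):

-- keep only the row holding the current maximum, then reclaim the space
DELETE FROM kinja.oid
WHERE oid < (SELECT m FROM (SELECT MAX(oid) AS m FROM kinja.oid) t);
OPTIMIZE TABLE kinja.oid;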

You can find all the files on GitHub in my Utilities repository (where all of my MySQL utilities are), under the monitor_space directory. Use it if you think it is useful for you.