Use Real Data in MySQL Performance Tests

MySQL’s mysqlslap utility helps you visualize how your database performance will improve with more hardware, new tuning, or different indexes.

The trouble is, it’s designed to use fake data. You can tune the fake data to look increasingly like your real data, and that will help you get a feel for changes like bigger hardware. But fake data teaches you almost nothing for deep changes like altering your schema, adding new indexes, or tweaking memory and cache parameters.

So we’re going to get completely authentic data from your database so you can plan and test performance changes with higher confidence.

This procedure is really ideal if you have a second server about the same size as the one in production. You absolutely do not want to run this on a production server; mysqlslap will try to contain the changes to a throwaway database (named mysqlslap) but it isn’t very bright and bad things are incredibly likely to happen. Run this process on a lab server, or a cloud server, or be prepared to do a full data restore when you’re finished.

Backup your database.

mysqldump -u root my_db_name --compact > ~/create.sql

Log into MySQL and turn on the innocently named “general log.” Don’t let the name fool you, this records every connect, disconnect, and query (write and read) occurring on the server. This is a bad idea if your server is anywhere near capacity, because it adds a disk hit to every single action on the server.

mysql> SET GLOBAL general_log_file = 'all-queries.log';
Query OK, 0 rows affected (0.01 sec)

mysql> SET GLOBAL general_log = 1;
Query OK, 0 rows affected (0.00 sec)

Make Things Happen

Now run some traffic through your system that you want to simulate. You could take a test account through a tour of your product’s important features, collect real traffic for a day, etc. When you’re done, turn the general log off:

mysql> SET GLOBAL general_log = 0;
Query OK, 0 rows affected (0.00 sec)

Change the Log File to a Slap File

Copy the general log file into your home directory and change its permissions. (It was created with strong permissions since it could contain sensitive data.)

$ sudo cp /var/lib/mysql/all-queries.log ~
$ sudo chmod a+rw ~/all-queries.log

Now get these files ready for mysqlslap. The general log file contains more than just queries, here’s a short excerpt from mine:

       3 Connect    user@localhost on application
       3 Query    SELECT * FROM users WHERE username='load'
       3 Query    SELECT * FROM charges WHERE username = 'load'
       3 Quit

These four records show a user (really a PHP app) connecting, running two SELECT queries, and disconnecting. mysqlslap only needs the queries, so we’ll create a new slap.sql with only Query rows, and without the log columns before the SQL statements.

grep " Query" all-queries.log | egrep -io '(SELECT|INSERT|UPDATE|DELETE|REPLACE).*' > slap.sql

We’re also going to re-execute these queries repeatedly, both in parallel (to load the server) and sequentially (to get multiple runs of the data for statistical significance, and to see “warm cache” performance). Repeating INSERTs in tables with unique primary keys is a problem, so we’ll replace all INSERT INTO with REPLACE INTO, which prevents key collisions (but can cause trouble with constraints or cascading deletes).

sed -i -E 's/INSERT INTO/REPLACE INTO/ig' slap.sql

You might have noticed that the SQL statements in slap.sql don’t have the customary ; at the end. That’s actually OK, by default mysqlslap considers each line to be a SQL statement, so slap.sql is fine. But you will need to make each of the CREATE TABLE statements in create.sql into one line. Open it in your favorite text editor and change all the statements like:

CREATE TABLE `stuff` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `details` varchar(255) DEFAULT 'boring',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=590 DEFAULT CHARSET=latin1;

Into one-liners like

CREATE TABLE `stuff` ( `id` int(11) NOT NULL AUTO_INCREMENT, `details` varchar(255) DEFAULT 'boring', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=590 DEFAULT CHARSET=latin1;

If you use vim, it’s easy: just search for tables with /CREATE TABLE then use J to join lines until you have the entire statement in one row.

Happy Slapsgiving

mysqlslap will create a new database (named mysqlslap) and set it up using the create.sql (from our backup). Then it will run the queries in slap.sql (collected from the general log) as many times as we dictate.

Run the whole shebang once:

mysqlslap --create create.sql --query slap.sql

Run create.sql once, then slap.sql 10 times (single-threaded, one after another). This gives us more test data and helps eliminate outliers.

mysqlslap --create create.sql --query slap.sql --iterations=10

Run create.sql once, then have 10 concurrent clients each run slap.sql 10 times (100 times total):

mysqlslap --create create.sql --query slap.sql --iterations=10 --concurrency=10

These numbers don’t get interesting until you have a change that lets you collect before and after slaps.

I’m considering turning on sync-binlog, which can prevent a master server crash from causing some transactions to never replicate out when the master comes back online–but it’s notorious for the performance penalty. So I’ll run a slap, then make the proposed change, the rerun the slap:

$ mysqlslap --create create.sql --query slap.sql --iterations=10 --concurrency=10
   Average number of seconds to run all queries: 3.043 seconds

$ mysql -u root -e"SET GLOBAL sync_binlog = 1;"

$ mysqlslap --create create.sql --query slap.sql --iterations=10 --concurrency=10
   Average number of seconds to run all queries: 5.156 seconds

This tells me that turning on sync-binlog is likely to reduce my performance by about 70% (average run jumped from 3 seconds to more than 5 seconds). I can combine that with other knowledge (from the humble top to sophisticated tools like vmstat and iostat) to see if my existing hardware can take that kind of performance hit, or if I need faster disk or some other preventative planning.

That outcome might be completely different for an application with a higher read/write ratio, or a database that fits entirely in memory, or with a bigger write cache on my RAID card. But, because I fed mysqlslap my own data, I have high confidence that I understand the consequences of the change I’m planning.