MySQL’s mysqlslap
utility helps you visualize how your database performance will improve with more hardware, new tuning, or different indexes.
The trouble is, it’s designed to use fake data. You can tune the fake data to look increasingly like your real data, and that will help you get a feel for changes like bigger hardware. But fake data teaches you almost nothing for deep changes like altering your schema, adding new indexes, or tweaking memory and cache parameters.
So we’re going to get completely authentic data from your database so you can plan and test performance changes with higher confidence.
This procedure is really ideal if you have a second server about the same size as the one in production. You absolutely do not want to run this on a production server; mysqlslap
will try to contain the changes to a throwaway database (named mysqlslap) but it isn’t very bright and bad things are incredibly likely to happen. Run this process on a lab server, or a cloud server, or be prepared to do a full data restore when you’re finished.
Backup your database.
1
|
|
Log into MySQL and turn on the innocently named “general log.” Don’t let the name fool you, this records every connect, disconnect, and query (write and read) occurring on the server. This is a bad idea if your server is anywhere near capacity, because it adds a disk hit to every single action on the server.
1 2 3 4 5 |
|
Make Things Happen
Now run some traffic through your system that you want to simulate. You could take a test account through a tour of your product’s important features, collect real traffic for a day, etc. When you’re done, turn the general log off:
1 2 |
|
Change the Log File to a Slap File
Copy the general log file into your home directory and change its permissions. (It was created with strong permissions since it could contain sensitive data.)
1 2 |
|
Now get these files ready for mysqlslap
. The general log file contains more than just queries, here’s a short excerpt from mine:
1 2 3 4 |
|
These four records show a user (really a PHP app) connecting, running two SELECT queries, and disconnecting. mysqlslap
only needs the queries, so we’ll create a new slap.sql with only Query rows, and without the log columns before the SQL statements.
1
|
|
We’re also going to re-execute these queries repeatedly, both in parallel (to load the server) and sequentially (to get multiple runs of the data for statistical significance, and to see “warm cache” performance). Repeating INSERTs in tables with unique primary keys is a problem, so we’ll replace all INSERT INTO with REPLACE INTO, which prevents key collisions (but can cause trouble with constraints or cascading deletes).
1
|
|
You might have noticed that the SQL statements in slap.sql don’t have the customary ; at the end. That’s actually OK, by default mysqlslap
considers each line to be a SQL statement, so slap.sql is fine. But you will need to make each of the CREATE TABLE statements in create.sql into one line. Open it in your favorite text editor and change all the statements like:
1 2 3 4 5 |
|
Into one-liners like
1
|
|
If you use vim, it’s easy: just search for tables with /CREATE TABLE
then use J
to join lines until you have the entire statement in one row.
Happy Slapsgiving
mysqlslap
will create a new database (named mysqlslap) and set it up using the create.sql (from our backup). Then it will run the queries in slap.sql (collected from the general log) as many times as we dictate.
Run the whole shebang once:
1
|
|
Run create.sql once, then slap.sql 10 times (single-threaded, one after another). This gives us more test data and helps eliminate outliers.
1
|
|
Run create.sql once, then have 10 concurrent clients each run slap.sql 10 times (100 times total):
1
|
|
These numbers don’t get interesting until you have a change that lets you collect before and after slaps.
I’m considering turning on sync-binlog
, which can prevent a master server crash from causing some transactions to never replicate out when the master comes back online–but it’s notorious for the performance penalty. So I’ll run a slap, then make the proposed change, the rerun the slap:
1 2 3 4 5 6 7 |
|
This tells me that turning on sync-binlog is likely to reduce my performance by about 70% (average run jumped from 3 seconds to more than 5 seconds). I can combine that with other knowledge (from the humble top to sophisticated tools like vmstat and iostat) to see if my existing hardware can take that kind of performance hit, or if I need faster disk or some other preventative planning.
That outcome might be completely different for an application with a higher read/write ratio, or a database that fits entirely in memory, or with a bigger write cache on my RAID card. But, because I fed mysqlslap
my own data, I have high confidence that I understand the consequences of the change I’m planning.