I am sure if you’ve googled your way to this article, you already know the problem but here you go..
The Problem
Our production database has some key tables. Business requirements have evolved and recently we needed to add a new column to a table with 50 million rows. We played around with other options, but since it was likely to become a column with many WHERE queries, we decided that we had to add the column.
Browntape is an eCommerce web app and that is a 24×7 business, so we can’t take much downtime. Few minutes is acceptable, but nothing in hours. Doing an ALTER will lock the table for a few hours – not acceptable.
On top of that, we are on Amazon RDS. That causes some more complications like no access to super user. MySql version 5.6 (don’t judge).
gh-ost and other options
One option was percona-online-schema-change
which is pretty popular to do this kind of stuff. It basically makes a ghost table (with new schema), points a bunch of triggers from the original table to the ghost table. Once the replication is done, you are set.
Another option is an online schema change by Facebook which seems to be popular.
However, we used gh-ost for the following reasons:
- Least learning curve – even though the documentation was not very beginner friendly, I was able to figure this out pretty quickly.
- binlog not triggers – gh-ost uses the binlog which is the basis of all Mysql replication. That seemed a lot more robust way of achieving this.
- Postpone cut over feature – when you are doing an alter on a key table on your production database, you are scared. You want control to check and double check everything. This feature lets you do that. More on that later.
Installing gh-ost on Ubuntu
Installing on Ubuntu was pretty straightforward. The github project has a link to a .deb file. Just grab the .deb file and install like this:
$wget https://github.com/github/gh-ost/releases/download/v1.1.0/gh-ost_1.1.0_amd64.deb $sudo dpkg -i gh-ost_1.1.0_amd64.deb
Changes on RDS to be ready for gh-ost
For gh-ost to work on RDS, you have to enable binlogs and set the binlog format to rows.
These can be found in the parameter groups in RDS. You can check both of these conditions with the following commands.
show global variables like 'log_bin'; show global variables like 'binlog_format';
Dry Run of the Migration (noop)
gh-ost provides a way to do a mock dry run of the migration without actually running the migration. This is called noop. If you run the command without –execute, then it does not actually migrate the data but gives you any redflags.
Understanding the Important Switches in the gh-ost command
The gh-ost command comes with many switches. Most are self explanatory, but some need documentation. I found the documentation of the project sparse in this aspect, so adding this for the Internet.
Here’s what the command looks like as per the documentation:
gh-ost \
--user="gh-ost" \
--password="123456" \
--host=master.with.rbr.com \
--database="my_schema" \
--table="my_table" \
--verbose \
--alter="ADD COLUMN random int(11) null DEFAULT null" \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--throttle-control-replicas="myreplica.1.com,myreplica.2.com" \
--max-lag-millis=1500 \
--allow-on-master \
--switch-to-rbr \
--allow-master-master \
--cut-over=default \
--exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
[--execute]
Let me not insult your intelligence by explaining the first 7 switches.
Here are explanations to some switches that I found useful:
--allow-on-master
and--allow-master-master
was required for running the migration without a replica. Our Mysql setup did not have replication and I did not want to introduce it. So, was to use it in a mode where the ghost table is formed in the master database itself. The first switch allows that and the second explicitly permits master to master replication.--switch-to-rbr
will not work on RDS as you do not have super privileges while connecting from a remote host. I had to change the binlog format to rows in the parameter groups and use theswitch --assume-rbr
.--exact-row-count
enforces that gh-ost should count the exact number of rows to be migrated (probably by SELECT count(*) FROM table) instead of estimating it using an EXPLAIN query.--concurrent-rowcount
is less understood, but probably means that the estimates presented by the tool while copying the rows will be more accurate.
Here are a few lines of the command’s output to give you an idea.
Copy: 32185000/49903876 64.5%; Applied: 0; Backlog: 0/1000; Time: 2h10m30s(total), 2h10m30s(copy); streamer: mysql-bin-changelog.700402:12100895; Lag: 0.02s, State: migrating; ETA: 1h11m50s Copy: 32250000/49903876 64.6%; Applied: 0; Backlog: 0/1000; Time: 2h11m0s(total), 2h11m0s(copy); streamer: mysql-bin-changelog.700402:31779955; Lag: 0.02s, State: migrating; ETA: 1h11m42s Copy: 32327000/49903876 64.8%; Applied: 0; Backlog: 0/1000; Time: 2h11m30s(total), 2h11m30s(copy); streamer: mysql-bin-changelog.700402:55073246; Lag: 0.02s, State: migrating; ETA: 1h11m29s Copy: 32405000/49903876 64.9%; Applied: 0; Backlog: 0/1000; Time: 2h12m0s(total), 2h12m0s(copy); streamer: mysql-bin-changelog.700402:78400893; Lag: 0.01s, State: migrating; ETA: 1h11m16s Copy: 32480000/49903876 65.1%; Applied: 0; Backlog: 0/1000; Time: 2h12m30s(total), 2h12m30s(copy); streamer: mysql-bin-changelog.700402:101370328; Lag: 0.02s, State: migrating; ETA: 1h11m4s Copy: 32556000/49903876 65.2%; Applied: 0; Backlog: 0/1000; Time: 2h13m0s(total), 2h13m0s(copy); streamer: mysql-bin-changelog.700402:124312398; Lag: 0.02s, State: migrating; ETA: 1h10m52s
Being Extra Paranoid about my data
I understand that gh-ost is an established open source project with 8.8K stars. But, I am responsible for my customers’ data. I have to make sure things don’t go south. That brings me to my favourite switch in gh-ost.
--postpone-cut-over-flag-file
allows you to control when the actual cut over to the ghost table happens.
Let me first explain the concept of “cut over”, by chronology:
- gh-ost creates a ghost table of your original table without any data. This is called _original_table_gho
- Applies the ALTER to this table (which is quick)
- Then, uses the replication system native to MySql, ie. binlog, to populate this ghost table with all the records. It shows your the progress of that.
- Once done, it can do the “cut over” where the ghost table is renamed with the name of the original table. The old table is now renamed _original_table_del which you can drop later.
Now, this step of cut over was very important for us.
- We had to take a short down time during this, just to make sure that no new records cross with the cut over is happening.
- We wanted to be super-sure of zero data loss at least of the most recent data.
So, the --postpone-cut-over-flag-file
creates as file (as specified in the switch) when the gh-ost command is run. Now, once gh-ost copies all the data to the ghost table, it will show you the status in the output. You can then satisfy your paranoia about your data. When sure, just delete the file mentioned. gh-ost will run the cut over for you.
Zero Data Loss Check
Here’s how I quickly made sure during the downtime that all my 90 day data was safe in the ghost table. Might be a bit excessive, but I did a row by row check.
Basically, wrote a query which created an MD5 hash of the CONCAT of the few most important columns in my table. If the STRCMP of both these hashed was 0, then my data was safe. Here’s my query:
select o.id,
STRCMP( MD5(CONCAT(o.id,o.title, o.created, o.company_id, o.user_id, o.channel_id, o.warehouse_id, o.order_status)) ,
MD5(CONCAT(g.id,g.title, g.created, g.company_id, g.user_id, g.channel_id, g.warehouse_id, g.order_status)) ) as cmp
from original_table o, _original_table_gho g
WHERE o.id=g.id
AND o.created > '2021-01-01'
HAVING cmp=1
order by o.id desc limit 1000;
So that was our journey of migrating this large table.
It took 3.5 hours to migrate the table with about 50 million rows.
With only a 15 min downtime!