Splitting Huge MySQL dump for easy restore

The open source RDBMS MySQL provides a great tool mysqdump to dump database.  Its the official sql dump utitlity for MySQL database. It makes the life of dba so easy that he can backup and restore the database within just two commands.  But sometimes due to the lacking of the infrastructure you can not dump and restore that easily. Specially when you are dealing with huge amount of data. Our database grows over time. Few hundred GBs are quite common. But if you run a software for long it might get in to Tera byte range. The problem starts when size is this huge.

Backing up and restoring such huge database is not easy.  It takes a lot of time to both backup and restore. Not to mention, if something breaks up the operation the output will most time corrupted. For backup operation the output  is a dump file. This will be corrupted if backup is interrupted. Just think if it breaks at 99%. It feels like pulling your hair! Think if it was a restore operation. Your database will be corrupted. Usually we do restore on production server. If the restore fails, that’ll be a disaster.

To come round this, Its better to backup and restore part by part. This way your database will less likely be corrupted. Here is some ways to achieve this,

  1. Split the database by tables. When backing up, split the backup operation by tables.  Just do some balance on the table size and then backup on each table. For example, If you have 3 tables 10 GB each and 2 tables 30 GB each then you can backup tables in 3 groups.
    • Group 1. All the 3 tables 10 GBs each totaling 30 GB
      • mysqldump  database1 table1 table2 table3 > table1-3.sql
    • Group 2. One of the two 30 GB tables.
      • mysqldump  database1 table4 > table4.sql
    • Group 3. One of the two 30 GB tables.
      • mysqldump  database1 table5 > table5.sql
  2. Split the big tables by rows. Not all tables in the database are big. Rather we always have some tables which are big enough to cause headache for the DBA. Think about the tweeter. Assuming a simple database schema, we can say all tables cumulatively not bigger than the table that holds the tweets. In our previous example table4 and table5 is bigger.  For such big tables we can split the dump by rows. mysqldump provides a great option called “–where”. By using this option you can split your table on any condition which is supported by “where” sql clause. So the number of ways you can split the table is infinite. Here is some strategies,
    1. auto_increment. If your table has auto column  you can split by any number of chunks.  See the examples.
      1. Split the odd and even rows
        mysqldump --where "id%2=0" database1 table4 > table4_even.sql
        mysqldump --where "id%2=1" database1 table4 > table4_odd.sql
      2. Using any number
        mysqldump --where "id%4=0" database1 table5 > table5_0.sql
        mysqldump --where "id%4=1" database1 table5 > table5_1.sql
        mysqldump --where "id%4=2" database1 table5 > table5_2.sql
        mysqldump --where "id%4=3" database1 table5 > table5_3.sql
    2. limit clause. Even you are using where clause, with a little hack you can use limit clause. This allows true partition of the tables.  For example if you have 18000 rows in a table you can dump in two parts like this,
      mysqldump --where "1 LIMIT 0, 10000" database1 table5 > table5_part_1.sql
      mysqldump --where "1 LIMIT 10000, 10000" database1 table5 > table5_part_2.sql
    3. Others. As the –where switch allows any sql condition you can use any criteria. You can split a user table by sex (male and female), by age group, by any enum column you are using, by date span etc. Some examples,
      mysqldump --where "sex='M'" database1 user > user_m.sql
      mysqldump --where "sex='F'" database1 user > user_f.sql
      mysqldump --where "account='ACTIVE'" database1 user > user_active.sql
      mysqldump --where "account='INACTIVE'" database1 user > user_inactive.sql
      mysqldump --where "year(date) <= 2008" database1 payment > payment_prior_2009.sql
      mysqldump --where "year(date) > 2008" database1 payment > payment_2009_and_after.sql