måndag 2 mars 2015

Quick bulk load of data into InnoDB

Some weeks back I helped a customer lower time to bulk-load data into MySQL, they where at the time using a MySQL dumpfile (containing SQL statements) to populate their tables during nightly jobs.

By using LOAD DATA INFILE command and creating secondary indexes after bulk-load of data load time went down by a factor of almost 2x.

My test environment:
DB: MySQL 5.6.23
OS: Ubuntu 14.04
HW: My Toshiba Portege laptop with 2 cores and SSD disk

Commands/tables used in tests:
CREATE TABLE t1PK (i BIGINT UNSIGNED PRIMARY KEY, art VARCHAR(32)) ENGINE=InnoDB;
CREATE TABLE t1 (i BIGINT UNSIGNED PRIMARY KEY, art VARCHAR(32) UNIQUE KEY) ENGINE=InnoDB;
LOAD DATA INFILE '/home/ted/labb/load-data/1000000' INTO TABLE test.t1 FIELDS TERMINATED BY ',';

I created a file named 1000000 that contains two columns and 1 millon rows:
$ head -5 1000000
0,test0
1,test1
2,test2
3,test3
4,test4
.....

$ wc -l 1000000
1000000 1000000

Next step is to run some test, if you do not have a BBWC on your disksystem you might consider setting innodb_flush_log_at_trx_commit to 2 during bulk-load of data.

First let's create the full table and import the 1000000 file to get a baseline:
mysql> set global innodb_flush_log_at_trx_commit=2;
mysql> CREATE TABLE t1 (i BIGINT UNSIGNED PRIMARY KEY, art VARCHAR(32) UNIQUE KEY) ENGINE=InnoDB;
mysql> LOAD DATA INFILE '/home/ted/labb/load-data/1000000' INTO TABLE test.t1 FIELDS TERMINATED BY ',';
mysql> set global innodb_flush_log_at_trx_commit=1;

Result: Loaded 1.000.000 rows in 4.3 seconds (average from 10 runs)

Next let's try to load the file into table with only PK and then add index afterwards:
mysql> set global innodb_flush_log_at_trx_commit=2;
mysql> CREATE TABLE t1PK (i BIGINT UNSIGNED PRIMARY KEY, art VARCHAR(32)) ENGINE=InnoDB;
mysql> LOAD DATA INFILE '/home/ted/labb/load-data/1000000' INTO TABLE test.t1PK FIELDS TERMINATED BY ',';
mysql> ALTER TABLE t1PK ADD INDEX(art);
mysql> set global innodb_flush_log_at_trx_commit=1;
Result: Loaded 1.000.000 rows in 2.1 seconds (average from 10 runs) and another 1.9 seconds to add secondary index, total time 4 seconds in average. This improvement is due to InnoDB Fast Index Creation

Conclusion:
In my environment it took almost 6 seconds to load a dumpfile (using innodb_flush_log_at_trx_commit=2) of table t1 into mysql, this will depend on how many keys you have and many other factors but as you can see using LOAD DATA INFILE command is a fast way to load data into MySQL!

Inga kommentarer:

Skicka en kommentar