Re-Post: Julien Hofstede – Pentaho: Increase MySQL output to 80K rows/second in Pentaho Data Integration

Increase MySQL output to 80K rows/second in Pentaho Data Integration

One of our clients has a MySQL table with around 40M records. To load the table it took around 2,5 hours. When i was watching the statistics of the transformation I noticed that the bottleneck was the write to the database. I was stuck at around 2000 rows/second. You can imagine that it will take a long time to write 40M records at that speed.
I was looking in what way I could improve the speed. There were a couple of options:
  1. Tune MySQL for better performance on Inserts
  2. Use the MySQL Bulk loader step in PDI
  3. Write SQL statements to file with PDI and  read them with mysql-binary

When i discussed this with one of my contacts of Basis06 they faced a similar issue a while ago. He mentioned that speed can be boosted by using some simple JDBC-connection setting.


useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true

These options should be entered in PDI at the connection. Double click the connection go to Options and set these values.

Used together, useServerPrepStmts=false and rewriteBatchedStatements=true will “fake” batch inserts on the client. Specifically, the insert statements:


INSERT INTO t (c1,c2) VALUES ('One',1);
INSERT INTO t (c1,c2) VALUES ('Two',2);
INSERT INTO t (c1,c2) VALUES ('Three',3);

will be rewritten into:


INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);

The third option useCompression=true compresses the traffic between the client and the MySQL server.

Finally I increased the number of copies of the output step to 2 so that there are two treads inserting into the database.

This all together increased the speed to around 84.000 rows a second! WOW!

 

Source: Julien Hofstede – Pentaho: Increase MySQL output to 80K rows/second in Pentaho Data Integration