Sharing Responsibilities – JaRE installation

Hi All,

During PCM17, the Pentaho Community Meeting, I watched Uwe Geercken’s presentation about his Java Rule Engine (JaRE). The idea is to have business users maintain their own business rules while the IT department only takes care of the IT logic. That should take load off IT’s shoulders and give power to the business users. Since the business users are the business experts, data quality should improve when the rules are maintained directly by the business.

I am a heavy Pentaho user, so I will focus on using it with Pentaho Data Integration (PDI/Spoon/Kitchen). The rule engine can also run the rules against an Apache NiFi flow, an Apache Kafka stream, or inside virtually any other Java application. JaRE is open source, available on GitHub, and can be used under the Apache License 2.0.

This blog post covers the installation of JaRE. In the coming days I will continue with posts about maintaining the rules and about the PDI integration and usage.

JaRE Installation

I will install JaRE on a clean Ubuntu 16.04.3 LTS VirtualBox machine. The OS is fully updated.

First of all, we need to install the Tomcat server, the MariaDB server, and Git.

sudo apt-get install tomcat8 mariadb-server git

Next, we enter MariaDB

sudo mysql -uroot

and create the database and a maintenance user. Of course, you should pick a more secure password; just make sure to remember it, as you’ll need it during the installation process.

CREATE DATABASE ruleengine_rules;
CREATE USER 'rule_maintainer'@'localhost' IDENTIFIED BY 'maintainer_password';
GRANT ALL PRIVILEGES ON ruleengine_rules.* TO 'rule_maintainer'@'localhost';
FLUSH PRIVILEGES;
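
To double-check that the user and grant were created as intended, you can list the new user’s privileges before leaving the client:

SHOW GRANTS FOR 'rule_maintainer'@'localhost';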

Exit MariaDB with Ctrl+C (or by typing exit).

Now we download the rule maintenance SQL file to our filesystem and load it into the database we have just created:

git clone https://github.com/uwegeercken/rule_maintenance_db.git
sudo mysql ruleengine_rules -uroot < rule_maintenance_db/ruleengine_rules.sql
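
To verify that the import worked, you can list the tables that were just created:

sudo mysql -uroot -e 'SHOW TABLES FROM ruleengine_rules;'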

The database is installed now.

The next step is to download the rule maintenance WAR file and copy it to the Tomcat webapps directory:

git clone https://github.com/uwegeercken/rule_maintenance_war.git
cp rule_maintenance_war/rule_maintenance.war /var/lib/tomcat8/webapps
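
Tomcat should pick up and deploy the WAR automatically. If the application does not show up, restart the service and watch the log; the paths below match the Ubuntu tomcat8 package:

sudo systemctl restart tomcat8
sudo tail -f /var/log/tomcat8/catalina.out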

Now you can access the rule maintenance tool at

http://localhost:8080/rule_maintenance

On first start, the engine asks for the MariaDB username and password provided earlier, and for a path where you want to save the rule files.

Click ‘Save’ to check the database connection, then click ‘Login’. The default username is “admin”, and the password is also “admin”.

This concludes the installation tutorial. The next blog entry will be about maintaining the rules.

Update 12/2017

The installation process has since been simplified: there is no longer any need to run the SQL file.

PCM17 – the Pentaho Community Meeting 2017

Hi All,

This Saturday, #PCM17 takes place in Mainz, Germany. PCM17 is the Pentaho Community Meeting, which is held at a different location around the globe each time. As it happens “around the corner” this time, I will be there, and I am excited. This is the 10th edition, and there are so many interesting talks. With two different tracks, Business and Technical, I will have a hard time deciding where to go, though I will mostly stick to the technical track.

There are talks about the separation of business and IT rules in ETL jobs, about “serverless” PDI, and about machine learning, a topic I am especially interested in.

And, hey, CERN is presenting. If there is anybody in the world that generates a lot of data to handle, it’s CERN.

IT-Novum, the organizer of the event, will blog about it extensively, so I will just lean back and enjoy the show; don’t expect coverage here on my blog.

Follow me on Twitter for comments, impressions and pictures.

Cheers

Andre

Re-Post: Julien Hofstede – Pentaho: Increase MySQL output to 80K rows/second in Pentaho Data Integration

Increase MySQL output to 80K rows/second in Pentaho Data Integration

One of our clients has a MySQL table with around 40M records. Loading the table took around 2.5 hours. When I was watching the statistics of the transformation, I noticed that the bottleneck was the write to the database: it was stuck at around 2,000 rows/second. You can imagine how long it takes to write 40M records at that speed.
I looked into ways to improve the speed. There were a couple of options:
  1. Tune MySQL for better performance on inserts
  2. Use the MySQL Bulk Loader step in PDI
  3. Write SQL statements to a file with PDI and read them with the mysql binary

When I discussed this with one of my contacts at Basis06, it turned out they had faced a similar issue a while ago. He mentioned that the speed can be boosted by using some simple JDBC connection settings.


useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true

[[UPDATE 10/2018: In some environments, especially under high network load, useServerPrepStmts=true is worth a try.]]

These options should be entered on the connection in PDI: double-click the connection, go to Options, and set these values.
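
If you use a generic JDBC connection string instead, the same settings can also be appended as URL parameters; the host and database name below are placeholders:

jdbc:mysql://localhost:3306/mydb?useServerPrepStmts=false&rewriteBatchedStatements=true&useCompression=true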

Used together, useServerPrepStmts=false and rewriteBatchedStatements=true will “fake” batch inserts on the client. Specifically, the insert statements:


INSERT INTO t (c1,c2) VALUES ('One',1);
INSERT INTO t (c1,c2) VALUES ('Two',2);
INSERT INTO t (c1,c2) VALUES ('Three',3);

will be rewritten into:


INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);

The third option, useCompression=true, compresses the traffic between the client and the MySQL server.
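
Note that the driver only rewrites statements that are sent as a JDBC batch, which PDI’s table output does when batch updates are enabled. Here is a minimal plain-JDBC sketch of the same mechanism, reusing the t (c1,c2) table from the example above; the database name and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertDemo {
    public static void main(String[] args) throws Exception {
        // The three options discussed above go onto the connection URL.
        String url = "jdbc:mysql://localhost:3306/mydb"
                + "?useServerPrepStmts=false"
                + "&rewriteBatchedStatements=true"
                + "&useCompression=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO t (c1,c2) VALUES (?,?)")) {
            String[] words = {"One", "Two", "Three"};
            for (int i = 0; i < words.length; i++) {
                ps.setString(1, words[i]);
                ps.setInt(2, i + 1);
                ps.addBatch(); // queued on the client, not sent yet
            }
            // With rewriteBatchedStatements=true the driver sends one
            // multi-row INSERT here instead of three round trips.
            ps.executeBatch();
        }
    }
}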

Finally, I increased the number of copies of the output step to 2, so that there are two threads inserting into the database.

All of this together increased the speed to around 84,000 rows a second! WOW!

Source: Julien Hofstede – Pentaho: Increase MySQL output to 80K rows/second in Pentaho Data Integration