Sharing Responsibilities – JaRE installation

Hi All,

During PCM17, the Pentaho Community Meeting, I watched Uwe Geercken’s presentation about his Java Rule Engine (JaRE). The idea is to have business users maintain their own business rules while the IT department only takes care of the IT logic. That should take load off IT’s shoulders and give power to the business users. As the business users are the business experts, data quality should improve when the rules are maintained directly by the business.

I am a heavy Pentaho user, so I will focus on the use with Pentaho Data Integration (PDI/Spoon/Kitchen). The rule engine can also evaluate the rules against an Apache NiFi stream, an Apache Kafka stream and inside virtually any other Java application. JaRE is open source, available on GitHub, and can be used under the Apache License 2.0.

This blog post will cover the installation of the JaRE. In the coming days I will continue writing about maintaining the rules and about the PDI integration and usage.

JaRE Installation

I will install JaRE on a clean Ubuntu 16.04.3 LTS VirtualBox machine. The OS is fully updated.

First of all we need to install the Tomcat server, the MariaDB server and Git:

sudo apt-get install tomcat8 mariadb-server git

Next, we enter MariaDB

sudo mysql -uroot

and create the database and a maintaining user. Of course you should pick a more secure password. Just make sure to remember it. You’ll need it during the installation process.

CREATE DATABASE ruleengine_rules;
CREATE USER 'rule_maintainer'@'localhost' IDENTIFIED BY 'maintainer_password';
GRANT ALL PRIVILEGES ON ruleengine_rules.* TO 'rule_maintainer'@'localhost';
FLUSH PRIVILEGES;

Exit the MariaDB client with exit or Ctrl+C.

Now we download the rule engine maintenance SQL file to our filesystem and load it into the database we have just created:

git clone https://github.com/uwegeercken/rule_maintenance_db.git
sudo mysql ruleengine_rules -uroot < rule_maintenance_db/ruleengine_rules.sql
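
If you want to verify the import, listing the tables of the freshly loaded database is a quick check (the exact table names depend on the schema shipped in the repository):

sudo mysql ruleengine_rules -uroot -e "SHOW TABLES;"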

The database is installed now.

The next step is to download the rule engine maintenance WAR file and copy it to the Tomcat webapps directory:

git clone https://github.com/uwegeercken/rule_maintenance_war.git
cp rule_maintenance_war/rule_maintenance.war /var/lib/tomcat8/webapps
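
Tomcat should pick up and deploy the WAR file automatically after a short moment. If the application does not show up, restarting the service (assuming the tomcat8 package installed above) usually helps:

sudo systemctl restart tomcat8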

Now you can access the rule maintenance tool at

http://localhost:8080/rule_maintenance

On first start, the engine asks for the MariaDB username/password provided earlier and a path where you want to save the rule files.

Click ‘Save’ to check the database connection. Next, click ‘Login’. The default username is “admin”, the password is also “admin”.

This concludes the installation tutorial. The next blog entry will be about maintaining the rules.

Update 12/2017

The installation process has been simplified. There is no longer any need to run the SQL file manually.

 

 

Pentaho is now Hitachi Vantara…

…but according to Pedro Alves, our community superhero, Pentaho CE will stay. A couple of links:

http://fortune.com/2017/09/19/hitachi-vantara-data-systems-pentaho/

https://pedroalves-bi.blogspot.de/2017/09/hello-hitachi-vantara.html

https://globenewswire.com/news-release/2017/09/19/1124774/0/en/Hitachi-Introduces-Hitachi-Vantara-A-New-Digital-Company-Committed-to-Solving-the-World-s-Toughest-Business-and-Societal-Challenges.html#.WcFayuVTg8U.linkedin

They will probably tell us more at the Pentaho Community Meetup (#pcm17).

Counting NULL values in Oracle

Howdy,
today I had the challenge of counting NULL values in an Oracle table. At first I tried something like

SELECT
sum(field1 is null) null_counter
FROM
table

but this did not bring me very far.

also:

SELECT
sum(
case field1 
when null then 1
else 0
end
) null_counter
FROM
table

did not get me very far either, because a simple CASE compares with equality and NULL is never equal to NULL, so the WHEN NULL branch never matches.

After some internet research I came across this neat little thingie:

SELECT
sum(
case nvl(field1,'null') 
when 'null' then 1 
else 0 
end
) null_counter
FROM
table
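
By the way, the nvl() trick assumes that field1 is a character column (and that the literal 'null' never appears as a real value). Two alternatives that should work independently of the data type:

SELECT
sum(
case
when field1 is null then 1
else 0
end
) null_counter
FROM
table

or simply, since count(*) counts all rows while count(field1) skips NULLs:

SELECT
count(*) - count(field1) null_counter
FROM
table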

Re-Post: Julien Hofstede – Pentaho: Increase MySQL output to 80K rows/second in Pentaho Data Integration

Increase MySQL output to 80K rows/second in Pentaho Data Integration

One of our clients has a MySQL table with around 40M records. Loading the table took around 2.5 hours. When I was watching the statistics of the transformation I noticed that the bottleneck was the write to the database. It was stuck at around 2,000 rows/second. You can imagine that it takes a long time to write 40M records at that speed.
I was looking into ways to improve the speed. There were a couple of options:
  1. Tune MySQL for better performance on Inserts
  2. Use the MySQL Bulk loader step in PDI
  3. Write SQL statements to a file with PDI and read them with the mysql binary

When I discussed this with one of my contacts at Basis06, it turned out they had faced a similar issue a while ago. He mentioned that the speed can be boosted by using some simple JDBC connection settings:


useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true

[[UPDATE 10/2018: In some environments, especially with a high network load, useServerPrepStmts=true is worth a try]]

These options should be entered in PDI on the connection: double-click the connection, go to Options and set these values.
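
If you use the same connection outside of PDI, these options can also be appended as parameters to the JDBC URL (host, port and database name below are just placeholders):

jdbc:mysql://localhost:3306/mydb?useServerPrepStmts=false&rewriteBatchedStatements=true&useCompression=true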

Used together, useServerPrepStmts=false and rewriteBatchedStatements=true will “fake” batch inserts on the client. Specifically, the insert statements:


INSERT INTO t (c1,c2) VALUES ('One',1);
INSERT INTO t (c1,c2) VALUES ('Two',2);
INSERT INTO t (c1,c2) VALUES ('Three',3);

will be rewritten into:


INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);

The third option useCompression=true compresses the traffic between the client and the MySQL server.

Finally I increased the number of copies of the output step to 2, so that there are two threads inserting into the database.

All of this together increased the speed to around 84,000 rows a second! WOW!

 

Source: Julien Hofstede – Pentaho: Increase MySQL output to 80K rows/second in Pentaho Data Integration

Oracle Date Territory

Hi Folks,

I came across the problem that when using something like:

to_char(my_datefield,'D') as dow

to find out the day of the week, this might behave differently on the Pentaho production server, in the Report Designer and (if applicable) in an underlying PDI transformation, because 'D' depends on the session's NLS_TERRITORY setting. When connecting to the Oracle server, you can click on “Advanced” and set the territory to what you need, for example:

ALTER SESSION SET NLS_TERRITORY = BELGIUM;

That way, your transformation/report will behave consistently across servers and environments.
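
For illustration, a minimal example (2017-10-02 was a Monday; the date is only a sample value):

ALTER SESSION SET NLS_TERRITORY = AMERICA;
SELECT to_char(DATE '2017-10-02','D') AS dow FROM dual; -- returns 2, the week starts on Sunday

ALTER SESSION SET NLS_TERRITORY = BELGIUM;
SELECT to_char(DATE '2017-10-02','D') AS dow FROM dual; -- returns 1, the week starts on Monday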

 

Cheers

Andre

Handy date calculations in MySQL

Howdy,

I have been struggling with date/time calculations for the last couple of years and meanwhile I have quite a collection I would like to share. Note that I have avoided constructs like date_format(current_date,'%y-%m-01') because I don't find them very elegant.

Simple date calculations

Today

SELECT current_date

Tomorrow

SELECT current_date + interval 1 day

Yesterday (you might guess….)

SELECT current_date - interval 1 day

A week ago

SELECT current_date - interval 1 week

Rather complex date calculations

The first day of last month

SELECT last_day(current_date - interval 2 month) + interval 1 day

The last day of last month

SELECT last_day(current_date - interval 1 month)

The last day of last year

SELECT current_date - INTERVAL DAYOFYEAR(current_date) DAY

The first day of this year

SELECT current_date - INTERVAL DAYOFYEAR(current_date)-1 DAY

Last Monday (strictly speaking: the Monday of the current week, i.e. today if run on a Monday)

SELECT current_date - INTERVAL weekday(current_date) day
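
The first day of this month (the case the date_format() construct from the intro covers) follows the same pattern:

SELECT last_day(current_date - interval 1 month) + interval 1 day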

 

If you have more to add, please feel free to put them into the comments and I will happily share them here.

 

Cheers

Andre