ZooKeeper connection problem

10-25-2013, 01:08 PM
Has anyone been able to use the PDI User Defined Java Class (UDJC) to connect to a ZooKeeper server?
(I apologize in advance for posting this in both the Big Data forum and the Pentaho Data Integration (Kettle) forum. I was not sure which one was the better fit, and my question relates to both.)

For now I am only trying to establish a connection, which takes one line of code:
connector = new ZooKeeperInstance(instanceName, zooKeeperUrl).getConnector(username, password.getBytes());

But this depends on the following libraries:
(... several commons-xxx.jar)

But when I copy accumulo-core.jar to the PDI lib directory and try to restart PDI, it crashes! I was unable to find a log file containing an error message.

I am using PDI 5.0, and ZooKeeper is running on a remote VM. I can connect to the ZooKeeper server using plain Java from the same machine that runs PDI. PDI crashes when it tries to load the transformation containing the UDJC ZooKeeper step.

I have also tried PDI 4.4.0, and it crashes the same way.

Any advice?

thank you

10-25-2013, 02:57 PM
You can try creating a directory containing those dependencies, then adding that directory to the front of the classpath specified in \pentaho\design-tools\data-integration\launcher\launcher.properties. However, if your versions of commons-xxx or any other libraries differ from the ones PDI itself uses, they might cause load-time or run-time errors.

10-25-2013, 04:22 PM
Thanks Matt,
Changing the launcher.properties did not seem to work for me.
I tried putting the jar files in a "libext" directory, which I created next to data-integration\lib. (Note that this directory does not exist in PDI 5.0, but does in PDI 4.4.0.) I then added the path to launcher.properties, which is now:
With that change PDI would start, but my UDJC step still failed to load the new jar files. It seems the step is not using the classpath from the launcher?
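
One way to check which classpath a step actually sees is to ask the JVM where a given class was loaded from. The following is only a minimal sketch in plain Java, not anything PDI provides: the class name ClassOrigin and the method describe are hypothetical, made up for illustration.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of PDI or Accumulo):
// reports which JAR or directory a class was loaded from.
final class ClassOrigin {
    private ClassOrigin() {}

    static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK typically have no code source
            String location = (src == null || src.getLocation() == null)
                    ? "<JDK/bootstrap>"
                    : src.getLocation().toString();
            return className + " loaded from " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is missing or one of its dependencies failed to link
            return className + " not loadable: " + e;
        }
    }
}
```

Called from inside the step with org.apache.accumulo.core.client.ZooKeeperInstance as the class name and the step's own class loader (for example getClass().getClassLoader()), this should show either the JAR the class came from or the exception explaining why it could not be loaded.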

10-25-2013, 06:06 PM
Try putting libext\ at the front of the classpath:


and also at the front of the libraries property. That should force PDI to load your JARs first, but if the versions don't match what's in lib/, it could cause errors. In this situation it's probably better to create a step plugin, which uses a self-first classloader to load JARs in the plugin's lib/ folder first. If you don't want to write a whole step, you could copy out the UDJC package from the engine and ui modules into a new project, change the names slightly, and use an annotation on the Meta class to specify that it's a plugin.

Of course, there can still be classloader problems when plugins and PDI share classes, but usually you'd have better luck if all your functionality is insulated as a plugin, rather than using the UDJC step (which is part of the engine) and adding (potentially duplicate/conflicting) JARs.
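
To illustrate what "self-first" means here, below is a rough sketch of a child-first class loader in plain Java. It is not PDI's actual plugin classloader, only an assumption-laden illustration of the general idea; the class name SelfFirstClassLoader is hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative only: a child-first ("self-first") loader, NOT PDI's implementation.
final class SelfFirstClassLoader extends URLClassLoader {

    SelfFirstClassLoader(URL[] pluginJars, ClassLoader parent) {
        super(pluginJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            // Never try to define core java.* classes ourselves
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name); // look in the plugin's own JARs first
                } catch (ClassNotFoundException e) {
                    // not in the plugin JARs; fall through to the parent
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // usual parent-first delegation
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With a loader like this, a commons-xxx JAR in the plugin's own lib/ folder wins over the copy in PDI's lib/, which is why a plugin is better insulated than the UDJC step. The flip side is the problem mentioned above: a class loaded by the plugin and the "same" class loaded by PDI are different types to the JVM.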

10-29-2013, 01:45 PM
Thanks Matt, I used your suggestion of using a step plugin, and finally got it to connect!
I started with the step plugin Demo (HelloWorld) example from
https://github.com/csutherl/pentaho-plugins/tree/master/kettle-sdk-step-plugin
I built the step using Eclipse, deployed it, and tested it.
Then I modified the build to also deploy a lib directory inside the step plugin, and included all of the other jar files that the Accumulo/ZooKeeper connection needs. I then added a call to ZooKeeperInstance.getConnector from the DemoStepMeta constructor, and it worked!

thank you!

10-29-2013, 02:40 PM
Awesome! If you have something you'd like to share with the Pentaho Data Integration community, you could always submit your plugin to the PDI Marketplace :) Instructions (and the marketplace manifest) are here: