ZooKeeper connection problem



Luke2
10-25-2013, 01:08 PM
Has anyone been able to use the PDI User Defined Java Class (UDJC) to connect to a ZooKeeper server?
(I apologize in advance for posting this in both the Big Data forum and the Pentaho Data Integration (Kettle) forum. I was not sure which one was the better fit, and my question relates to both.)

For now I am only trying to establish a connection, which takes a single line of code:
connector = new ZooKeeperInstance(instanceName, zooKeeperUrl).getConnector(username, password.getBytes());

But this depends on the following libraries:
accumulo-core.jar
accumulo-fate.jar
accumulo-trace.jar
(... several commons-xxx.jar)
hadoop-core-1.2.1.jar
libthrift.jar
zookeeper-3.4.5.jar

But when I copy the accumulo-core.jar to the PDI lib directory
\pentaho\design-tools\data-integration\lib
and try to restart PDI, it crashes! I was unable to locate a log file that contained an error message.

I am using PDI 5.0, and ZooKeeper is running on a remote VM. I am able to connect to the ZooKeeper server using plain Java from the same machine that is running PDI. PDI crashes when it tries to load the transformation containing the UDJC ZooKeeper step.

I have also tried using PDI 4.4.0, and it also crashes the same way.

Any advice?

thank you

mattb_pdi
10-25-2013, 02:57 PM
You can try creating a directory containing those dependencies, then adding that directory to the front of the classpath specified in \pentaho\design-tools\data-integration\launcher\launcher.properties. However, the versions of commons-xxx and/or any other libraries that PDI also uses might cause load-time or run-time errors.
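
One way to see in advance which libraries might clash is to scan the JAR directories for class files that appear in more than one JAR. The sketch below is only an illustration built on the standard java.util.jar API (Java 8 or later); the class name JarConflictFinder is hypothetical, and a duplicate entry is a hint rather than proof of a version conflict.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of PDI: lists classes found in more than one JAR.
final class JarConflictFinder {

    /** Returns class entry name -> names of the JARs containing it, duplicates only. */
    static Map<String, List<String>> findDuplicates(List<Path> dirs) throws IOException {
        Map<String, List<String>> owners = new TreeMap<>();
        for (Path dir : dirs) {
            // Only direct *.jar children of each directory are scanned.
            try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : jars) {
                    try (JarFile jf = new JarFile(jar.toFile())) {
                        Enumeration<JarEntry> entries = jf.entries();
                        while (entries.hasMoreElements()) {
                            String name = entries.nextElement().getName();
                            if (name.endsWith(".class")) {
                                owners.computeIfAbsent(name, k -> new ArrayList<>())
                                      .add(jar.getFileName().toString());
                            }
                        }
                    }
                }
            }
        }
        // Keep only the classes that live in two or more JARs.
        owners.values().removeIf(list -> list.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        List<Path> dirs = new ArrayList<>();
        for (String arg : args) {
            dirs.add(Paths.get(arg));
        }
        findDuplicates(dirs).forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```

It could be run with the PDI lib directory and the directory holding the extra dependencies as its two arguments.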

Luke2
10-25-2013, 04:22 PM
Thanks Matt,
Changing the launcher.properties did not seem to work for me.
I tried putting the jar files in a "libext" directory, which I created next to the data-integration\lib. (Note that this directory does not exist in PDI 5.0, but does in PDI 4.4.0.) I then added the path to launcher.properties, which is now:
classpath=../:../ui:../ui/images:../lib:../libext
With that change PDI would start, but my UDJC step still failed to load the new jar files. It seems the step is not using the classpath from the launcher?
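
A quick way to narrow this kind of problem down is to ask a classloader whether it can see a class at all, and if so, which JAR or directory the class came from. The sketch below uses only standard JDK calls; the helper name ClassOrigin is hypothetical, and which classloader a UDJC step actually uses is not something this thread establishes.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class is loaded from, if anywhere.
final class ClassOrigin {

    /** Describes whether the loader can see className and where it was found. */
    static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String where = (src == null) ? "the JDK runtime (no code source)" : String.valueOf(src.getLocation());
            return className + " loaded from " + where;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " NOT visible: " + e;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : args) {
            System.out.println(describe(name, loader));
        }
    }
}
```

Passing org.apache.accumulo.core.client.ZooKeeperInstance as the class name would show whether the Accumulo JAR is reachable from the classpath in use.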

mattb_pdi
10-25-2013, 06:06 PM
Try putting libext\ at the front of the classpath:

classpath=../libext:../:../ui:../ui/images:../lib:..

and also at the front of the libraries property. That should force PDI to load your JARs first, but if the versions don't match what's in lib/, it could cause errors. In this situation it's probably better to create a step plugin, which uses a self-first classloader to load JARs in the plugin's lib/ folder first. If you don't want to write a whole step, you could copy out the UDJC package from the engine and ui modules into a new project, change the names slightly, and use an annotation on the Meta class to specify that it's a plugin.

Of course, there can still be classloader problems when plugins and PDI share classes, but usually you'd have better luck if all your functionality is insulated as a plugin, rather than using the UDJC step (which is part of the engine) and adding (potentially duplicate/conflicting) JARs.
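
To illustrate what "self-first" means here, the following is a minimal sketch of a classloader that looks in its own JARs before asking its parent. It is not PDI's actual plugin classloader; the class name SelfFirstClassLoader and the choice to always delegate java.* classes to the parent are assumptions made for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative sketch only: tries its own URLs first, then falls back to the parent.
final class SelfFirstClassLoader extends URLClassLoader {

    SelfFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core JDK classes must always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Self first: search this loader's own JARs and directories.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not found locally: use the normal parent-first lookup.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With a loader like this, a class that exists both in the plugin's lib/ folder and in PDI's own lib/ is taken from the plugin's copy. The same class can then exist twice under different loaders, which is the kind of shared-class problem described above.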

Luke2
10-29-2013, 01:45 PM
Thanks Matt, I used your suggestion of using a step plugin, and finally got it to connect!
I started with the step plugin demo (HelloWorld) example from
https://github.com/csutherl/pentaho-plugins/tree/master/kettle-sdk-step-plugin
I built the step using Eclipse, deployed it, and tested it.
Then I modified the build to also deploy a lib directory in the plugin step, and included all of the other jar files that the accumulo-zookeeper connection would need. I then added a call to ZooKeeperInstance.getConnector from the DemoStepMeta constructor, and it worked!

thank you!

mattb_pdi
10-29-2013, 02:40 PM
Awesome! If you have something you'd like to share with the Pentaho Data Integration community, you could always submit your plugin to the PDI Marketplace :) Instructions (and the marketplace manifest) are here:

https://github.com/pentaho/marketplace-metadata