Hitachi Vantara Pentaho Community Forums

Thread: Big Data features open-sourced?

  1. #1

    Default Big Data features open-sourced?

    I've been reading that Pentaho open-sourced its Big Data features. I checked out the Kettle project, but I was unable to find the classes implementing the Big Data steps and job entries. Are they somewhere else, or are they actually not available yet?
    All the other steps I found in org.pentaho.di.trans.steps, and the job entries in org.pentaho.di.job.entries.

    Please let me know if I'm looking in the wrong place, or what the problem is.

    I also downloaded the Kettle 4.3 release from SourceForge, but still found no Big Data steps.

    Last edited by kepha; 07-23-2012 at 08:53 PM.

  2. #2
    jdixon Guest


    The Big Data features are a plug-in to Kettle.

    Source code:

  3. #3


    Alright, but I still cannot locate the source files. They are not in src-plugins. Maybe I'm looking in the wrong places again; if you know, please tell me where I can find them.
    One more question: do they extend and implement the same classes and interfaces as all the other steps/entries? If they follow the guidelines they should, but I just want to check.

  4. #4


    Oh, sorry, I totally missed the links you posted. But are they actually included inside the project?
    I mean: is there no release with them already included, or should I do it myself?
    Last edited by kepha; 07-23-2012 at 09:29 PM.

  5. #5
    jdixon Guest


    The information you need is in the wiki, including links to download binary builds.

  6. #6


    You can find the code on Github:

  7. #7


    OK, from everything I've read I'm a bit confused. On the one hand, it's always said that as of Kettle 4.3 Pentaho open-sources the Big Data functionality, but it is actually not in the Kettle project; it is in the plugin project you gave me a link to.
    Now, if I want to use it fully, I would need to somehow incorporate this plugin into the Kettle 4.3 project. (Please let me know if this doesn't make sense; I'm a bit of a rookie at all this.) There are some instructions about loading plugins into Kettle, so I was wondering: should I follow those instructions and do it myself, or is there actually a Kettle 4.3 release with the Big Data plugin already incorporated?
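    For what it's worth, deploying a Kettle plugin generally amounts to placing its unpacked build under the Kettle installation's plugins/ directory, which Kettle scans at startup. A minimal sketch of that layout (the install path and folder names are illustrative assumptions, with a temp directory standing in for a real install):

    ```shell
    # Stand-in for a real Kettle install such as /opt/data-integration
    # (the path and folder names here are illustrative assumptions):
    KETTLE_HOME=$(mktemp -d)

    # In a real setup you would unzip the built plugin package here;
    # Kettle discovers plugins by scanning the plugins/ directory at startup.
    mkdir -p "$KETTLE_HOME/plugins/pentaho-big-data-plugin"

    ls "$KETTLE_HOME/plugins"
    ```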

    Also, I found this in the readme.txt of the src-plugins package in the Kettle project:

    Core Kettle Plugin Documentation

    The following folders are considered core Kettle plugins: plugins that are
    distributed with Kettle's core distribution but are useful to keep as plugins for architectural and
    dependency reasons.

    To add a core plugin:

    - create a folder under src-plugins with the name of the plugin

    - create src, test, lib, and res subfolders for the various files that will be included in your plugin

    - add your plugin folder name to the plugins property in

    - if you would like your plugin's jar and zip to get published to artifactory, update the
    build-res/ with your plugin folder.

    An ivy.xml file must be located within the plugin's root folder. When creating a new plugin,
    the ivy.xml file from an existing plugin can be copied; no editing is needed.

    All core plugins get built as part of the core dist. You can also build the plugins standalone by using
    the "-standalone" ant targets related to the plugins. If you'd like to build just a single plugin,
    you can do that by overriding the plugins property to reference only your plugin.

    To have core plugins function in Eclipse, you'll need to add the plugin's dependencies to your
    .classpath file and set the property -DKETTLE_PLUGIN_CLASSES to the fully qualified names of your plugin classes.

    Here is the current core kettle plugins eclipse flag:

    Would this be the way to go with this?
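    As a purely hypothetical illustration of the KETTLE_PLUGIN_CLASSES flag format mentioned in that readme (the real flag value is omitted in the excerpt, and the class name below is a placeholder, not an actual Kettle class):

    ```shell
    # Hypothetical Eclipse run-configuration VM argument; the class name is a
    # placeholder, NOT the real flag value (which is omitted in the excerpt above).
    KETTLE_FLAG='-DKETTLE_PLUGIN_CLASSES=org.example.myplugin.MyPluginStepMeta'
    echo "$KETTLE_FLAG"
    ```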
    Last edited by kepha; 07-24-2012 at 02:15 PM.

  8. #8


    Have you taken a look at the Getting Started page for Java developers? It shows how to get started with the Pentaho Big Data Plugin project in Eclipse.

    If you want to debug Kettle code and the Pentaho Big Data Plugin at the same time you should check out this introduction:

    If you want to step through the code at runtime, outside of a unit test, I believe most of our developers deploy into a working version of Kettle (via the Big Data plugin build script: `ant install-plugin`). We execute Spoon with remote debugging enabled and connect to it from our IDE to debug the code at runtime.
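    Sketched as commands, that workflow might look like this (the `ant install-plugin` target comes from the post above; the install path, property name, environment variable, and port are assumptions for illustration — older spoon.sh scripts may require editing the OPT variable directly instead):

    ```shell
    # 1. Deploy the built plugin into a working Kettle install (run from the
    #    plugin's source tree; the property name and path are assumptions):
    #      ant install-plugin -Dkettle.dir=/opt/data-integration

    # 2. Launch Spoon with remote debugging enabled (JDWP on port 5005;
    #    suspend=n lets Spoon start without waiting for a debugger):
    export PENTAHO_DI_JAVA_OPTIONS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"
    #      ./spoon.sh

    # 3. Attach your IDE's remote debugger to localhost:5005.
    echo "$PENTAHO_DI_JAVA_OPTIONS"
    ```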

  9. #9


    Quote Originally Posted by jganoff
    If you want to debug Kettle code and the Pentaho Big Data Plugin at the same time you should check out this introduction:
    This seems to be the thing I was searching for. Very useful link, thanks very much!

  10. #10


    I've been following the guidelines for debugging the plugin inside Eclipse.

    I successfully imported all the plugins with this environment variable option, but now when I start, for example, a Pentaho MapReduce job, I get a NullPointerException. I debugged it a bit, and it seems that the plugin directory for this job is set to null.
    Have I missed some step in the configuration?
    Where is this directory path actually set?
    Is there some XML configuration file where it needs to be set?

    pluginFolder in the PluginInterface instantiation is set to null.

    Last edited by kepha; 07-26-2012 at 02:05 PM.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.