Hitachi Vantara Pentaho Community Forums

Thread: Hive & Hadoop

  1. #1
    Matt Casters Guest

    Hive & Hadoop

    Dear Kettle developers,

    Pentaho has been working hard to improve Hadoop and Hive integration with
    PDI/Kettle and the rest of the Pentaho stack.

    At the moment there are already a few things for you to play with.
    First of all, there is a new 0.5.0 release of the Hive JDBC driver.

    CI builds:

    This should give you improved integration with PDI and the rest of the
    Pentaho stack.
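    For anyone who wants to try the driver outside of PDI, a plain JDBC
    connection sketch looks roughly like this. Hedged assumptions: the
    org.apache.hadoop.hive.jdbc.HiveDriver class name and the jdbc:hive://
    URL scheme are what Hive's 0.5-era driver used, HiveServer's default
    Thrift port was 10000, and the host, database, and query below are
    made-up examples, not values from this post:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {

    // The 0.5-era Hive driver expects URLs of the form
    // jdbc:hive://<host>:<port>/<database>
    public static String buildHiveJdbcUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // Driver class shipped in the hive-jdbc jar (assumption for 0.5.0).
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");

        // Host, database, and table are illustrative only; this needs a
        // running HiveServer and the hive-jdbc jar on the classpath.
        try (Connection conn = DriverManager.getConnection(
                buildHiveJdbcUrl("localhost", 10000, "default"), "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM sample_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```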

    Then there are a few PDI plugins: steps like "Hadoop File Input" and
    "Hadoop File Output", and job entries like "Hadoop Copy Files" and
    "Hadoop Job Executor". The source code is to be found over here:

    To make a lot of the new functionality work properly, a new Apache VFS
    hdfs:// filesystem was created. This driver creates an abstraction layer
    for HDFS to make files on HDFS accessible like any other file in Kettle.
    The source code is over here:

    The driver is loaded automatically by a Spoon plugin (see the plugins
    above) if you want to try it out.
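    Because hdfs:// behaves like any other VFS scheme in Kettle, a step such
    as "Hadoop File Input" can address HDFS directly with a URL of this
    shape. The namenode host, port, and path below are made-up examples, not
    values from this post:

```
hdfs://namenode.example.com:9000/user/kettle/input/part-00000
```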

    Besides these core components, there are also a few efforts underway to
    run Map and Reduce transformations natively inside Hadoop. Even though
    this is functional, it was decided not to release the source code for it
    at this time.

    I hope this gives you an idea of the things that are going on in this
    department. My involvement with these efforts has been limited. That
    being said, the involved people are on the mailing list if you have specific
    questions or concerns.

    Matt Casters <mcasters (AT) pentaho (DOT) org>
    Chief Data Integration
    Pentaho : The Commercial Open Source Alternative for Business Intelligence

    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com.
    To unsubscribe from this group, send email to kettle-developers+unsubscribe (AT) g...oups (DOT) com.
    For more options, visit this group at

  2. #2
    Jordan Ganoff Guest

    Hive & Hadoop Published Artifacts

    Kettle Developers,

    As a byproduct of building the Apache Hive JDBC driver, we've published
    the jars and required dependencies into the Pentaho repository, located at:

    To use from Ivy, declare the following dependency. The transitive
    dependencies will pull in all required jars:

        <dependency org="org.apache.hadoop.hive" name="hive-jdbc"
                    rev="0.5.0-pentaho-SNAPSHOT" changing="true"/>

    (changing="true" is declared here because the artifact is still a
    SNAPSHOT build and may change.)

    To use from Maven:

    You must also add the repository information to either the pom.xml or
    your local settings:

        <name>Pentaho External Repository</name>
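    Translating the Ivy coordinates above, a Maven dependency and repository
    entry would look roughly like this. This is a sketch: the groupId,
    artifactId, and version come from the Ivy snippet above, while the
    repository id is a made-up example and the repository URL (given in the
    original post) is left as a placeholder:

```xml
<!-- Dependency, derived from the Ivy coordinates above -->
<dependency>
  <groupId>org.apache.hadoop.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>0.5.0-pentaho-SNAPSHOT</version>
</dependency>

<!-- Repository entry for pom.xml or settings.xml; the id is an example,
     the URL was in the original post -->
<repository>
  <id>pentaho-external</id>
  <name>Pentaho External Repository</name>
  <url><!-- repository URL from the original post --></url>
</repository>
```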

    For your convenience, if you're not using a dependency manager, the
    required dependencies are now published artifacts from the
    apache-hive-0.5.0 job:


    Jordan Ganoff
    Software Engineer

    The Commercial Open Source Alternative for Business Intelligence
    5950 Hazeltine National Drive, Suite 340
