Hitachi Vantara Pentaho Community Forums

Thread: Is it possible to load data in parallel into one table using Hadoop on PDI?

  1. #1
    Join Date
    Dec 2010
    Posts
    10

    Default Is it possible to load data in parallel into one table using Hadoop on PDI?

    Currently, PDI on Hadoop only outputs data to HDFS, using a Dummy step as the output.
    Instead of saving the data to HDFS, is it possible to load the data directly into a database table? For example, loading data in parallel into the same fact table using Hadoop. Thanks!
    Last edited by afancy; 01-02-2011 at 12:06 PM.

  2. #2
    Join Date
    Aug 2010
    Posts
    87

    Default

    That's entirely possible by using the Table Output step, as you would in any other transformation. The output steps defined in the Hadoop Transformation Job Executor step are still required to designate which step's output should be passed as the output of the Mapper or Reducer, but you can do (almost) anything you want in the transformation.
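
    For intuition, this is roughly what each mapper task ends up doing at the JDBC level when its transformation writes rows to a shared fact table. It is only a sketch: the connection URL, table, and columns are made up, and in PDI you would configure all of this in the Table Output step dialog rather than write code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class FactTableLoadSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details; each mapper task opens its own
            // connection, so several tasks insert into the same table concurrently.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost:5432/dw", "etl", "secret")) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO fact_sales (date_key, product_key, amount) VALUES (?, ?, ?)")) {
                    // In a real mapper these rows would come from the map() input split.
                    for (int i = 0; i < 1000; i++) {
                        ps.setInt(1, 20110101);
                        ps.setInt(2, i);
                        ps.setDouble(3, i * 1.5);
                        ps.addBatch();       // batch rows instead of one round trip per row
                    }
                    ps.executeBatch();
                }
                conn.commit();               // one commit per batch, not per row
            }
        }
    }

    With several mapper tasks running at once, the database sees concurrent inserts into the same table, which is the parallel load asked about in the first post.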

  3. #3
    Join Date
    Dec 2010
    Posts
    10

    Default

    Thanks!

    But I found a problem when running the transformation on Hadoop: the transformation cannot access the database. For example, when I fetch a sequence from the database, it always throws an exception. Could you advise? Thanks.

    java.io.IOException: org.pentaho.di.core.exception.KettleException:
    We failed to initialize at least one step. Execution can not begin!


    at org.pentaho.hadoop.mapreduce.GenericTransMap.map(SourceFile:188)
    at org.pentaho.hadoop.mapreduce.GenericTransMap.map(SourceFile:22)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
    Caused by: org.pentaho.di.core.exception.KettleException:
    We failed to initialize at least one step. Execution can not begin!


    at org.pentaho.di.trans.Trans.prepareExecution(Trans.java:740)
    at org.pentaho.hadoop.mapreduce.GenericTransMap.map(SourceFile:39)
    ... 5 more
    Last edited by afancy; 01-05-2011 at 09:30 AM.

  4. #4
    Join Date
    Aug 2010
    Posts
    87

    Default

    Do you have the database driver and any accompanying jars required to access your database in the $HADOOP_HOME/lib directory of each node?
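
    A quick way to confirm the driver is actually visible on a given node is to run a small check like the following there, against the same classpath the task JVMs use. The class name below assumes the PostgreSQL driver; adjust it for your database.

    public class DriverCheck {
        public static void main(String[] args) {
            try {
                // Fails with ClassNotFoundException if the JDBC jar is not on the classpath
                Class.forName("org.postgresql.Driver");
                System.out.println("PostgreSQL JDBC driver found");
            } catch (ClassNotFoundException e) {
                System.out.println("Driver missing: " + e.getMessage());
            }
        }
    }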

  5. #5
    Join Date
    Dec 2010
    Posts
    10

    Default

    Hi,
    About the previous exception: after I copied the PostgreSQL JDBC jar into hadoop/lib, the problem was solved.

    Now I am running into a lot of problems using Pentaho Hadoop to load data into the database. One of them is as follows:

    I implemented a transformation that inserts data into the database and used it as a mapper for PDI Hadoop, but I found that the database connection is opened and closed for every row (see details at http://dpaste.de/qJ3R/).

    PDI Hadoop does not expose the configure and close hooks of the Mapper and Reducer, which execute only once before and after map and reduce. So I suspect that the code establishing the database connection was placed inside the map and reduce functions, with the result that the connection is opened and closed for every row.
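
    For comparison, if you wrote the mapper by hand against the old org.apache.hadoop.mapred API that appears in the stack trace above, the connection handling being asked for would look roughly like this: open the connection once in configure(), reuse it in every map() call, and close it once in close(). The connection details and table below are hypothetical.

    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch of a hand-written mapper with per-task (not per-row) connection handling.
    public class DbLoadMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

        private Connection conn;
        private PreparedStatement insert;

        @Override
        public void configure(JobConf job) {
            try {
                // configure() runs once per task: open the connection here, not in map().
                conn = DriverManager.getConnection(
                        "jdbc:postgresql://dbhost:5432/dw", "etl", "secret"); // hypothetical
                conn.setAutoCommit(false);
                insert = conn.prepareStatement(
                        "INSERT INTO fact_sales_staging (raw_line) VALUES (?)"); // hypothetical table
            } catch (SQLException e) {
                throw new RuntimeException("Could not open database connection", e);
            }
        }

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<NullWritable, NullWritable> output,
                        Reporter reporter) throws IOException {
            try {
                insert.setString(1, value.toString()); // reuse the open connection for each row
                insert.executeUpdate();
            } catch (SQLException e) {
                throw new IOException(e);
            }
        }

        @Override
        public void close() throws IOException {
            // close() runs once per task, after the last map() call.
            try {
                conn.commit();
                conn.close();
            } catch (SQLException e) {
                throw new IOException(e);
            }
        }
    }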
