Hitachi Vantara Pentaho Community Forums

Thread: Pentaho MapReduce job for Cloudera 5.2 (YARN2)

  1. #1
    Join Date: Oct 2009 | Posts: 13

    Pentaho MapReduce job for Cloudera 5.2 (YARN2)

    I'm using PDI 5.2 community edition (Kettle).

    I'm also running a three-node cluster on CDH 5.2 (Cloudera 5.2). I don't see shims for CDH52, so I am using the shims for CDH51 instead. HDFS connectivity from Kettle works fine, which was encouraging.

    I was working through the sample exercise at http://wiki.pentaho.com/display/BAD/...regate+Dataset.

    Following that exercise, I'm trying to connect to the cluster through the "Pentaho MapReduce" job entry. However, when I open the "Cluster" tab of the "Pentaho MapReduce" dialog, I see settings for the JobTracker instead of settings for the ResourceManager.

    I checked the instructions on the wiki (http://wiki.pentaho.com/display/BAD/...for+YARN+Shims), but honestly, I can't follow them since I am a beginner in this area.

    Do you have any idea what I might be missing?

    Thanks,
    Daniel

  2. #2
    Join Date: Sep 2012 | Posts: 71

    The UI has not been updated for MapReduce v2 / YARN, hence the references to JobTracker, NameNode, etc. Most of the settings you'll need for CDH 5.x are in the shim (the *-site.xml files). Set the hostname properties to match your cluster (they default to the Cloudera QuickStart VM). Then, in the Pentaho MapReduce dialog, wherever you see a reference to a JobTracker hostname, use the ResourceManager hostname. Where a field refers to an HDFS location, use your NameNode's hostname in the HDFS URL (on a small cluster this may be the same machine as the ResourceManager).
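
To find which shim properties still point at the default host, something like the following could help. It is only a sketch: `properties_mentioning`, the sample XML, and the host name `old-host.example.com` are made up for illustration (the real default values are whatever your shim's *-site.xml files contain), and the sample mixes properties that normally live in different files.

```python
import xml.etree.ElementTree as ET


def properties_mentioning(site_xml: str, host: str) -> dict:
    """Return {name: value} for every <property> whose value contains `host`."""
    root = ET.fromstring(site_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if host in value:
            found[name] = value
    return found


# Hypothetical content; real shim files will hold many more properties.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>old-host.example.com</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://old-host.example.com:8020</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # List every property that still references the placeholder host.
    for name, value in properties_mentioning(SAMPLE, "old-host.example.com").items():
        print(name, "=", value)
```

Running it against each *-site.xml in the shim directory would show which entries need your own cluster's hostnames.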

  3. #3
    Join Date: Oct 2009 | Posts: 13

    Thank you for the help. It worked once I followed your guidance.
    For those who may be scratching their heads as I did, what I did was put the correct hostname in the YARN shim (the plugins/pentaho-big-data-plugin/hadoop-configurations/cdh51/yarn-site.xml file) as below:

    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>n1.example.com</value>
    </property>

    Then, on the Cluster tab of the Pentaho MapReduce dialog, I entered the hostname, n1.example.com, as the Job Tracker Hostname (which is actually the hostname of the ResourceManager) and 8032 as the Job Tracker Port (which is actually the port of the CDH 5.2 ResourceManager).
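
In case it helps anyone script this, here is a rough Python sketch of the same two steps: set `yarn.resourcemanager.hostname` in the XML, then read back what to type into the dialog. The helper names `set_property` and `dialog_fields` are invented for this example, and the port is simply the 8032 mentioned above rather than something read from the cluster.

```python
import xml.etree.ElementTree as ET

RM_HOST_KEY = "yarn.resourcemanager.hostname"
RM_PORT = 8032  # assumption: the ResourceManager port used in this thread


def set_property(site_xml: str, name: str, value: str) -> str:
    """Set (or add) one <property> in a Hadoop-style site XML string."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property found, so append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def dialog_fields(site_xml: str) -> dict:
    """Map the ResourceManager host onto the dialog's JobTracker fields."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == RM_HOST_KEY:
            host = prop.findtext("value", default="").strip()
            return {"Job Tracker Hostname": host, "Job Tracker Port": RM_PORT}
    raise KeyError(RM_HOST_KEY)


if __name__ == "__main__":
    updated = set_property("<configuration/>", RM_HOST_KEY, "n1.example.com")
    print(updated)
    print(dialog_fields(updated))
```

Note that ElementTree drops XML comments and the original formatting when it rewrites a file, so for a real yarn-site.xml editing by hand may still be the safer route.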


    Daniel

  4. #4
    Join Date: Jan 2015 | Posts: 4

    Pentaho MapReduce job for Apache Hadoop 2.2.0

    In reply to interlee (post #3 above):

    Hi,
    I have recently been using Pentaho Kettle 5.2.0, mostly with the Apache community release of Hadoop 2.2.0 rather than a commercial distribution (Cloudera or Hortonworks). When I follow the wiki at
    http://wiki.pentaho.com/display/BAD/...regate+Dataset
    I get an error. When I run the job "aggregate_mr.kjb", the libraries in plugins/pentaho-big-data-plugin/hadoop-configurations/hadoop-2.2.0/lib are copied into HDFS (/opt/pentaho/mapreduce/5.2.0.0-5.2.0.0-209-hadoop-2.2.0). The error is: "/opt/pentaho/mapreduce/5.2.0.0-5.2.0.0-209-hadoop-2.2.0/lib/zookkeeper-3.5.3.jar ... is not a valid dfs filename". Do you know why?
    Also, do you know whether Pentaho Kettle works well with Apache community Hadoop 2.x?
    Thank you very much!

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.