Hitachi Vantara Pentaho Community Forums

Thread: Carte & Clustered Jobs

  1. #1

    Carte & Clustered Jobs

    Has anyone discovered more about the issue described here?

    I have encountered the same problem with pdi-ee-7.1. We developed a clustered transformation and, when submitting its parent job through the Carte API (executeJob?job=/path/to/job), the sub-transformation would execute only on the master node: it did not run locally, but it was not distributed across the cluster either.
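    For context, the Carte call takes roughly the shape below; the host, port, credentials, and repository details are placeholders rather than our actual values (Carte's default user/password is cluster/cluster unless it has been changed in pwd/kettle.pwd):

    # Sketch of submitting the parent job to the Carte master via the executeJob endpoint
    curl -u cluster:cluster \
      "http://carte-master:8080/kettle/executeJob/?rep=my-repo&user=admin&pass=password&job=/path/to/job&level=Basic"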

    As a workaround, we were able to use kitchen.sh to submit our job to the cluster. This worked about 75% of the time; the other 25% of the time the job was not submitted to the cluster at all but instead ran locally and wrote its logs directly to stdout. There was no discernible pattern to this, but luckily for our purposes we only needed a single clustered run.

    Note that, to submit via kitchen.sh, the XML within our KJB file included these attributes for the specific clustered transformation:

    <cluster>Y</cluster>
    <slave_server_name>Carte Master 1 - Production (8080)</slave_server_name>

    Additionally, the KJB contained the following run-configuration JSON embedded at the bottom of the XML:

    <attribute>
    <key>Clustered</key>
    <value>{"children":[{"children":[],"id":"server","value":"Clustered"},{"children":[],"id":"clustered","value":"Y"},{"children":[],"id":"name","value":"Clustered"},{"children":[],"id":"description","value":null},{"children":[],"id":"readOnly","value":"N"},{"children":[],"id":"sendResources","value":"N"},{"children":[],"id":"logRemoteExecutionLocally","value":"N"},{"children":[],"id":"remote","value":"N"},{"children":[],"id":"local","value":"N"},{"children":[],"id":"showTransformations","value":"N"}],"id":"Clustered","value":null,"name":"Clustered","owner":null,"ownerPermissionsList":[]}</value>
    </attribute>

    Finally, our clustered transformation contained a cluster schema defining a dynamic cluster based on a single master node: the same "Carte Master 1 - Production (8080)" also defined in the parent job.

    This worked as a temporary band-aid, but the fact that a clustered transformation can't be submitted to the cluster through the Carte API sounds like a bug.

  2. #2


    Has anyone found out more about this issue?

    We faced the same problem with pdi-ee-7.1. We created a clustered transformation that was called from within a parent job. When submitting this job to our cluster via the Carte API (executeJob?job=/path/to/job.kjb), the transformation only ran on the master node.

    Luckily for our purposes we only needed a single, successfully clustered run, and so we were able to use the following workaround. We submitted our job to the Carte cluster using kitchen.sh (running on CentOS). This would work roughly 75% of the time, while 25% of the time it would simply run in local mode and output all the log info to stdout. There was no discernible pattern to this that we could find.
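    The kitchen.sh invocation itself was nothing special; it was roughly of the following form (the file path and log level here are illustrative, and a repository-based job would use the -rep/-user/-pass/-dir/-job options instead of -file):

    # Sketch of the kitchen.sh submission of the parent job (illustrative path)
    ./kitchen.sh -file=/path/to/parent_job.kjb -level=Basic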

    It may be important to note that our KJB contained the following XML embedded for the particular job entry that called our clustered transformation:

    <entry>
    ...
    <cluster>Y</cluster>
    <slave_server_name>Carte Master 1 - Production (8080)</slave_server_name>
    ...
    </entry>


    And at the bottom of the KJB, the Run Configuration was defined by a snippet of JSON like so:

    <attributes>
    ...
    <group>
    <name>{"_":"Embedded MetaStore Elements","namespace":"pentaho","type":"Default Run Configuration"}</name>
    <attribute>
    <key>Clustered</key>
    <value>{"children":[{"children":[],"id":"server","value":"Clustered"},{"children":[],"id":"clustered","value":"Y"},{"children":[],"id":"name","value":"Clustered"},{"children":[],"id":"description","value":null},{"children":[],"id":"readOnly","value":"N"},{"children":[],"id":"sendResources","value":"N"},{"children":[],"id":"logRemoteExecutionLocally","value":"N"},{"children":[],"id":"remote","value":"N"},{"children":[],"id":"local","value":"N"},{"children":[],"id":"showTransformations","value":"N"}],"id":"Clustered","value":null,"name":"Clustered","owner":null,"ownerPermissionsList":[]}</value>
    </attribute>
    </group>
    ...
    </attributes>


    I'm not exactly sure which options in the JSON are required for the sub-transformation to be submitted to the cluster. Finally, our clustered transformation itself had a cluster schema defined that pointed to the same single master node referenced above (Carte Master 1, same name and all), and the cluster schema was set to dynamic. The clustered steps were configured to use this cluster schema.
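    For completeness, the master Carte instance that the dynamic cluster schema points at is started in the usual way, something like the sketch below (the host and port are placeholders; in a dynamic cluster the slave Carte instances announce themselves to this master by starting with a carte-config XML that lists the master and sets report_to_masters to Y):

    # Sketch: start the master Carte instance referenced by the cluster schema (placeholder host/port)
    ./carte.sh carte-master 8080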

    Hopefully this workaround helps somebody else, but as noted it did not always work as expected when running with kitchen.sh and I do not yet know why. And being unable to submit a clustered transformation to your cluster via the Carte API sounds like a bug.
    Last edited by benjaminedwardwebb; 01-24-2018 at 04:26 PM.
