Hitachi Vantara Pentaho Community Forums

Thread: Avoiding Duplicate Processing in Multithreaded/Clustered step

  1. #1
    Join Date
    Apr 2008

    Avoiding Duplicate Processing in Multithreaded/Clustered step

    Hi -

    I'm trying to write a custom transformation plugin for Pentaho. To avoid doing too much duplicate processing, I was wondering whether there is a method I could call inside my step that would tell me the total number of copies of that step that are running.

    I believe that copyNr (which is passed into the public constructor of the main step class) is the number of the copy of that particular step over the entire cluster, and is not unique for the entire transformation.

    My codebase is currently at 3.0.4, but we will be moving to 3.1 when it becomes GA. I know there is a multithreaded CSV file reader in there that must have solved a similar dilemma, so if making this work requires some change to the base code that isn't available until 3.1, let me know.



  2. #2
    Join Date
    May 2006


    Idiot spammers

  3. #3
    Join Date
    Nov 1999


    You are right, keithpsu: we already solved a similar problem.
    While the step copy number is unique within a single transformation, it is not unique across a clustered transformation.
    As such, we added two methods to BaseStep:

    public int getUniqueStepNrAcrossSlaves();
    public int getUniqueStepCountAcrossSlaves();
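
One way these two values could be used to avoid duplicate processing is to give each copy its own slice of the work. The sketch below is not Kettle code: CopyPartitioner, ownsItem and byteRange are hypothetical helpers, and it assumes the unique step number is 0-based and always smaller than the unique step count. In a real step the two int arguments would presumably come from getUniqueStepNrAcrossSlaves() and getUniqueStepCountAcrossSlaves().

```java
// Sketch only: CopyPartitioner and its methods are hypothetical helpers, not part of Kettle.
// Assumption: uniqueNr is 0-based and 0 <= uniqueNr < uniqueCount.
final class CopyPartitioner {

    // Round-robin assignment: each item index belongs to exactly one copy.
    static boolean ownsItem(long itemIndex, int uniqueNr, int uniqueCount) {
        return itemIndex % uniqueCount == uniqueNr;
    }

    // Contiguous block assignment, e.g. for reading one large file in parallel.
    // Returns {startInclusive, endExclusive}; the last copy also takes the remainder.
    static long[] byteRange(long totalSize, int uniqueNr, int uniqueCount) {
        long block = totalSize / uniqueCount;
        long start = block * uniqueNr;
        long end = (uniqueNr == uniqueCount - 1) ? totalSize : start + block;
        return new long[] { start, end };
    }

    public static void main(String[] args) {
        // Stand-in for getUniqueStepCountAcrossSlaves() in a real step.
        int uniqueCount = 4;
        for (int nr = 0; nr < uniqueCount; nr++) {
            long[] range = byteRange(1003, nr, uniqueCount);
            System.out.println("copy " + nr + " reads bytes [" + range[0] + ", " + range[1] + ")");
        }
    }
}
```

Round-robin suits work that is already enumerated (rows, file names); block ranges suit a single large input where each copy can seek to its own offset. A reader that splits on byte offsets would still need to realign to record boundaries, which this sketch does not attempt.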
