Hitachi Vantara Pentaho Community Forums

Thread: Avoiding Duplicate Processing in Multithreaded/Clustered step

  1. #1
    Join Date
    Apr 2008
    Posts
    15

    Avoiding Duplicate Processing in Multithreaded/Clustered step

    Hi -

    I'm trying to write a custom transformation plugin for Pentaho. To avoid duplicate processing, I was wondering whether there is a method I can call inside my step.java that tells me the total number of copies of a step that are running.

    I believe that copyNr (which is passed into the public object in the main class) is the number of the copy of that particular step over the entire cluster, and is not unique for the entire transformation.

    My codebase is currently at 3.0.4, but we will be moving to 3.1 when it becomes GA. I know there is a multithreaded CSV file reader in there, which must have solved a similar dilemma, so if making this work needs a change to the base code that isn't available until 3.1, let me know.

    Thanks,

    Keith

  2. #2
    Join Date
    May 2006
    Posts
    4,882


    Idiot spammers

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729


    You are right, keithpsu, we already solved a similar problem.
    While the step copy number is unique within a single transformation, it is not unique across a clustered transformation.
    For that reason, we added two methods to BaseStep:

    Code:
    public int getUniqueStepNrAcrossSlaves();
    public int getUniqueStepCountAcrossSlaves();
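
    A minimal sketch of how these two values might be used to split work so that no two copies process the same item. It is not taken from the Kettle code base: the StepIdentity interface, the WorkPartition class and its methods are hypothetical stand-ins so the example compiles without the Pentaho libraries, and it assumes the unique step number is zero-based (between 0 and count - 1), which is worth verifying against the actual release.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkPartition {

    // Hypothetical stand-in for the two BaseStep methods named above, so the
    // sketch compiles without the Kettle libraries on the classpath.
    interface StepIdentity {
        int getUniqueStepNrAcrossSlaves();
        int getUniqueStepCountAcrossSlaves();
    }

    // Simple fixed implementation used only for the demo in main().
    static final class FixedIdentity implements StepIdentity {
        private final int nr;
        private final int count;

        FixedIdentity(int nr, int count) {
            this.nr = nr;
            this.count = count;
        }

        @Override
        public int getUniqueStepNrAcrossSlaves() { return nr; }

        @Override
        public int getUniqueStepCountAcrossSlaves() { return count; }
    }

    // Returns true when the copy identified by uniqueStepNr should handle the item.
    // ASSUMPTION: uniqueStepNr is zero-based, i.e. in [0, uniqueStepCount).
    static boolean ownsItem(long itemIndex, int uniqueStepNr, int uniqueStepCount) {
        if (uniqueStepCount <= 0) {
            throw new IllegalArgumentException("step count must be positive");
        }
        return Math.floorMod(itemIndex, (long) uniqueStepCount) == uniqueStepNr;
    }

    // Collects the indexes in [0, totalItems) that belong to the given copy.
    static List<Long> itemsFor(StepIdentity step, long totalItems) {
        List<Long> owned = new ArrayList<>();
        int nr = step.getUniqueStepNrAcrossSlaves();
        int count = step.getUniqueStepCountAcrossSlaves();
        for (long i = 0; i < totalItems; i++) {
            if (ownsItem(i, nr, count)) {
                owned.add(i);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        int copies = 3; // e.g. 3 step copies in total across all slaves
        for (int nr = 0; nr < copies; nr++) {
            StepIdentity step = new FixedIdentity(nr, copies);
            System.out.println("copy " + nr + " handles " + itemsFor(step, 10));
        }
    }
}
```

    In a real step, the two values would come from the BaseStep methods rather than from FixedIdentity. Modulo assignment is only one option; splitting the input into contiguous ranges per copy would work from the same two numbers.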
    HTH,
    Matt
