Hitachi Vantara Pentaho Community Forums

Thread: Parallelizing looped job step?

  1. #1
    Join Date
    Apr 2014

    Parallelizing looped job step?


    I'm currently looking for a way to parallelize a looped job step.

    [Image: syncInit.jpg]
    - Run xform to find .ktr files in target directory
    - For every result, run second job

    [Image: syncLoop.jpg]
    - Run xform to set variables
    - Run (abstract) xform with passed variable as filename

    Graceful failure steps with email alerts are in place, along with other housekeeping items.

    I'm looking for a way to distribute the results from "get Files to execute" to the "Loop Through Transformations" step. Is that possible?

    PDI 5.0.1-stable
    Windows 7 (designing / template abstraction)
    Red Hat Enterprise Linux 6.2 (execution)
    MySQL 5.6

  2. #2
    Join Date
    Apr 2014


    Alright, as a follow-up, here is my current solution:

    initialSync.kjb has been changed to remove the "Loop Through Transformations" step
    getFilesToExecute.ktr has been changed to reflect the following setup:
    [Image: job_distribution.jpg]
    These Job Executor steps all point to the loopThroughFiles.kjb file.

    Note: Like all Pentaho row distribution, this is round-robin, not load-based. In my most recent use case, each of these Job Executor steps handled 9 results, regardless of how long each one took to execute. The rowset size is 2 for this transformation.
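
To illustrate the round-robin behaviour described above, here is a minimal Python sketch. It is not PDI code. The count of 4 executors, the 36 file names, and the `round_robin` helper are assumptions made up for the example; the thread only states that each Job Executor step handled 9 results.

```python
from itertools import cycle


def round_robin(rows, n_targets):
    """Hand rows to n_targets buckets in turn, ignoring how long each row takes."""
    buckets = [[] for _ in range(n_targets)]
    # cycle() repeats the buckets endlessly; zip() stops when rows run out.
    for bucket, row in zip(cycle(buckets), rows):
        bucket.append(row)
    return buckets


# Hypothetical input: 36 transformation files spread over 4 executors.
files = [f"transform_{i:02d}.ktr" for i in range(36)]
buckets = round_robin(files, 4)

print([len(b) for b in buckets])  # [9, 9, 9, 9]
```

Every bucket ends up with the same number of rows, whether a given file takes seconds or minutes to run.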

    This decreases total execution time by around 40% for me. (It would be more if my individual transformation times were more similar.)
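
The effect of uneven transformation times can be sketched with a small calculation. The durations below are invented, 4 executors is an assumption, and the model ignores startup overhead and assumes the executors run fully in parallel, so treat the numbers as illustrative only.

```python
def sequential_time(durations):
    """Total time when every transformation runs one after another."""
    return sum(durations)


def round_robin_time(durations, n_targets):
    """Wall-clock time when items are dealt round-robin to parallel executors.

    The run finishes when the busiest executor finishes.
    """
    totals = [0] * n_targets
    for i, d in enumerate(durations):
        totals[i % n_targets] += d
    return max(totals)


def reduction(durations, n_targets):
    """Fractional decrease in total time versus the sequential loop."""
    return 1 - round_robin_time(durations, n_targets) / sequential_time(durations)


# Hypothetical durations in minutes; both lists sum to 80.
uniform = [10] * 8
skewed = [40, 5, 5, 5, 10, 5, 5, 5]

print(reduction(uniform, 4))  # 0.75
print(reduction(skewed, 4))   # 0.375
```

With identical durations the four executors finish together. With skewed durations one executor holds most of the work and the others sit idle, which is why the saving shrinks.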
