Hitachi Vantara Pentaho Community Forums

Thread: Strategy used to distribute rows when using multiple copies of a step

  1. #1
    Join Date
    Jun 2008
    Posts
    21

    Strategy used to distribute rows when using multiple copies of a step

    Hi,

    Kettle lets you set the number of copies of a given step in order to process many rows in parallel. As far as I understand, rows are distributed round-robin (see http://diethardsteiner.blogspot.fr/2...iple-step.html), so each copy of the step processes the same number of rows.
    When the time needed to process a row varies from row to row, this strategy is not optimal: one copy of the step can finish and sit idle while the other copies are still working. Would it be possible to have a mechanism where all copies share the same queue, so that whenever a copy finishes its current row it picks up the next one from the shared queue? I think this would slightly improve performance in some cases.

    Regards,
    Nicolas
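
To make the difference between the two strategies in the post above concrete, here is a minimal sketch that compares total finishing time under round-robin and under a single shared queue. It is a toy model, not Kettle code: the function names and the per-row costs are made up for illustration, and it ignores rowset buffer sizes and upstream blocking.

```python
import heapq


def round_robin_makespan(costs, copies):
    """Row i goes to copy i % copies; return when the slowest copy finishes."""
    busy = [0] * copies
    for i, cost in enumerate(costs):
        busy[i % copies] += cost
    return max(busy)


def shared_queue_makespan(costs, copies):
    """Whichever copy becomes idle first takes the next row from one queue."""
    free_at = [0] * copies  # min-heap of the times at which each copy is idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)          # earliest idle copy
        heapq.heappush(free_at, start + cost)   # it is busy until start + cost
    return max(free_at)


# Hypothetical per-row processing costs: every fourth row is expensive.
costs = [9, 1, 1, 1, 9, 1, 1, 1]

print(round_robin_makespan(costs, 2))   # copy 0 receives both expensive rows
print(shared_queue_makespan(costs, 2))  # work is evened out across copies
```

With these made-up costs and two copies, round-robin happens to send both expensive rows to the same copy, while the shared queue keeps both copies busy until the end. When all rows cost the same, the two strategies finish at the same time.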

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    For your use case, PDI 5 Enterprise Edition has a "Load Balance" option.
    It does not use a single shared queue; instead it looks at all the output queues before selecting which one to write to. It could be interesting to do as you suggest when the first row passes, replacing the N output rowsets with a single one, but I haven't tried that yet.
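
One way to read "looks at all output queues prior to selecting which one to write to" is a least-loaded choice: write each row to the queue that currently holds the fewest pending rows. The sketch below illustrates that idea only; it is an assumption about the behaviour, not the actual PDI implementation, and `pick_least_loaded` and `distribute` are hypothetical names.

```python
from collections import deque


def pick_least_loaded(queues):
    """Index of the queue with the fewest pending rows (ties: lowest index)."""
    return min(range(len(queues)), key=lambda i: len(queues[i]))


def distribute(rows, queues):
    """Append each row to whichever output queue is shortest at that moment."""
    for row in rows:
        queues[pick_least_loaded(queues)].append(row)


# Three output queues, one per step copy; the first copy is already backed up.
queues = [deque(["r1", "r2", "r3"]), deque(), deque(["r4"])]
distribute(["a", "b", "c"], queues)
print([len(q) for q in queues])
```

A slow copy drains its queue more slowly, so its queue stays longer and it receives fewer new rows, which approximates the shared-queue behaviour without changing the one-rowset-per-copy layout.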

    Since the row distribution method is now a plugin (all EE features are plugins), we also describe an "Overflow" row distribution method: http://wiki.pentaho.com/display/EAI/...in+Development

  3. #3
    Join Date
    Jun 2008
    Posts
    21

    Thanks Matt.

    Regards,
    Nicolas


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.