Hitachi Vantara Pentaho Community Forums

Thread: Data movement - please explain.

  1. #1
    Join Date
    Jun 2016
    Posts
    181

    Data movement - please explain.

    From the tutorial:

    • Distribute rows of data in a round-robin fashion: each target step (copy) gets a row in turn, all target steps get equal amounts of rows.
    • Copy rows: all target steps (copies) receive all rows.


    I don't get it.
    - With both methods the target step receives ALL rows. I see no difference in the result between the two.

    Is the author saying that, with two target steps, "copy rows" sends ALL rows to the first target and then, once that has finished, sends the same rows to the second target, whereas "distribute" sends row 1 to target 1 and then the same row to target 2, then reads row 2 and sends it first to target 1 and then to target 2, and so on?

    If so, what is the point of doing it that way? Even if there is a decision point (like filtering), the processing still takes place for all rows, whether it is "copy" or "round-robin".

  2. #2
    Join Date
    Aug 2016
    Posts
    290

    Distribute is used to split rows to multiple targets, usually for performance balancing.

    Copy rows is what I consider the default; it simply copies all rows to all targets.

    Let's say Step A has 3 rows and 2 targets: Step B and Step C.

    Copy:
    -Step B gets Row1, Row2, Row3
    -Step C gets Row1, Row2, Row3

    Distribute:
    -Step B gets Row1
    -Step C gets Row2
    -Step B gets Row3

    If you are confused, just try building some simple examples in Spoon!
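
    For anyone without Spoon at hand, the example above (Step A sending 3 rows to the targets Step B and Step C) can be sketched in plain Python. This is only a toy model of which rows each target ends up with, assuming strict round-robin starting at the first target; the function names are made up for illustration and it does not use any Pentaho API.

```python
from itertools import cycle


def copy_rows(rows, targets):
    """Copy: every target receives its own copy of every row."""
    return {target: list(rows) for target in targets}


def distribute_rows(rows, targets):
    """Distribute: rows are dealt out to the targets in turn (round-robin)."""
    received = {target: [] for target in targets}
    # cycle() repeats the target list, so row i goes to target i % len(targets).
    for row, target in zip(rows, cycle(targets)):
        received[target].append(row)
    return received


rows = ["Row1", "Row2", "Row3"]      # rows leaving Step A
targets = ["Step B", "Step C"]       # the two target steps

copied = copy_rows(rows, targets)
distributed = distribute_rows(rows, targets)

print("Copy:      ", copied)
print("Distribute:", distributed)
```

    With "copy", both targets hold all three rows; with "distribute", Step B holds Row1 and Row3 and Step C holds only Row2.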

  3. #3
    Join Date
    Jun 2016
    Posts
    181

    Thanks Sparkles. Performance would be the only justification for this option (and a rather theoretical one).
    But the sentence from the documentation, "all target steps get equal amounts of rows", is not logical. If we have 3 rows and 2 target steps, how can each target get "equal amounts of rows"?
    Target 1 will receive two rows and target 2 only one. Not equal.

  4. #4
    Join Date
    Sep 2011
    Posts
    152

    Quote Originally Posted by Gosforth
    Thanks Sparkles. Performance would be the only justification for this option (and a rather theoretical one).
    But the sentence from the documentation, "all target steps get equal amounts of rows", is not logical. If we have 3 rows and 2 target steps, how can each target get "equal amounts of rows"?
    Target 1 will receive two rows and target 2 only one. Not equal.

    Thanks for considering 3 rows. You could just as well have considered only 1 row and tried to distribute it equally in round-robin fashion.

  5. #5
    Join Date
    Apr 2008
    Posts
    4,696

    Quote Originally Posted by Gosforth
    Thanks Sparkles. Performance would be the only justification for this option (and a rather theoretical one).
    But the sentence from the documentation, "all target steps get equal amounts of rows", is not logical. If we have 3 rows and 2 target steps, how can each target get "equal amounts of rows"?
    Target 1 will receive two rows and target 2 only one. Not equal.

    Would the word "approximately" have helped you here?
    If you have 501 rows, and two steps, with "Copy", each step receives 501 rows. With Distribute, step A gets 251 rows, and step B gets 250 rows. "Approximately" equal.
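
    The arithmetic can be checked with a short sketch. It assumes strict round-robin starting at the first target, so the first (n mod k) targets each get one extra row; the function names are hypothetical and only model the row counts, not Pentaho itself.

```python
def copy_counts(n_rows, n_targets):
    """Copy: every target receives all n_rows rows."""
    return [n_rows] * n_targets


def distribute_counts(n_rows, n_targets):
    """Distribute: round-robin split; counts differ by at most one row."""
    base, extra = divmod(n_rows, n_targets)
    # The first `extra` targets receive one more row than the rest.
    return [base + 1 if i < extra else base for i in range(n_targets)]


print(copy_counts(501, 2))        # [501, 501]
print(distribute_counts(501, 2))  # [251, 250]
print(distribute_counts(3, 2))    # [2, 1]
```

    Under this model the counts are exactly equal only when the number of rows is a multiple of the number of targets; otherwise they differ by one.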

    Sometimes (for example in statistics), you want to divide your data into roughly equal groups for comparison. Performance isn't the only reason to want it, and it absolutely isn't just a theoretical one.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.