1. Senior Member
Join Date
Jun 2016
Posts
181

## Data movement - please explain.

From the tutorial:

• Distribute rows of data in a round-robin fashion: each target step (copy) gets a row in turn, all target steps get equal amounts of rows
• Copy rows: all target step (copies) receive all rows.

I don't get it.
In both methods the target steps receive ALL rows, so there is no difference in the result between the two methods.

Is the author trying to say that, with two target steps, "copy rows" sends ALL rows to the first target and, once that is finished, sends the same rows to the second target, whereas "distribute" sends row 1 to target 1 and then the same row to target 2, then reads row 2 and sends it first to target 1 and then to target 2, and so on?

If so, what is the point of doing it that way? Even if there is some decision point (such as filtering), won't the processing happen for all rows regardless of whether it is "copy" or "round-robin"?

2. Senior Member
Join Date
Aug 2016
Posts
290
Distribute is used to split rows across multiple targets, usually to balance the load for performance.

Copy rows is what I consider the default; it simply copies all rows to all targets.

Let's say Step A has 3 rows and 2 targets: Step B and Step C.

Copy:
- Step B gets Row1, Row2, Row3
- Step C gets Row1, Row2, Row3

Distribute:
- Step B gets Row1
- Step C gets Row2
- Step B gets Row3
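
A minimal Python model of the two modes may make the difference concrete. It is a simplified sketch, not Kettle's actual implementation: the function names `copy_rows` and `distribute_rows` are hypothetical, target steps are modelled as plain lists, and in this model each incoming row is dispatched as soon as it is read.

```python
from itertools import cycle


def copy_rows(rows, targets):
    """Copy mode (hypothetical model): every target receives every row."""
    log = []
    for row in rows:
        # Each row, as it arrives, is handed to every target in turn.
        for name, received in targets.items():
            received.append(row)
            log.append(f"{row} -> {name}")
    return log


def distribute_rows(rows, targets):
    """Distribute mode (hypothetical model): targets take turns, one row each."""
    log = []
    turn = cycle(targets.items())  # round-robin over the target steps
    for row in rows:
        name, received = next(turn)
        received.append(row)
        log.append(f"{row} -> {name}")
    return log


rows = ["Row1", "Row2", "Row3"]

copy_targets = {"Step B": [], "Step C": []}
copy_log = copy_rows(rows, copy_targets)
print(copy_log)
# ['Row1 -> Step B', 'Row1 -> Step C', 'Row2 -> Step B', 'Row2 -> Step C', 'Row3 -> Step B', 'Row3 -> Step C']
print(copy_targets)
# {'Step B': ['Row1', 'Row2', 'Row3'], 'Step C': ['Row1', 'Row2', 'Row3']}

dist_targets = {"Step B": [], "Step C": []}
distribute_rows(rows, dist_targets)
print(dist_targets)
# {'Step B': ['Row1', 'Row3'], 'Step C': ['Row2']}
```

In copy mode every downstream step processes every row; in distribute mode each row is processed by exactly one target, which is why distribution can spread the work across several step copies.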

If you are still confused, just try building some simple examples in Spoon!

3. Senior Member
Join Date
Jun 2016
Posts
181
Thanks Sparkles. Performance would be the only justification for this option (and a rather theoretical one).
But the sentence from the documentation, "all target steps get equal amounts of rows", is not logical. If we have 3 rows and 2 target steps, how can each target get "equal amounts of rows"?
Target 1 will receive two rows, target 2 only one row. Not equal.

4. Senior Member
Join Date
Sep 2011
Posts
152
Originally Posted by Gosforth
Thanks Sparkles. Performance would be the only justification for this option (and a rather theoretical one).
But the sentence from the documentation, "all target steps get equal amounts of rows", is not logical. If we have 3 rows and 2 target steps, how can each target get "equal amounts of rows"?
Target 1 will receive two rows, target 2 only one row. Not equal.
Thanks for thinking of 3 rows; you could have thought of just 1 row and tried to distribute it equally in round-robin fashion.

5. Senior Member
Join Date
Apr 2008
Posts
4,696
Originally Posted by Gosforth
Thanks Sparkles. Performance would be the only justification for this option (and a rather theoretical one).
But the sentence from the documentation, "all target steps get equal amounts of rows", is not logical. If we have 3 rows and 2 target steps, how can each target get "equal amounts of rows"?
Target 1 will receive two rows, target 2 only one row. Not equal.
Would the word "approximately" have helped you here?
If you have 501 rows and two target steps, with "Copy" each step receives all 501 rows. With "Distribute", step A gets 251 rows and step B gets 250 rows: "approximately" equal.
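
The arithmetic behind "approximately equal" can be checked with a short sketch (the helper names `round_robin_counts` and `copy_counts` are hypothetical): when n rows are dealt out in turn to k targets, the first n mod k targets get one extra row, so no two targets ever differ by more than one row.

```python
def round_robin_counts(n_rows, n_targets):
    """Rows received by each target when n_rows are dealt out in turn."""
    base, extra = divmod(n_rows, n_targets)
    # The first `extra` targets in the rotation get one row more than the rest.
    return [base + 1 if i < extra else base for i in range(n_targets)]


def copy_counts(n_rows, n_targets):
    """With copy, every target receives every row."""
    return [n_rows] * n_targets


print(copy_counts(501, 2))         # [501, 501]
print(round_robin_counts(501, 2))  # [251, 250]
print(round_robin_counts(3, 2))    # [2, 1]
print(round_robin_counts(1, 2))    # [1, 0]
```

The same rule covers the 3-row and 1-row cases from earlier in the thread: the split is as even as the row count allows.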

Sometimes (for example in statistics), you want to divide your data into roughly equal groups for comparison. Performance isn't the only (and absolutely isn't just theoretical) reason to want it.