I have a very basic question; I don't think I have seen it explained anywhere. Apologies ahead of time if it has already been covered.
How are the rows processed in PDI?
Background on the question...
I ran a transformation that reads products from a CSV file and upserts the categories they belong to. I have a sub-transformation that inserts a category if it doesn't exist. I ran the wrapper transformation with 10 rows in which every product belongs to the same category. I expected the log to show 1 insert and 9 passes. Instead it shows 10 inserts, yet only 1 category ends up in the database, which is good and is what I expect.
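The expected result (1 insert, 9 passes) is what a strictly row-by-row model produces, where each row finishes its check-then-insert before the next row starts. A toy sketch in plain Python, not PDI code; `process_sequentially` and the in-memory `categories` set are hypothetical stand-ins for the sub-transformation and the category table:

```python
def process_sequentially(rows):
    """Toy model: each row completes the whole check-then-insert before the next starts."""
    categories = set()  # hypothetical stand-in for the category table
    log = []
    for product, category in rows:
        if category in categories:
            log.append("pass")
        else:
            categories.add(category)  # the insert is visible to the next row's lookup
            log.append("insert")
    return log, categories


rows = [(f"product{i}", "books") for i in range(10)]
log, categories = process_sequentially(rows)
print(log.count("insert"), log.count("pass"), len(categories))  # 1 9 1
```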
Hence the question: how are the rows processed? The only explanation I can come up with (and I'm not sure it is correct) is that all the rows are processed in parallel in different threads, each of them thinks it is inserting, and in the end something (a merged transaction, or whatever it is) results in only 1 insert in the database. Is that correct? Or is there perhaps an error somewhere in the transformation that keeps my variables/fields from being updated between rows?
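The parallel-threads hypothesis can be illustrated with a toy model in plain Python (not PDI code). It assumes, purely for illustration, two concurrently running steps, that the insert step's writes only become visible after a single commit at end of stream, and that the write is keyed on category; all names are hypothetical. Under those assumptions every row's lookup misses, so all 10 are logged as inserts, while the keyed write still leaves one category:

```python
import queue
import threading

DONE = object()  # end-of-stream marker


def lookup_step(rows, table, out_q, log):
    """Checks the committed table for each row and forwards the result downstream."""
    for product, category in rows:
        found = category in table  # only sees data the insert step has committed
        log.append("pass" if found else "insert")
        out_q.put((product, category, found))
    out_q.put(DONE)


def insert_step(in_q, table):
    """Buffers writes and commits once at end of stream (assumed commit size >= row count)."""
    pending = set()
    while (item := in_q.get()) is not DONE:
        product, category, found = item
        if not found:
            pending.add(category)  # keyed write: repeated categories collapse into one
    table.update(pending)  # single commit: the writes become visible only now


def run(rows):
    table, log = set(), []
    hop = queue.Queue()  # the hop between the two concurrently running steps
    steps = [
        threading.Thread(target=lookup_step, args=(rows, table, hop, log)),
        threading.Thread(target=insert_step, args=(hop, table)),
    ]
    for t in steps:
        t.start()
    for t in steps:
        t.join()
    return log, table


log, table = run([(f"product{i}", "books") for i in range(10)])
print(log.count("insert"), len(table))  # 10 1
```

Whether this matches the real transformation depends on its commit size and lookup settings, which the post does not show.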
And if the rows are indeed processed in parallel, then steps that involve sorting are really "blocking steps", since they have to wait for all the rows to come in, sort them, and only then pass them on to the next step. Is this a correct assumption?
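Whatever the threading model, a sort has to see every input row before it can emit the first output row. A minimal sketch with Python generators (hypothetical names, not PDI's implementation) makes the difference between a streaming step and a blocking one visible in the trace:

```python
def tag(rows, trace):
    """Streaming step: handles one row and passes it on immediately."""
    for row in rows:
        trace.append(f"tag {row}")
        yield row


def sort_step(rows, trace):
    """Blocking step: must read every input row before emitting the first output row."""
    buffered = list(rows)  # drains the whole upstream before anything is yielded
    trace.append("sort done reading")
    for row in sorted(buffered):
        trace.append(f"emit {row}")
        yield row


trace = []
out = list(sort_step(tag([3, 1, 2], trace), trace))
print(out)    # [1, 2, 3]
print(trace)  # ['tag 3', 'tag 1', 'tag 2', 'sort done reading', 'emit 1', 'emit 2', 'emit 3']
```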
On the other hand, I know I have read that the steps are definitely executed in parallel, which is also a bit of a mystery given the serial "nature" of the transformation and job flowchart/diagram.
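One way to reconcile parallel steps with a left-to-right diagram, assuming (as a simplification) that each step runs in its own thread and each hop is a bounded buffer of rows: the arrows describe where rows flow, not an order in which whole steps run one after another. A toy Python model, not PDI code, with hypothetical step names:

```python
import queue
import threading

END = None  # end-of-stream marker


def read_step(n_rows, hop, events):
    """Upstream step: emits rows one at a time into a bounded buffer."""
    for i in range(n_rows):
        events.append(f"read {i}")
        hop.put(i)  # blocks while the buffer is full
    hop.put(END)


def write_step(hop, events):
    """Downstream step: starts working as soon as the first row arrives."""
    while (row := hop.get()) is not END:
        events.append(f"write {row}")


events = []
hop = queue.Queue(maxsize=2)  # small buffer standing in for the rows held on a hop
threads = [
    threading.Thread(target=read_step, args=(10, hop, events)),
    threading.Thread(target=write_step, args=(hop, events)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The downstream step handled row 0 before the upstream step read row 9.
print(events.index("write 0") < events.index("read 9"))  # True
```

This sketch only covers transformations; it says nothing about how job entries are scheduled.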
Overall I am still not very clear on the relationship between field order/position, row order, and the PDI transformation/job execution flow (steps like select values, merge, sort and split to rows make this a bit harder to understand).
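Field order and row order can be treated as separate axes: field order belongs to the stream's metadata (assumed here to be one layout shared by every row on a hop), while row order is the sequence in which rows travel along that hop. A toy Python model under that assumption (the `Stream` class and `select_values` function are hypothetical, not PDI's API) shows a reorder step changing field positions while leaving row order alone:

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """Toy model: one list of field names (metadata) shared by every row in the stream."""
    fields: list
    rows: list

    def value(self, row, name):
        # Look a field up by name via the metadata, not by a hard-coded position.
        return row[self.fields.index(name)]


def select_values(stream, order):
    """Hypothetical reorder step: changes field positions, keeps row order."""
    idx = [stream.fields.index(name) for name in order]
    return Stream(list(order), [[row[i] for i in idx] for row in stream.rows])


src = Stream(["product", "category"], [["p1", "books"], ["p2", "toys"]])
out = select_values(src, ["category", "product"])
print(out.rows)                           # [['books', 'p1'], ['toys', 'p2']]
print(out.value(out.rows[0], "product"))  # p1
```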
I would appreciate some insight.