Hitachi Vantara Pentaho Community Forums
Thread: Race Condition on rows

  1. #1
    Join Date
    Feb 2013

    Race Condition on rows

    Maybe this is easy, but I haven't been able to find a solution so far.

    My transformation reads up to 5000 rows from a table in a staging area.
    It then runs some transformation steps and finally populates the data into several tables in the warehouse schema.

    It is absolutely crucial that the rows never change their order.
    The SELECT statement follows a chronological index; I used ORDER BY and FORCE INDEX.

    So after the Table Input step the row set should be in the right order (I hope so...).
    At least it is when I execute the SELECT manually.

    But in the Insert/Update step I can see that the order is not guaranteed: which row arrives first is effectively a race.

    Is it really necessary to sort the already sorted result again, in a temp file, with a Sort rows step?

    I tried to get rid of duplicate rows, but that doesn't work either. It just drops the 'wrong' rows: it keeps the oldest and skips the newest.

    So what is the right way to run the transformation so that the output happens in exactly the same sequence as the input?
    Or, as an alternative, how can I filter out the older rows?

    Thanks for any help

  2. #2
    Join Date
    Sep 2013


    I am not sure what you mean by race condition.
    As far as I understand, if you use a Table Input step with something like '... ORDER BY' to get presorted input, you can assume it is still presorted only if your transformation is a single linear chain of steps. The first branch you meet breaks sort consistency, for example a step with two or more outputs.
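    To make the branching point concrete, here is a toy model in plain Python (not PDI code). It assumes rows are distributed round-robin over the branches and that each branch spends a fixed, made-up cost per row; the step after the branches simply receives rows in the order they finish. The function name merge_branches and the cost values are hypothetical.

```python
def merge_branches(rows, costs):
    """Toy model: distribute rows round-robin over len(costs) branches.
    Each branch spends costs[b] time units per row; the downstream step
    receives rows in order of finish time (ties broken by arrival order)."""
    n = len(costs)
    clock = [0.0] * n              # time at which each branch becomes free
    finished = []                  # (finish_time, arrival_index, row)
    for i, row in enumerate(rows):
        b = i % n                  # round-robin distribution to branch b
        clock[b] += costs[b]
        finished.append((clock[b], i, row))
    finished.sort()                # merging step sees rows by finish time
    return [row for _, _, row in finished]


rows = list(range(1, 9))                   # already sorted by ORDER BY
print(merge_branches(rows, [1.0]))         # single chain: [1, 2, 3, 4, 5, 6, 7, 8]
print(merge_branches(rows, [1.0, 3.0]))    # two branches: [1, 3, 2, 5, 7, 4, 6, 8]
```

    With a single chain the ORDER BY sequence survives; as soon as one branch is slower than the other, rows overtake each other even though every row was correct when it left Table Input.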

  3. #3
    Join Date
    Jul 2009


    As Dzmitry wrote, if your transformation has branches in it, your original row order will not be preserved. The same applies if you run any step with multiple copies (by selecting "Change number of copies to start..."). Sorting steps will also change the order.
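    One common workaround (not mentioned in this thread) is to stamp each row with an increasing sequence number while the order is still correct, right after Table Input (PDI has an "Add sequence" step for this), and then sort on that number again just before the order-sensitive step. The sketch below models that idea in plain Python; the helper names are hypothetical, and the shuffle only stands in for whatever reordering branches or multiple copies cause.

```python
import random


def tag_with_sequence(rows):
    """Attach an increasing sequence number while the order is still correct."""
    return [(seq, row) for seq, row in enumerate(rows, start=1)]


def scramble(tagged, seed=0):
    """Stand-in for branches / multiple step copies: output order is arbitrary."""
    out = list(tagged)
    random.Random(seed).shuffle(out)
    return out


def restore_order(tagged):
    """Sort on the sequence number, then drop it again."""
    return [row for _, row in sorted(tagged, key=lambda t: t[0])]


rows = ["r1", "r2", "r3", "r4", "r5"]
restored = restore_order(scramble(tag_with_sequence(rows)))
print(restored)  # ['r1', 'r2', 'r3', 'r4', 'r5']
```

    This does mean a second sort, so it only pays off if the steps in between genuinely need branching or parallel copies; a strictly linear transformation keeps the order without it.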

    If you are really just interested in the latest values of rows for each key, you might try a "Sort rows" step where you specify the key fields plus the timestamp field, followed by a "Group by" step where the Group fields are the key fields and all Aggregates use the type Last Value.
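    The Sort rows + Group by combination can be modelled in a few lines of Python. This is only a sketch of the logic, not PDI itself: the function name latest_per_key and the sample field names id, ts and status are made up. Because every aggregate is "last value" and the rows are sorted by key and then timestamp, taking each key's last row gives the same result as taking each field's last value.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_key(rows, key_fields, ts_field):
    """Mimic "Sort rows" (key fields + timestamp) followed by "Group by"
    on the key fields with every aggregate set to last value."""
    sort_key = itemgetter(*key_fields, ts_field)
    group_key = itemgetter(*key_fields)
    result = []
    for _, group in groupby(sorted(rows, key=sort_key), key=group_key):
        *_, last = group          # last row in the group = newest timestamp
        result.append(last)
    return result


rows = [
    {"id": 1, "ts": "2013-02-01 10:00", "status": "new"},
    {"id": 2, "ts": "2013-02-01 10:05", "status": "new"},
    {"id": 1, "ts": "2013-02-01 11:00", "status": "paid"},
]
for row in latest_per_key(rows, ["id"], "ts"):
    print(row["id"], row["status"])   # 1 paid, then 2 new
```

    The ISO-style timestamp strings sort correctly as text; with real date values the same code works unchanged. Unlike a plain duplicate filter, which keeps the first row it sees, this keeps the newest row per key regardless of the order the rows arrived in.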

    If you would rather keep using the Insert/Update step the way you have it now, be sure to set "Y" in the Update column of the Update fields, so that PDI actually updates those values.
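    To show what the Update flag changes, here is a simplified model of insert/update semantics in Python. It is an illustration under assumptions, not how PDI implements the step: the target table is a dict keyed by the lookup key, and update_flags plays the role of the Update column ("Y"/"N") for each field. The function name insert_update is hypothetical.

```python
def insert_update(table, rows, key, update_flags):
    """Simplified Insert/Update model.
    table: dict mapping key value -> row dict (the target table)
    update_flags: dict mapping field name -> "Y" or "N"."""
    for row in rows:
        k = row[key]
        if k not in table:
            table[k] = dict(row)                  # no match: insert the full row
        else:
            existing = table[k]
            for field, flag in update_flags.items():
                if flag == "Y":                   # match: update flagged fields only
                    existing[field] = row[field]
    return table


rows = [{"id": 1, "status": "new"}, {"id": 1, "status": "paid"}]
print(insert_update({}, rows, "id", {"status": "N"}))  # {1: {'id': 1, 'status': 'new'}}
print(insert_update({}, rows, "id", {"status": "Y"}))  # {1: {'id': 1, 'status': 'paid'}}
```

    With "N" the first row that reaches the step wins, which matches the "keeps the oldest" symptom; with "Y" the last row wins, so the rows still have to arrive in chronological order for the newest values to end up in the table.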

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.