
Thread: Do transformations process records in order?

  1. #1

    Do transformations process records in order?

    If I have a Table Input step with an ORDER BY clause in its query, will the transformation process the records in the stream in that same order? My change data capture tables send over INSERTs, UPDATEs, and DELETEs, and it's important that those be processed in a certain order. From some of my tests, it doesn't appear to be taking my ORDER BY into account. Am I missing something here?

  2. #2
    Join Date
    Nov 2008
    Posts
    777

    If you have a single-path flow, the rows should stay in the order specified in the ORDER BY clause. If your flow has branches or parallel paths, for example after a Filter Rows or Switch / Case step, the order will most likely not be maintained once the paths rejoin.
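
To illustrate the point, here is a minimal sketch in plain Python (not PDI code, and not how Kettle is implemented). The sample rows and the function name `split_and_recombine` are made up for the example. It only shows that once rows are routed to two branches and brought back together, each branch keeps its own relative order but the combined stream no longer matches the ORDER BY order.

```python
# Plain-Python illustration only; none of this is PDI API.
# Hypothetical change-data-capture rows, already in ORDER BY order.
rows = [
    {"id": 1, "op": "INSERT"},
    {"id": 1, "op": "UPDATE"},
    {"id": 2, "op": "INSERT"},
    {"id": 1, "op": "DELETE"},
]


def split_and_recombine(ordered_rows):
    """Route rows to two branches by a condition, then rejoin the branches."""
    branch_true = [r for r in ordered_rows if r["op"] == "INSERT"]
    branch_false = [r for r in ordered_rows if r["op"] != "INSERT"]
    # In a real transformation the interleaving depends on thread timing.
    # Draining one branch after the other is just one possible outcome.
    return branch_true + branch_false


recombined = split_and_recombine(rows)
print([(r["id"], r["op"]) for r in recombined])
```

Here the DELETE for id 1 still follows its UPDATE, but the INSERT for id 2 has moved ahead of the UPDATE for id 1, which is exactly the kind of reordering that hurts CDC processing.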
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  3. #3
    Join Date
    May 2014
    Posts
    12

    So how do you get the original order back? I have this same problem. Do you have to inject an order-by field somewhere, and then sort all the data during the merge step? That means all the data has to be sorted in memory... This seems to be a big gap in necessary functionality. Isn't there another way to make sure the original order of row processing is maintained at the merge step? I am processing large XML files, and the order of row processing is critical to understanding the data. I use a Switch / Case step to split the stream based on a value that can be one of two types: one is any string, the other is a date that has to be reformatted. That's it. Recombining these two streams brings the data together in seemingly random order... Not good. Any advice is very much appreciated. Thanks in advance.

  4. #4
    Join Date
    Nov 2008
    Posts
    248

    This is not a big gap in functionality. It is a big boost in performance, because it comes from the multithreaded nature of PDI. You can easily work around this problem (apart from reorganizing your transformation, if possible): for instance, use an Add Sequence step at the beginning of your transformation. It will give you a field to sort on.
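
The idea can be sketched in plain Python (a simulation only, not PDI code). The field name `seq`, the helper names, the sample rows, and the day/month/year date format are all assumptions made for the example. Each row gets a sequence number before the split, and a sort on that number after the branches rejoin restores the original order.

```python
from itertools import count


def add_sequence(rows, field="seq"):
    """Tag each row with a running number, similar in spirit to an Add Sequence step."""
    counter = count(1)
    return [{**row, field: next(counter)} for row in rows]


def reformat_date(row):
    """Hypothetical branch logic: turn dd/mm/yyyy into yyyy-mm-dd."""
    day, month, year = row["value"].split("/")
    return {**row, "value": f"{year}-{month}-{day}"}


# Made-up input rows in their original order.
rows = [
    {"kind": "text", "value": "alpha"},
    {"kind": "date", "value": "31/12/2013"},
    {"kind": "text", "value": "beta"},
    {"kind": "date", "value": "01/02/2014"},
]

numbered = add_sequence(rows)

# Split the stream, process one branch, pass the other through.
text_branch = [r for r in numbered if r["kind"] == "text"]
date_branch = [reformat_date(r) for r in numbered if r["kind"] == "date"]

# The branches rejoin in some arbitrary order.
recombined = date_branch + text_branch

# Sorting on the sequence field restores the original order.
restored = sorted(recombined, key=lambda r: r["seq"])
print([(r["seq"], r["value"]) for r in restored])
```

Note that `sorted` here does hold every row in memory, which is the concern raised earlier in the thread.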
    Andrea Torre
    twitter: @andtorg

    join the community on ##pentaho - a freenode irc channel

  5. #5
    Join Date
    May 2014
    Posts
    12

    Yes, that's what I did; however, not to bash your great tool, but I still think you should provide that option (or step) so the extra workarounds are not necessary. (JMHumbleO)

  6. #6
    Join Date
    Nov 2008
    Posts
    248

    Well, there are a number of options:
    1. Submit a JIRA with a feature request. In the end, the project grows thanks to our feedback.
    2. Rewrite your solution to avoid splitting the stream: if you need simple if...then logic, there are a few steps that will help (JavaScript, Java Filter, UDJC, etc.).
    3. Write a custom plugin.
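
Option 2 might look something like the following, sketched in plain Python rather than in one of the scripting steps. The function name `transform_row`, the sample rows, and the dd/mm/yyyy input format are assumptions for illustration. The if...then decision is made per row inside a single path, so the stream is never split and the order never changes.

```python
from datetime import datetime


def transform_row(row):
    """Reformat the value if it parses as a date; otherwise pass the row through."""
    try:
        parsed = datetime.strptime(row["value"], "%d/%m/%Y")
    except ValueError:
        # Not a date in the assumed format: leave the string untouched.
        return row
    return {**row, "value": parsed.strftime("%Y-%m-%d")}


# Made-up rows mixing plain strings and dates.
rows = [
    {"value": "alpha"},
    {"value": "31/12/2013"},
    {"value": "beta"},
    {"value": "01/02/2014"},
]

# One pass, one path: the output order is the input order.
result = [transform_row(r) for r in rows]
print([r["value"] for r in result])
```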


    BR
    Andrea Torre
    twitter: @andtorg

    join the community on ##pentaho - a freenode irc channel

  7. #7
    Join Date
    Jun 2012
    Posts
    3,186

    Originally posted by fgump:
    "but I still think you should provide that option (or step) so the extra workarounds are not necessary."
    Pardon me for objecting.
    In compliance with its level of operation, Kettle provides everything to restore order after splitting streams.
    No big deal and certainly no workaround required.
    Just enable row numbering in your input step and use a Sorted Merge step to restore order.
    It can't get easier than that.
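
As a rough analogy in plain Python (not PDI code), a sorted merge of two branches on a row number can be sketched with `heapq.merge`. The field name `rownum` and the sample rows are made up. The sketch relies on each branch still being in ascending `rownum` order on its own, which holds when a branch only filters or transforms rows without reordering them.

```python
import heapq
from operator import itemgetter

# Two branches after a split; each one is still sorted by the row number
# that was assigned in the input step (hypothetical field name: rownum).
text_branch = [
    {"rownum": 1, "value": "alpha"},
    {"rownum": 3, "value": "beta"},
]
date_branch = [
    {"rownum": 2, "value": "2013-12-31"},
    {"rownum": 4, "value": "2014-02-01"},
]

# heapq.merge walks the inputs lazily and always yields the row with the
# smallest key next, so it never needs the full data set in memory.
merged_stream = heapq.merge(text_branch, date_branch, key=itemgetter("rownum"))

merged = list(merged_stream)  # materialized here only to print the result
print([(r["rownum"], r["value"]) for r in merged])
```

Because the merge only compares the next pending row from each input, this approach avoids the full in-memory sort that was the worry earlier in the thread.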
    pdi-ce-4.4.0-stable (with patches)
    java 1.7.0_51 (OpenJDK)
    ubuntu 13.10 (x86_64)
    timezone CET / CEST
    sig updated 2014-01-25
