Hitachi Vantara Pentaho Community Forums

Thread: Reusing a Transformation to Sort Input Streams Automatically

  1. #1
    Join Date: Mar 2010
    Posts: 3

    Reusing a Transformation to Sort Input Streams Automatically

    Hello,

    I am new to this forum, but I have been using PDI for years.
    I do a lot of merge joins in every transformation, and because the input streams must be sorted on the join keys, every join takes 3 steps: one to sort the first input stream, another to sort the second stream, and the join step itself.

    I would like to set this up as a transformation I can reuse, but because the number of join fields is not always the same, I can't find a way to do it.
    What I want to end up with is a single step that not only joins the streams but sorts both inputs first. That would be easier and faster for me to work with, and less error-prone.

    Could anyone point me in the right direction?

    Thanks very much, and sorry for my English.
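    The three-step pattern described above (sort stream 1, sort stream 2, merge join) can be sketched outside PDI to show why the number of key fields does not have to be fixed: if the join key is built as a tuple of however many fields are listed, the same sort and merge logic works for 1, 3, or 5 keys. This is a minimal Python illustration under stated assumptions: rows are plain dicts, `sort_merge_join` and the sample field names are hypothetical, and only an inner join is shown. It is not PDI's API and not how the Sort rows or Merge Join steps are implemented internally.

```python
def sort_merge_join(left, right, keys):
    """Inner-join two lists of dict rows on any number of key fields.

    Sorts both inputs on `keys`, then merges them in one pass
    (the sort / sort / merge join pattern).
    """

    def key_of(row):
        # Composite key of any length: (k1, k2, ..., kN)
        return tuple(row[k] for k in keys)

    # Steps 1 and 2: sort each input stream on the same key fields.
    left = sorted(left, key=key_of)
    right = sorted(right, key=key_of)

    # Step 3: merge, advancing whichever side currently has the smaller key.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key_of(left[i]), key_of(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side...
            i_end = i
            while i_end < len(left) and key_of(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key_of(right[j_end]) == lk:
                j_end += 1
            # ...and emit their cross product, so duplicate keys are handled.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    result.append({**l_row, **r_row})
            i, j = i_end, j_end
    return result


# Usage: the key list can have any length; here it is two fields.
orders = [
    {"cust": 2, "region": "EU", "amount": 10},
    {"cust": 1, "region": "US", "amount": 5},
    {"cust": 1, "region": "US", "amount": 7},
]
customers = [
    {"cust": 1, "region": "US", "name": "Ann"},
    {"cust": 2, "region": "EU", "name": "Bob"},
    {"cust": 3, "region": "EU", "name": "Cy"},
]
joined = sort_merge_join(orders, customers, ["cust", "region"])
```

    Building the key as a tuple is the design choice that removes the fixed-arity problem: Python compares tuples element by element, so the sort order and the merge comparisons agree for any number of fields.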

  2. #2
    Join Date: Apr 2008
    Posts: 1,771

    Hi.
    The only way to do this (that I can think of at the moment) is to rename your input key fields in a generic way, for example match1, match2, match3, and build those 3 steps (sort, sort, join) using the generic names.
    You can then rename the fields back to their original names at the end of the process.
    -- Mick --
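    The renaming idea above can be sketched the same way: the reusable part only ever sees the fixed names match1, match2, match3, and a thin wrapper renames each stream's key fields to those names on the way in and back to the left stream's names on the way out. In this minimal Python sketch rows are plain dicts, and `GENERIC_KEYS`, `generic_join`, `rename`, and `join_via_generic_names` are hypothetical names chosen for illustration; `generic_join` stands in for the sort / sort / merge join sub-transformation. It also assumes no non-key field already uses one of the generic names.

```python
from itertools import groupby

# Hypothetical fixed key names the reusable sub-transformation is built around.
GENERIC_KEYS = ("match1", "match2", "match3")


def generic_key(row):
    return tuple(row[k] for k in GENERIC_KEYS)


def generic_join(left, right):
    """Stand-in for the reusable sort + sort + merge join; knows only match1..match3."""
    lgroups = [(k, list(g)) for k, g in groupby(sorted(left, key=generic_key), key=generic_key)]
    rgroups = [(k, list(g)) for k, g in groupby(sorted(right, key=generic_key), key=generic_key)]
    out, i, j = [], 0, 0
    while i < len(lgroups) and j < len(rgroups):
        (lk, lrows), (rk, rrows) = lgroups[i], rgroups[j]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Equal keys: emit every left/right pairing for this key.
            out.extend({**a, **b} for a in lrows for b in rrows)
            i += 1
            j += 1
    return out


def rename(rows, mapping):
    """Rename fields per `mapping`; fields not in the mapping keep their names."""
    return [{mapping.get(f, f): v for f, v in row.items()} for row in rows]


def join_via_generic_names(left, right, left_keys, right_keys):
    # The reusable part has a fixed number of key slots.
    for keys in (left_keys, right_keys):
        if len(keys) != len(GENERIC_KEYS):
            raise ValueError(f"expected {len(GENERIC_KEYS)} key fields, got {len(keys)}")
    to_generic_left = dict(zip(left_keys, GENERIC_KEYS))
    to_generic_right = dict(zip(right_keys, GENERIC_KEYS))
    joined = generic_join(rename(left, to_generic_left), rename(right, to_generic_right))
    # At the end, rename the generic keys back to the left stream's original names.
    back = {g: k for k, g in to_generic_left.items()}
    return rename(joined, back)
```

    The wrapper rejects any key list whose length differs from the fixed set of generic names, which is the practical limit of a fixed set of generic fields.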

  3. #3
    Join Date: Mar 2010
    Posts: 3

    Hi,

    Thanks for your answer, but I am not sure about this. With your approach you have to have exactly 3 key columns, right? What if sometimes I want to join on 4 or 5 fields? That is a normal situation for me.
    That is my problem: the number of fields to join on is not always the same. Do you know if anyone has coded a plugin that sorts the input streams automatically? I don't know why Kettle doesn't do this by default. It should be enforced, because if you don't sort the input streams the result will not be correct!

    Thanks for your help

  4. #4
    Join Date: Apr 2008
    Posts: 4,690

    Quote (originally posted by nachoelg):
    I don't know why Kettle doesn't do this by default. It should be enforced, because if you don't sort the input streams the result will not be correct!
    But what if I have data coming from two different databases (Table Input steps) that I know are already sorted correctly (because I used an ORDER BY clause)? Why should I pay a performance penalty to re-sort them when I join them in PDI?
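    The trade-off in the reply above can be made concrete: sorting n rows costs O(n log n) and has to see every row before emitting the first one, while merely checking that rows already arrive in key order costs one key comparison per row and can stream. This is a minimal Python sketch assuming dict rows and ascending order on all keys; `assert_sorted` is a hypothetical helper, not a PDI step. It also assumes the database's ORDER BY and the join's comparison agree on ordering, which differences in collation or NULL placement can break.

```python
from typing import Iterable, Iterator, Sequence


def assert_sorted(rows: Iterable[dict], keys: Sequence[str]) -> Iterator[dict]:
    """Yield rows unchanged, raising ValueError as soon as one is out of ascending key order.

    Unlike re-sorting, this never buffers the stream: it keeps only the
    previous row's key and compares each new key against it.
    """
    previous = None
    for position, row in enumerate(rows, start=1):
        current = tuple(row[k] for k in keys)
        if previous is not None and current < previous:
            raise ValueError(
                f"row {position} is out of order on {list(keys)}: {current!r} < {previous!r}"
            )
        previous = current
        yield row


# Usage: wrap a stream that is supposed to arrive pre-sorted (e.g. from ORDER BY).
presorted = [{"id": 1, "day": 3}, {"id": 1, "day": 5}, {"id": 2, "day": 1}]
checked = list(assert_sorted(presorted, ["id", "day"]))
```

    Failing fast on the first out-of-order row turns a silently wrong join result into a visible error, without paying for a sort the database already did.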

  5. #5
    Join Date: Mar 2010
    Posts: 3

    Yeah, you are right! That's why it isn't sorted automatically!
    But how difficult would it be to have a custom step or plugin that does this? Since there is no way to do it with a mapping (reusable sub-transformation), there has to be another way! If I had Eclipse and knew Java I would code it myself, but unfortunately that's not the case, hehehe

    Thanks!

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.