Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Merging two streams with *nearly* identical columns

  1. #1

    Question Merging two streams with *nearly* identical columns

    I have two streams with such columns:

    Stream1: a b c d e f g h i j k l m n o p
    Stream2: i p b h l n c k a e m p o f

    You get the idea. Same columns, except Stream2 misses a dozen optional columns, and the order is not the same.

    What is the easiest way to append the rows of both into a single stream?
    I tried "Append streams" but it says "you're mixing rows with different layout".
    I could not find any answer on Google nor this forum.

    A lousy solution could be to modify Stream2 manually so that it looks exactly like Stream1, adding missing columns with "Add constants" and "Select values", but it is so painful to maintain... My dream step would use columns names instead of columns order...

    Thanks a lot!
    Nicolas

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    If you want to join the 2 streams, you can use steps like "Merge Join".
    If the key is the order in which the rows appear, add a sequence to both streams.

  3. #3

    Default

    Thanks!
    OK, I just did what you said: I added a sequence to each stream, and used it as a "key" for a Merge Join.
    But when I click on "Show output fields" of the "Merge join" step, I see:

    a
    b
    c
    d
    a_1
    b_1
    c_1

    ... did I do something wrong, or is it normal? It is not what I expected, I wanted all rows to finally just have "a b c d".

    Remark: I actually don't care about the order in which the rows are produced, so adding sequences and using a key seems superfluous.
    Last edited by Nicolas.Raoul; 08-09-2010 at 10:27 PM. Reason: thanks

  4. #4

    Default

    I ended up doing it the very painful way:
    1) Add missing fields with an "Add constants"
    2) Correct the fields order with a "Select values"

    As I am currently evaluating the product, I would be interested in knowing whether there is anything more maintainable.
    Thank you!

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I'm sorry, I'm not sure what you expected. In over 15 years doing data integration I never came across a situation where I had to do what you just did.
    It seems incredibly strange but let me explain what happens.
    In this specific instance duplicate fields are renamed with an _1, _2 to make sure field names remain unique in the rows.

  6. #6
    Join Date
    Oct 2013
    Posts
    18

    Default

    Hi everybody!

    I came up with a similar use case.
    Just wondering if you guys find a less painful solution?

    Thanks, cheers!

    Azucena

  7. #7
    Join Date
    Apr 2012
    Posts
    253

    Default

    Matt explained the way to do it. It's the same as joining two tables in SQL with the same column names.

    select a.a, b.a from (select 1 a, 0 b from dual) a, (select 2 a, 0 b from dual) b
    where a.b = b.b

    You'll get back two columns: A and A_1

  8. #8
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Hi.
    What I tend to do is this:
    Use select values step and add constant step to have 2 identical "tables" then append streams step.
    -- Mick --

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.