Hitachi Vantara Pentaho Community Forums

Thread: Join/Union two Text input files - Merge join?

  1. #1
    Join Date
    Sep 2013
    Posts
    28

    Join/Union two Text input files - Merge join?

    Hello everyone

    Problem 1: I have two Text file inputs with product codes (Artikelnummer) and their product categories (maincat_aggr). I want to union them, i.e. I need all rows from both input1 and input2. If the result contains several rows with the same product code (unlikely in this case), their product categories should be combined into a new field L0+L1, separated by ;. Duplicates should be avoided/eliminated.
    I tried it with Merge join (FULL OUTER), but the result, the way I did it, is not really satisfying. The Stream lookup step doesn't get me there either. I attached the .ktr with the two text files.
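    A minimal Python sketch of the result Problem 1 asks for, outside PDI and for illustration only: append both inputs, drop duplicate rows, then collect the distinct maincat_aggr values per Artikelnummer and join them with ; into the field L0+L1. The function name union_categories and the sample rows are hypothetical; in PDI the same logic would be built from steps, not code.

```python
def union_categories(input1, input2):
    """Union two lists of (Artikelnummer, maincat_aggr) pairs.

    Duplicate rows are dropped; a code that appears with several
    categories gets them joined with ';' in first-seen order.
    """
    merged = {}  # dicts keep insertion order (Python 3.7+)
    for code, category in input1 + input2:
        cats = merged.setdefault(code, [])
        if category not in cats:  # eliminate doubles
            cats.append(category)
    # One output row per product code, categories combined in "L0+L1"
    return [{"Artikelnummer": code, "L0+L1": ";".join(cats)}
            for code, cats in merged.items()]


# Hypothetical sample data
input1 = [("1001", "Shoes"), ("1002", "Bags")]
input2 = [("1002", "Accessories"), ("1003", "Shoes"), ("1001", "Shoes")]
for row in union_categories(input1, input2):
    print(row)
```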

    Problem 2: Is there a single step after Merge join (or whatever step your solution uses) that joins the appropriate fields, instead of these several concat steps?
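    To illustrate the operation Problem 2 describes (collapsing several fields into one in a single pass instead of a chain of concat steps), here is a Python sketch that joins the non-empty, distinct values of a list of fields with ;. The helper concat_fields and the field name maincat_aggr_1, standing for a second category field coming out of the join, are hypothetical; this is not the configuration of any particular PDI step.

```python
def concat_fields(row, fields, sep=";"):
    """Join the non-empty, distinct values of several fields with sep."""
    values = []
    for name in fields:
        value = (row.get(name) or "").strip()  # treat None like an empty field
        if value and value not in values:
            values.append(value)
    return sep.join(values)


# Hypothetical rows after a FULL OUTER join: one side may be empty (None)
row_both = {"Artikelnummer": "1002", "maincat_aggr": "Bags", "maincat_aggr_1": "Accessories"}
row_left = {"Artikelnummer": "1001", "maincat_aggr": "Shoes", "maincat_aggr_1": None}
print(concat_fields(row_both, ["maincat_aggr", "maincat_aggr_1"]))  # Bags;Accessories
print(concat_fields(row_left, ["maincat_aggr", "maincat_aggr_1"]))  # Shoes
```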

    Problem 3: My Text file output puts every single field on a new row. Why doesn't a data set stay on one row? Plus, the data sets are full of spaces. What did I do wrong? Sorry, I am a novice. Most likely problems 2 and 3 are related to my first problem.
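    For comparison, the output shape Problem 3 is after (one line per data set, all fields on that line separated by a delimiter, no padding spaces) can be sketched with Python's csv module. The ; delimiter and the field names are assumptions carried over from the post, and write_rows is a hypothetical helper; this shows the target format, not which Text file output setting caused the problem.

```python
import csv
import io


def write_rows(rows, fieldnames, sep=";"):
    """Write each row dict as one delimited line, trimming padding spaces."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter=sep,
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Strip leading/trailing spaces so values are not padded
        writer.writerow({k: str(row[k]).strip() for k in fieldnames})
    return buf.getvalue()


rows = [{"Artikelnummer": " 1002 ", "L0+L1": "Bags;Accessories  "}]
print(write_rows(rows, ["Artikelnummer", "L0+L1"]), end="")
```

    Because the L0+L1 value itself contains ;, the writer quotes it ("Bags;Accessories"); choosing a file separator that differs from the in-field separator avoids the quoting.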

    Many thanks for your help!
    Regards, zioso
    pdi-ce-4.4.0-stable
    timezone UTC/GMT -4 hours

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534


    I just made some slight modifications to your transformation.

    Analyze what I did and you can answer your own questions.

    BTW: You might want to get rid of those VT (vertical tab) control characters in your text; try the Replace in String step for that.
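    Outside PDI, the suggested replacement amounts to deleting the vertical tab character (VT, ASCII 11, "\x0b"). A small Python sketch with hypothetical helper names (strip_vt, control_chars) that removes it and lists the control characters still left in a value, which is one way to check that none remain:

```python
import unicodedata

VT = "\x0b"  # vertical tab, ASCII 11


def strip_vt(value, replacement=""):
    """Replace every vertical tab in value (similar in effect to a Replace in String rule)."""
    return value.replace(VT, replacement)


def control_chars(value):
    """Return the distinct control characters (Unicode category Cc) in value."""
    return sorted({c for c in value if unicodedata.category(c) == "Cc"})


raw = "Bags\x0bShoes\x0b\n"
print(control_chars(raw))   # ['\n', '\x0b']
print(repr(strip_vt(raw)))  # 'BagsShoes\n'
```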
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Sep 2013
    Posts
    28


    Hi marabu
    Many thanks again! I think I understand all of your transformation.
    But listen, I didn't explain exactly what my previous transformation steps look like; I hoped that would make it easier to explain and understand. Well, now I am paying for it, because I don't think I can proceed this way.
    I don't actually have two text file inputs but two steps (Group by) that I want to join. So if I don't want to first write the two output files and then read them back with a Text file input step, how would I have to deal with it? I guess still not with Merge join, right?
    BTW: Did you find more than one kind of these VT control characters? I thought I had got rid of them except for one.
    Regards, zioso

  4. #4
    Join Date
    Sep 2013
    Posts
    28


    I believe I found the way: the Sorted merge step. I will post whether it worked or not. EDIT: It did work.
    Last edited by zioso; 10-28-2013 at 12:39 PM.

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534


    Sort Rows will accept multiple input hops with identical rowsets gladly.
    Otherwise, a simple Dummy step can be used to gather rows from multiple sources.
    Append Streams does it explicitly, so rows don't end up mixed.
    Sorted merge requires every input rowset to be sorted.
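    The difference between appending and a sorted merge can be sketched in Python as an analogy (not PDI's implementation; the stream contents are hypothetical): itertools.chain plays the role of Append Streams, heapq.merge that of Sorted merge. A Dummy step gathering several hops gives no guaranteed interleaving, so it has no deterministic counterpart here.

```python
import heapq
from itertools import chain

# Two hypothetical input streams, each already sorted by product code
stream_a = [("1001", "Shoes"), ("1003", "Shoes")]
stream_b = [("1002", "Bags"), ("1004", "Hats")]

# Append Streams analogy: all rows of the first stream, then all of the second
appended = list(chain(stream_a, stream_b))

# Sorted merge analogy: interleave sorted inputs so the result stays sorted
merged = list(heapq.merge(stream_a, stream_b, key=lambda row: row[0]))

# If one input is not sorted, the merged result is not sorted either
unsorted_b = [("1004", "Hats"), ("1002", "Bags")]
broken = list(heapq.merge(stream_a, unsorted_b, key=lambda row: row[0]))

print([code for code, _ in appended])  # ['1001', '1003', '1002', '1004']
print([code for code, _ in merged])    # ['1001', '1002', '1003', '1004']
print([code for code, _ in broken])    # ['1001', '1003', '1004', '1002']
```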

  6. #6
    Join Date
    Sep 2013
    Posts
    28


    I checked the steps. Thank you very much!
    A question about sorting: Is it necessary to put a Sort rows step directly in front of the steps that require sorted input, OR is the following step sequence also fine: ... - Sort rows - Group by - Filter rows - Select values - Group by 2
    I don't think the steps before Group by 2 mix up the rows, so the Sort rows step doesn't need to be right in front of the Group by 2 step. Correct? Thank you!
    zioso

  7. #7
    Join Date
    Jun 2012
    Posts
    5,534


    If there are no other input hops, Group By 2 will find the previously established sort order intact.
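    Why the sort order matters: PDI's Group by step expects its input sorted on the group fields, because it aggregates runs of consecutive rows with the same key. Python's itertools.groupby behaves the same way, so it makes a handy sketch (the rows, the helper group_categories and the filter condition are hypothetical): a row-by-row step such as a filter keeps the order Sort rows established, while unsorted input splits a group.

```python
from itertools import groupby


def group_categories(rows):
    """Join categories per code; like Group by, only consecutive equal keys form a group."""
    return [(code, ";".join(cat for _, cat in grp))
            for code, grp in groupby(rows, key=lambda r: r[0])]


rows = [("1002", "Bags"), ("1001", "Shoes"), ("1002", "Accessories"), ("1003", "")]

# Without sorting, code 1002 ends up in two separate groups
print(group_categories(rows))

# Sort rows -> Filter rows (order-preserving) -> Group by
sorted_rows = sorted(rows, key=lambda r: r[0])  # stable sort
filtered = [r for r in sorted_rows if r[1]]     # hypothetical filter: drop empty categories
print(group_categories(filtered))  # [('1001', 'Shoes'), ('1002', 'Bags;Accessories')]
```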

  8. #8
    Join Date
    Sep 2013
    Posts
    28


    OK. So additional input hops would mix up the order again. Thank you.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.