Hitachi Vantara Pentaho Community Forums

Thread: problem with Merge Join

  1. #1

    problem with Merge Join

    Hi all,

    I've spent all morning reading posts related to Merge Join. I'm trying to join two CSV files (full outer join) and apply some logic to the resulting stream (a filter based on some conditions). After the filter, I export the correct results to one file and the erroneous ones to another file. The problem is that the erroneous file contains entries that correspond to only one of the original files. What I mean is that every line in the erroneous file should have around 55 fields, but some lines have only 30 or so fields (without the additional "," separators for the missing values). This is quite weird, since it means that the merge didn't work correctly.

    I have tried using blocking steps, but with no success. The second file is already sorted. This transformation worked correctly on small files, but with these files something seems to be wrong (one file has around 1,800 entries and the other around 30,000). I have attached the transformation. If anyone has any idea what I am doing wrong, please help.

    Regards,
    Gigi
    Attached Files
    Last edited by gigi.pruteanu; 11-19-2008 at 07:53 AM.

  2. #2
    Join Date: Nov 1999
    Posts: 9,729

    The transformation looks OK. Verify that the file is indeed sorted on client_id (ascending).
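A merge join reads both inputs once, in step, and relies on each being in ascending key order: when the current left key is smaller than the current right key, it concludes that key has no partner and moves on, so an out-of-order row can make matching keys miss each other. The following Python sketch illustrates that idea with a full outer join over lists of tuples. It is not Pentaho's implementation; `merge_full_outer`, the `check_sorted` guard, and the sample rows are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], ...]


def check_sorted(rows: List[Row], key: int) -> None:
    """Raise ValueError at the first row whose key is smaller than the previous row's key."""
    for n in range(1, len(rows)):
        if rows[n][key] < rows[n - 1][key]:
            raise ValueError(f"row {n} is out of order: {rows[n][key]!r} < {rows[n - 1][key]!r}")


def merge_full_outer(left: List[Row], right: List[Row], lkey: int = 0, rkey: int = 0) -> List[Row]:
    """Full outer sort-merge join; unmatched rows are padded with None on the missing side."""
    # The single forward pass below is only correct if both inputs are sorted ascending.
    check_sorted(left, lkey)
    check_sorted(right, rkey)
    lpad: Row = (None,) * (len(left[0]) if left else 0)
    rpad: Row = (None,) * (len(right[0]) if right else 0)
    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][lkey], right[j][rkey]
        if lk < rk:  # key exists only on the left
            out.append(left[i] + rpad)
            i += 1
        elif lk > rk:  # key exists only on the right
            out.append(lpad + right[j])
            j += 1
        else:  # equal keys: pair every left row with every right row in the run
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][lkey] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][rkey] == rk:
                j_end += 1
            out.extend(lrow + rrow for lrow in left[i:i_end] for rrow in right[j:j_end])
            i, j = i_end, j_end
    # Whatever remains on either side has no partner.
    out.extend(row + rpad for row in left[i:])
    out.extend(lpad + row for row in right[j:])
    return out


if __name__ == "__main__":
    clients = [("1", "Ann"), ("2", "Bob"), ("4", "Dan")]
    orders = [("2", "order-17"), ("3", "order-18")]
    for row in merge_full_outer(clients, orders):
        print(row)
```

In this sketch keys are compared as Python strings, so "10" sorts before "9"; whatever comparison the upstream sort uses has to match the one the join uses, which is one more reason to confirm that client_id really is in ascending order.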

  3. #3

    Thanks for the quick answer.

    I double-checked, and the second file is indeed sorted. The problem was with the data in the two files: some entries contained \" and this caused the lines to be read incorrectly. If it weren't for your answer, I would never have looked at the input files. There goes almost a day of going through the same transformation over and over again. But at least I got to try different approaches and learned some new things. Once again, thanks.

    Regards,
    Gigi
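A quote character inside a field is a common way for a CSV reader to lose track of field boundaries. The thread does not say how the input step was configured, so the following sketch uses Python's standard `csv` module rather than Pentaho to show the effect: a reader that only understands doubled quotes ("") treats \" as the end of the quoted value, and the rest of the record splits into the wrong number of fields. The sample data and the `rows_with_wrong_width` helper are hypothetical; counting fields per record like this is one quick way to find the offending lines before they reach a join.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical sample: the second line writes embedded quotes as \" instead of the
# doubled "" used by RFC 4180-style CSV, and the quoted value also contains a comma.
SAMPLE = 'id,comment,amount\n1,"a \\"b, c\\" d",2\n3,plain,4\n'


def rows_with_wrong_width(text: str, expected: int, **fmt) -> List[Tuple[int, int]]:
    """Return (line number, field count) for every record whose width differs from expected."""
    reader = csv.reader(io.StringIO(text), **fmt)
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]


# Default dialect: only "" is an escaped quote, so \" closes the quoted field early
# and the comma inside it splits the value into an extra field.
print(rows_with_wrong_width(SAMPLE, 3))
# Declaring backslash as the escape character keeps that record at three fields.
print(rows_with_wrong_width(SAMPLE, 3, escapechar="\\"))
```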

  4. #4
    Join Date: May 2007
    Posts: 11

    Why do special characters throw off Merge Join?

    I have this exact problem, but instead of a \" throwing off the merge join, it is an accented character. If one of the key values for the merge join contains an accented character, the fields are not joined in for that row; if there are no accented characters, the merge join works fine and the fields are joined in.

    Any ideas why accented characters would throw off merge join?

    Yes, the data is sorted, and we are using 3.1.

    Any ideas??? Thanks for the help!
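The thread ends without an answer, so the following is only one possible explanation, not a confirmed diagnosis. If the two inputs are decoded with different encodings (for example, one read as UTF-8 and the other as ISO-8859-1), or store accents in different Unicode normalization forms, then ASCII-only keys still compare equal while accented keys do not, which matches the symptom described. The Python sketch below demonstrates both cases with a simple inner merge join; the file contents, encodings, and helper names are hypothetical, and it does not model how Kettle 3.1 compares strings.

```python
import unicodedata
from typing import List, Tuple

Row = Tuple[str, ...]


def read_rows(data: bytes, encoding: str) -> List[Row]:
    """Decode a tiny comma-separated file and sort it by its first field (the join key)."""
    text = data.decode(encoding)
    return sorted(tuple(line.split(",")) for line in text.splitlines() if line)


def merge_join_inner(left: List[Row], right: List[Row]) -> List[Row]:
    """Inner sort-merge join on field 0; assumes both inputs are sorted and right keys are unique."""
    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            out.append(left[i] + right[j][1:])
            i += 1  # keep j so further left rows with the same key can match too
    return out


# Hypothetical files: both were written as UTF-8 and contain the key "café".
clients = "café,1\nzoo,2\n".encode("utf-8")
cities = "café,Paris\nzoo,Berlin\n".encode("utf-8")

left = read_rows(clients, "utf-8")
right_latin1 = read_rows(cities, "latin-1")  # wrong encoding: "café" becomes "cafÃ©"
right_utf8 = read_rows(cities, "utf-8")

print(merge_join_inner(left, right_latin1))  # only the "zoo" row joins
print(merge_join_inner(left, right_utf8))  # both rows join

# Same symptom from Unicode normalization: "é" as one code point (NFC) versus
# "e" plus a combining accent (NFD). Normalizing to NFC makes the keys equal again.
right_nfd = sorted((unicodedata.normalize("NFD", k), v) for k, v in right_utf8)
right_nfc = sorted((unicodedata.normalize("NFC", k), v) for k, v in right_nfd)
print(len(merge_join_inner(left, right_nfd)), len(merge_join_inner(left, right_nfc)))
```

If this turns out to be the cause, a likely fix is to make sure every input step uses the encoding the file was actually written in, and that both key columns are normalized the same way before the sort.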

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.