Hitachi Vantara Pentaho Community Forums

Thread: 3.0.0 GA: "Merge Join" output very strange and wrong

  1. #1
    Join Date
    Oct 2006
    Posts
    4

    3.0.0 GA: "Merge Join" output very strange and wrong

    I have the following situation:

    I use a "Merge join" step to do an outer join of two text files. Three fields of each text file's data stream are used as key fields.

    When I preview the data for the "Merge join" step, the join result is perfectly OK.

    But when I preview the data for the step following the "Merge join", the content of every joined row is a copy of the last row of the join. This occurs even when I use a "Dummy" as the following step. And the same wrong output is propagated to the next steps and to the final output.
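
    For illustration only: in row-passing pipelines in general, output where every row repeats the last row is typical of a producer that hands the same mutable array downstream and keeps overwriting it. Whether this is what happens inside the "Merge join" step is an assumption; nothing in this post confirms it. The class and method names below are hypothetical and are not Kettle API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Kettle code: shows how reusing one row buffer
// makes every downstream row look like the last row produced.
class RowReuseSketch {

    // Fills ONE shared buffer and passes the same reference on each time.
    static List<Object[]> emitSharedBuffer(String[] keys) {
        List<Object[]> downstream = new ArrayList<>();
        Object[] buffer = new Object[2];
        for (int i = 0; i < keys.length; i++) {
            buffer[0] = keys[i];
            buffer[1] = i + 1;
            downstream.add(buffer); // same array object every iteration
        }
        return downstream;
    }

    // Same loop, but each row is copied before it is passed on.
    static List<Object[]> emitCopies(String[] keys) {
        List<Object[]> downstream = new ArrayList<>();
        Object[] buffer = new Object[2];
        for (int i = 0; i < keys.length; i++) {
            buffer[0] = keys[i];
            buffer[1] = i + 1;
            downstream.add(buffer.clone()); // independent copy per row
        }
        return downstream;
    }

    // Renders each row as text so the lists can be compared and printed.
    static List<String> render(List<Object[]> rows) {
        List<String> out = new ArrayList<>();
        for (Object[] row : rows) {
            out.add(Arrays.toString(row));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"A", "B", "C", "D"};
        System.out.println(render(emitSharedBuffer(keys))); // [[D, 4], [D, 4], [D, 4], [D, 4]]
        System.out.println(render(emitCopies(keys)));       // [[A, 1], [B, 2], [C, 3], [D, 4]]
    }
}
```

    A consumer that renders each row immediately would never notice the shared buffer, which is one way a preview at the producing step could look right while later steps look wrong. Again, this is only a sketch of the general pattern.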

    And there is a very strange workaround I found:

    When I connect a second dummy to the "Merge Join" step and set "Data movement" to "copy", the preview of the data for the second dummy shows the correct rows.

    Please help, this seems to be strange and erroneous behaviour of the "Merge Join" step (which was announced to work correctly in Kettle 3.0.0 GA).

    Best greetings,

    Ulf Petersen
    Last edited by upet; 11-26-2007 at 01:16 PM.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Example?

    Regards,
    Sven

  3. #3
    Join Date
    Oct 2006
    Posts
    4

    Test script and data for Merge Join problem

    Hi Sven,

    attached is a simple Test Script for Spoon to reproduce the problem.

    In the Text Input Elements the paths to the input-files are hard coded to
    C:\kettle test\

    I should mention that I played around with many variations of the data movement mode (copy/distributed). That didn't change anything.

    Thanks for your reaction to my post, best regards,

    Ulf
    Attached Files

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Dear Ulf,

    I couldn't find anything wrong with it, actually. Previewing "Join 2" as well as the 2 dummies gives exactly the same output (4 rows: A, B, C, D).
    I did tweak some things in the preview for the 3.0.1 release, but I doubt that this could be the problem.

    I'll try to drop a new 3.0.1 image for you to try tomorrow, but for the time being I'm not seeing anything wrong.

    All the best,

    Matt

  5. #5
    Join Date
    Oct 2006
    Posts
    4

    Dear Matt,

    thanks for your quick reply! Just to avoid misunderstandings I attached a screenshot of the preview I get for the step "Preview gives wrong output". I use Kettle Version 3.0.0 GA as available from sourceforge.

    Best greetings,

    Ulf
    Attached Images

  6. #6
    Join Date
    Nov 2007
    Posts
    10

    Hello all,

    I am having a problem similar to the one Ulf has encountered. I have two columns (400 items, unique) from two spreadsheets that I would like to outer join. In the result, the 2nd dataset contains duplicate values when there are no matches from the 1st dataset (step).

    I have sorted both datasets in ascending order before the merge join. It seems that only the right and outer joins have this problem.
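
    For reference, here is a minimal sketch of what a full outer merge join over two inputs sorted ascending by a unique key is expected to produce. It is a generic textbook version, not the Kettle implementation, and the class and method names are hypothetical. A side with no match is filled with null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a full outer merge join; not Kettle code.
// Assumes both inputs are sorted ascending and keys are unique per input.
class MergeJoinSketch {

    // Each output row is {leftKey, rightKey}; null marks a missing side.
    static List<String[]> fullOuterJoin(List<String> left, List<String> right) {
        List<String[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                // Keys match: emit one joined row and advance both sides.
                out.add(new String[] {left.get(i++), right.get(j++)});
            } else if (cmp < 0) {
                // Left key has no partner on the right.
                out.add(new String[] {left.get(i++), null});
            } else {
                // Right key has no partner on the left.
                out.add(new String[] {null, right.get(j++)});
            }
        }
        // Drain whichever side still has rows.
        while (i < left.size()) {
            out.add(new String[] {left.get(i++), null});
        }
        while (j < right.size()) {
            out.add(new String[] {null, right.get(j++)});
        }
        return out;
    }

    // Renders each row as text for printing and comparison.
    static List<String> render(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(Arrays.toString(row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("A", "B", "D");
        List<String> right = Arrays.asList("B", "C", "D");
        System.out.println(render(fullOuterJoin(left, right)));
        // [[A, null], [B, B], [null, C], [D, D]]
    }
}
```

    With unique keys, every key appears exactly once in the result, so repeated values on the unmatched side would not be expected from a join of this kind.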

    - Ken

  7. #7
    Join Date
    Nov 2007
    Posts
    10

    Sorry, just to add that in preview mode everything seems to be fine. The problem appears in the Excel output.

    - Ken

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    All I can say is that in 3.0.0GA I'm not seeing the problem that Ulf described with the test-case he posted.

    Matt

  9. #9
    Join Date
    Nov 2007
    Posts
    10

    Hi Matt,

    I was looking at the file that Ulf submitted. Shouldn't the preview give the same result for Join2, dummy1, and dummy2?

    I see a difference between dummy1 and (Join2 = dummy2).

    Anyway, I would like to submit the example I mentioned before. I have two datasets that contain unique items, and I need to combine them to create one master list.

    The preview on the merge join gives me the correct result, but once I preview the Excel output or view the actual file, the result is different.

    Please help. Thanks.

    - Ken
    Attached Files

  10. #10
    Join Date
    Nov 2007
    Posts
    10

    Here is a screenshot of the previews of the merge join and the Excel output.

    - Ken
    Attached Images

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Thanks Ken,

    I'll look into it.

    Matt

  12. #12
    Join Date
    Nov 1999
    Posts
    9,729

    http://jira.pentaho.org/browse/PDI-534

    I'm sure I fixed the issue and I'm uploading an update to 3.0.0 over here
    (files to be placed in the lib/ directory):

    http://kettle3.s3.amazonaws.com/kettle-ui-swt-3.0.jar
    http://kettle3.s3.amazonaws.com/kettle-engine-3.0.jar

    All the best,

    Matt

  13. #13
    Join Date
    Nov 2007
    Posts
    10

    Thanks, Matt. It looks good.

    - Ken

  14. #14
    Join Date
    May 2010
    Posts
    19

    Same problem with SQL Tables

    I came across the problem discussed in this thread today while dealing with SQL Server tables.

    I am using the 3.2.0 Stable version. I am merging two tables after having sorted them individually in two previous steps. When I preview, I get the correct result set (10 rows, as expected), but when the same result is sent as output to the next step, all 10 rows show the data of the last row. Moreover, as suggested by the original post, when I created two dummy steps and linked the second dummy step to the next step, I got the correct output.

    I have attached the correct and wrong outputs. To tell them apart: the last three columns in both previews (garmentSize, pkStyleSizeId and sequenceNumber) have different values in all 10 rows of the correct output, but the same value in every row of the wrong one.

    Is there any solution to this problem? My thanks in advance.

    Regards,
    Manish
    Attached Files
    Last edited by etlUser; 05-13-2010 at 11:48 AM. Reason: Added attachments
