Hitachi Vantara Pentaho Community Forums

Thread: Attempting to compare two files does not seem possible

  1. #1
    Join Date
    Oct 2009
    Posts
    19

    We are running into some issues using Stream Lookup to compare two files that we have not been able to solve. We want to find all records in File#1 that are not in File#2, then all records in File#2 that are not in File#1, and write each set of delta rows to its own output file. Using Stream Lookup, we are able to find the first set of delta rows (in File#1 but not in File#2). However, we cannot run a second Stream Lookup on the same stream data, because changes made in one Stream Lookup step automatically change the details of the other.

    Both of our input files are currently flat files in CSV format, but later we will modify the process to use database calls to Oracle.

    We would appreciate learning about any limitations of working with stream data. In addition, if our approach is incorrect and there is another way to do this, we would appreciate knowing that as well.

    Current environment is Kettle - Spoon GA 4.1.2 build 2011-01-26 13.31.23

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    That's what the "Merge Rows (diff)" step was made for. Tried it yet?
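
    For readers who have not used the step, here is a rough Python sketch of the idea behind a merge-style diff. It is only a conceptual illustration, not Kettle's implementation. It assumes both inputs are already sorted on the key, and the two-column (key, value) row layout, the function name and the sample data are all made up for the example. The flag names are the ones the step reports: "identical", "changed", "new" and "deleted".

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, str]  # hypothetical layout: (key, value)


def merge_rows_diff(reference: List[Row], compare: List[Row]) -> Iterator[Tuple[str, Row]]:
    """Walk two key-sorted row lists in lockstep and flag each row."""
    i = j = 0
    while i < len(reference) and j < len(compare):
        ref_key, ref_val = reference[i]
        cmp_key, cmp_val = compare[j]
        if ref_key == cmp_key:
            # Key present on both sides: compare the remaining values.
            flag = "identical" if ref_val == cmp_val else "changed"
            yield flag, compare[j]  # emitting the compare row is an assumption
            i += 1
            j += 1
        elif ref_key < cmp_key:
            # Key only in the reference input.
            yield "deleted", reference[i]
            i += 1
        else:
            # Key only in the compare input.
            yield "new", compare[j]
            j += 1
    # Whatever is left on either side has no partner on the other.
    for row in reference[i:]:
        yield "deleted", row
    for row in compare[j:]:
        yield "new", row


# Made-up sample data standing in for File#1 and File#2.
file1 = sorted([("a", "1"), ("b", "2"), ("c", "3")])
file2 = sorted([("b", "2"), ("c", "9"), ("d", "4")])

flagged = list(merge_rows_diff(file1, file2))
only_in_file1 = [row for flag, row in flagged if flag == "deleted"]
only_in_file2 = [row for flag, row in flagged if flag == "new"]
```

    Splitting on the flag value gives the two delta sets from the original question: the "deleted" rows are in File#1 but not in File#2, and the "new" rows are in File#2 but not in File#1.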

  3. #3
    Join Date
    Oct 2009
    Posts
    19

    Thank you, Matt. That did the trick! Works perfectly!

  4. #4
    Join Date
    Oct 2009
    Posts
    19

    Quote, originally posted by catmkt2009:
    We are running into some issues using Stream Lookup to compare two files that we have not been able to solve. We want to find all records in File#1 that are not in File#2, then all records in File#2 that are not in File#1, and write each set of delta rows to its own output file. Using Stream Lookup, we are able to find the first set of delta rows (in File#1 but not in File#2). However, we cannot run a second Stream Lookup on the same stream data, because changes made in one Stream Lookup step automatically change the details of the other.

    Both of our input files are currently flat files in CSV format, but later we will modify the process to use database calls to Oracle.

    We would appreciate learning about any limitations of working with stream data. In addition, if our approach is incorrect and there is another way to do this, we would appreciate knowing that as well.

    Current environment is Kettle - Spoon GA 4.1.2 build 2011-01-26 13.31.23

    Adding a new twist to this requirement: we have been asked to intersect a series of lists (files) to identify the records common to all of them. We thought we could use the Merge Rows (diff) step to compare the first two lists, keep only the rows flagged "identical", and then compare the resulting stream against the next list, and so on.

    However, we seem to have a problem with that approach, because the metadata associated with the steps seems to bleed from one step to the next. For example, we are trying to use the FILTER step to keep only the records flagged "identical", but when we add more than one FILTER step to the transformation, the steps seem to be linked by metadata, so that modifying the second FILTER step also modifies the first.

    We would appreciate any feedback on 1) the approach and 2) why we see the metadata bleeding across steps.

    Thanks in advance.
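
    To make the chained approach in this post concrete (compare two lists, keep only the matches, compare the result against the next list), here is a minimal Python sketch. It is not a Kettle transformation. The function names and sample lists are invented for illustration, and it assumes each list holds unique keys that can be sorted.

```python
from typing import List


def intersect_sorted(left: List[str], right: List[str]) -> List[str]:
    """Return the keys present in both key-sorted lists."""
    i = j = 0
    common = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            common.append(left[i])  # the "identical" case
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1  # key only in the left list, drop it
        else:
            j += 1  # key only in the right list, drop it
    return common


def intersect_all(lists: List[List[str]]) -> List[str]:
    """Chain pairwise intersections across every list in turn."""
    if not lists:
        return []
    result = sorted(lists[0])
    for next_list in lists[1:]:
        result = intersect_sorted(result, sorted(next_list))
    return result


# Made-up sample lists standing in for the input files.
lists = [
    ["a", "b", "c", "d"],
    ["b", "c", "d", "e"],
    ["c", "d", "f"],
]
common = intersect_all(lists)
```

    Each pass can only shrink the running result, so the order in which the lists are compared does not change the final set of common records.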


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.