Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Merge Not working as expected

  1. #1
    Join Date
    Dec 2007
    Posts
    8

    Default Merge Not working as expected

    Merge not working as expected.

    I am updating a client table with new and changed data,I sort data from my client table in MYSQL and the input txt file on Client key and Update date.

    The Client table is my reference and the text file is my compare.

    When I compare on Client key and Update date I end up with two records one marked deleted (old client record)and another marked new (txt file record). I would have expected the txt file to be marked changed.

    If I just compare on Client key I only get one record but it is marked as Identical and is the old record and the new one has disappeared.

    Should I change them round as reference and compare? The manual indicates that existing/old data should be my reference?

    Thank you

    Ian

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    There's a difference between what you expect and what it does, I assume

    One stream needs to be the original situation (not only the changes, the full set), the other streams needs to be the current data (also the full set). And both have to sorted on key of course.

    Is there something maybe wrong with your input?

    Regards,
    Sven

  3. #3
    Join Date
    Dec 2007
    Posts
    8

    Default

    My input file is just a list of changed records, not the full set. I do not want to have to export the entire Client table every night.

    I was trying to update the client table in the datawarehouse with new and changed records. I do not want to have a history of client changes.

    From what you are saying I would appear to have no choice following a merge, and will have to correct the merged data at a later date. (ie remove the historical records). Do I do this in ETL or do I build a Dimension?

    However, that does not explain why a changed record (in the update stream) is then marked as New and the old one deleted. I would expect the record in the input stream to be flagged as changed.

    Ian

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    LOL ... the algorithm assumes that you have 2 full sets... it works like a kind of diff on UNIX. So if you have only the changes in one you will get wrong results.

    Regards,
    Sven

  5. #5
    Join Date
    Dec 2007
    Posts
    8

    Default

    Is there anything I can do/use to achieve what I am trying to do.

    As I said I am trying to avoid extracting whole dataset each night, and then update my datawarehouse table with latest record set.

    If not then I might as wll truncate whole table and update with full dataset which does not sound very efficient.

    Ian

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    You can handle a diff yourself IF you have full knowledge of which records got updates, inserted and deleted in the source table.
    At the risk of re-stating the bleeding obvious, if you don't have that information you can only extract it using the full data set snapshots (Merge Rows).

  7. #7
    DEinspanjer Guest

    Default

    Only records that compared as identical given the keys and comparable values pick up the reference side values. Changed and new records contain comparison side values as you'd expect.

    I logged a bug against this behavior recently because it didn't seem right to me.

    That said, the ETL I'm doing sounds similar to what you are trying to do. I have a table that contains a list of all the files I've ever processed which is my reference set and a Get file names step that looks in a directory for all the files that currently reside in it.

    I want to process new and changed files. I want to ignore identical and deleted files. I just have a filter step with that logic. Wouldn't that work for you?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.