Hitachi Vantara Pentaho Community Forums
Results 1 to 2 of 2

Thread: Duplicates after Dimension Update/Lookup

  1. #1
    Join Date
    May 2009
    Posts
    1

    Question Duplicates after Dimension Update/Lookup

    Hi,

    I'm currently loading my DW. I have a stream of daily transactions (multiple transactions per day) in the following form:

    DAY | DEVICENAME | STATE | VALUE

    STATE does not change very often per day, but sometimes does.

    My goal is to extract DEVICENAME and create a DEVICE dimension.

    Therefore, I have a dimension lookup/update (only one instance at the same time) with DEVICENAME as key, DAY as the datestream field, STATE as an attribute. VALUE is not written to the dimension but to the fact table (later in the process).

    Initially, I had an unique index on DEVICE(DEVICENAME, date_begin, date_end, version). However, it failed because of duplicates. I removed it and now I'm facing duplicates such as:

    *) DAY - DEVICENAME - STATE - date_begin - date_end - version
    1) 2009/07/29 - "Device A" - "Active" - 2009/07/29 - 2009/07/30 - 15
    2) 2009/07/29 - "Device A" - "False" - 2009/07/29 - 2009/07/30 - 15

    same version, date_begin, date_end, devicename, day, just the state has changed on that day.

    I also have duplicates with exactly the same values:

    *) DAY - DEVICENAME - STATE - date_begin - date_end - version
    1) 2009/07/29 - "Device A" - "Active" - 2009/07/29 - 2009/07/30 - 15
    2) 2009/07/29 - "Device A" - "Active" - 2009/07/29 - 2009/07/30 - 15

    Commit Size is to 100, cache size to 5000. I'm only using one instance of the step.

    Kettle version 3.1, build 826

    I tried to check if the problem is related to PostgreSQL for some reason, but without success. Any idea what could cause the problem? Is it a problem or would it work?

    Thanks,

    Michael
    Last edited by olschimke; 07-29-2009 at 02:45 AM.

  2. #2
    Join Date
    Feb 2009
    Posts
    4

    Default

    Using DAY as the datestream field would not work in your case. DAY does not change but your data does. date_from / date_to are derived from the datestream field. What happens if you do not set the datestream field and use the default sysdate?

    What are your settings for the fields (insert / update / punch trough)?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.