Hitachi Vantara Pentaho Community Forums
Page 1 of 2 12 LastLast
Results 1 to 10 of 16

Thread: Dimension lookup/update not updating records.

  1. #1
    Join Date
    May 2007
    Posts
    15

    Default Dimension lookup/update not updating records.

    Hi all,

    I am using Kettle's Dimension Lookup/Update step of Kettle 3.0 GA version.
    I basically want to implement SCD Type 2 dimension i.e for every created, updated and deleted records, I will add a new record in my dimension table
    and I will close the last record by setting the Date_To to current date and the newly added record will have Date_to set to some high date.

    I have a table which contain fields as personid,name,eventtype.

    PersonId is the unique key of the table .
    Name is the name of the person.
    Eventtype is the type of event such as created,updated or deleted.

    Following is my entries in the "Dummy" OLTP table

    PersonId: Name: EventType; EventTimeStamp
    123;John;created; 01-Dec-2007
    123;Jo;updated; 08-Dec-2007
    123;Joh;updated; 13-Dec-2007

    I am expecting following entries in my Dummy_Dim table in the final transformation run.

    PersonId; Name; EventType; Version;DateFrom; DateTo
    123; John; created; 0; 01-Dec-2007; 13-Dec-2007
    123; Jo; updated; 1; 13-Dec-2007;13-Dec-2007
    123; Joh; updated; 2; 13-Dec-2007;31-Dec-2999 {This will be the current active record whose DateTo is set to some high date}

    When I try to load the dimension the following thing happens:
    1)For the created record, Kettle’s Dimension Lookup/Update step sets the DateFrom to the current date and DateTo to some high date value, which is correct.
    2)For the second updated record, kettle sets the DateTo date of the created record to the current date, inserts the updated record with DateFrom set to current date and DateTo set to some high date, which is also correct.
    3)But now when a third updated record of the same person comes, then Kettle inserts a new record with the updated value, but this time it does not set the DateTo date of the previously updated record to the current date which it should.

    We see the following entries in the table:

    PersonId; Name; EventType; Version;DateFrom; DateTo
    123; John; created; 0; 01-Dec-2007; 13-Dec-2007
    123; Jo; updated; 1; 13-Dec-2007; 31-Dec-2999
    123; Joh; updated; 2; 13-Dec-2007;31-Dec-2999

    The source table dummy whose structure is :

    CREATE TABLE "DUMMY"

    (
    "PERSONID" VARCHAR2(30 BYTE),

    "NAME" VARCHAR2(30 BYTE),

    "EVENTTYPE" VARCHAR2(30 BYTE)

    )

    The target dimension table is:

    CREATE TABLE "DUMMY_DIM"

    ( "DUMMYKEY" NUMBER(*,0),

    "VERSION" NUMBER(*,0),

    "DATE_FROM" DATE,

    "DATE_TO" DATE,

    "PERSONID" VARCHAR2(30 BYTE),

    "NAME" VARCHAR2(30 BYTE),

    "EVENTTYPE" VARCHAR2(30 BYTE)

    )

    Please can some one help us with this logic. Why Kettle's Dimension lookup is not solving our problem. Is this some limitation with kettle?
    Also what is the purpose of Stream Data Field in the Dimesion Lookup transformation.
    Is it because of that we are getting the wrong result. Should we set this field to EventTimeStamp?
    Attached is the dummy transformation that we are trying.

    We have been working on this since last few days. Any help would be appreciated.

    Thanks in advance.
    -Honey
    Attached Files Attached Files

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    All your update fields are flagged as Insert (Kimbal Type II dimension update).
    As such we will never ever do an update in the dimension table.

    Matt

  3. #3
    Join Date
    May 2007
    Posts
    15

    Default

    Hi Matt,

    Thanks for your reply!
    Thats true I am implementing the Type II dimension update where I want to insert records whenever a created, updated or deleted entries come in.
    I basically just want to update the DATE_TO field of the previous record to current date and the new updated records DATE_TO to some high date.

    I read this link in the forum about error in stream date field logic http://forums.pentaho.org/showthread...ookup%2FUpdate
    As per mentioned in the post, I downloaded the latest jar files too.
    Still it is not working as per my logic.

    I tried the sample provided in kettle DimensionLookup - update dimension table 2.ktr. Attached is the updated transformation that I tried at my end.
    Even this transformation is not giving me the expected result.
    The output of this transformation is as follows:

    ID Name FirstName Date_From Date_TO Customer_tk Version
    101 Boden Sven 01-JAN-00 00:00:00 31-DEC-99 23:59:59 1 1
    103 Hassan Samatar 01-JAN-00 00:00:00 31-DEC-99 23:59:59 2 1
    100 Casters2 Matt22 01-JAN-00 00:00:00 13-DEC-07 00:00:00 3 1
    100 Casters Matt2 13-DEC-07 00:00:00 31-DEC-99 23:59:59 4 2
    100 Casters222 Matt222 13-DEC-07 00:00:00 31-DEC-99 23:59:59 5 2
    102 Bleuel Jens2 01-JAN-00 00:00:00 31-DEC-99 23:59:59 6 1


    Check that the Date_To field of Matt2 should be set to 13th dec as this record is closed and a new updated entry has come for 100 record (Matt222).
    Also the version field of Matt222 should be updated to 3.
    According to my logic the active record for customerid 100 is the one where Date_To is set to a high date. In the above scenario I am not able to get the active record.

    Following is the output that I am expecting:

    ID Name FirstName Date_From Date_TO Customer_tk Version
    101 Boden Sven 01-JAN-00 00:00:00 31-DEC-99 23:59:59 1 1
    103 Hassan Samatar 01-JAN-00 00:00:00 31-DEC-99 23:59:59 2 1
    100 Casters2 Matt22 01-JAN-00 00:00:00 13-DEC-07 00:00:00 3 1
    100 Casters Matt2 13-DEC-07 00:00:00 13-DEC-99 23:59:59 4 2
    100 Casters222 Matt222 13-DEC-07 00:00:00 31-DEC-99 23:59:59 5 3
    102 Bleuel Jens2 01-JAN-00 00:00:00 31-DEC-99 23:59:59 6 1

    So for customer 100: current active record is Casters222 Matt222
    customer 101: active record is Boden Sven
    customer 102: active record is Bleuel Jens2
    customer 103: active record is Hassan Samatar

    Is it possible to implement this logic using the Dimension lookup/update? or else we will have to implement our own logic?

    Please guide me in the right direction.
    Thanks in advance!
    Attached Files Attached Files

  4. #4
    Join Date
    May 2007
    Posts
    15

    Default

    I am eagelry awaiting for some reply to this post.
    Is this a known issue with Kettles' Dimension Lookup update transformation?
    Is it that Kettle is internally working with cache and because of which it is not able to update the old records properly?
    Is there some way where we can refresh the cache everytime? I know this would be a performance hit.

    We are very keen on using kettle to do this for us.
    If kettle does not provide this implementation then we have only one way of writing our own stored procedure.

    Looking forward for some help with this problem.

    Regards,
    -Honey

  5. #5
    Join Date
    Jul 2007
    Posts
    2,498

    Default

    You should not have to do anything. On your example, I would expect not to find row with tk=5. I would also not expect the same output as you - TK 4 and 5 have overlapping date ranges.

    That said, no idea why you are getting that. My dimensions are being correctly updated, no extra work done
    Pedro Alves
    Meet us on ##pentaho, a FreeNode irc channel

  6. #6

    Default

    Also update/lookup will never actually mark deleted records as closed, you would have to do that manually (but it would be nice to have an option in the component to do that)

  7. #7
    Join Date
    May 2007
    Posts
    15

    Default

    Thats true len0x, the step does not handle deleted records. We had implemented to handle that manually once the kettle's dimension update lookup step was over.
    It will be really helpful if the component handles this type.

    I am not very clear about what pmalves is trying to suggest. Please help me in understanding.

    My concern is that in my single ETL run I may have a created, 3-4 updated and a deleted entry for the same person. How do the Kettle's dimension lookup update step handle this?

    For the sample transformation that I have provided what should be the expected output from kettle dimension lookup update step? or do you suggest that the sample provided itself is not correct?
    But we have such a scenario in our OLTP.

    Thanks,
    -Honey

  8. #8

    Default

    I dunno how their cache work, but just for testing you could set commit size as 1 and well as cache size of 1 and see if that solves your problem in which case its indeed a cache issue...

  9. #9
    Join Date
    May 2007
    Posts
    15

    Default

    I tried setting the commit size as 1 and well as cache size as 1 for the Dimension lookup/update step. But still I am getting the same output

  10. #10
    Join Date
    Jul 2007
    Posts
    2,498

    Default

    What I said was: I was not expecting your output, something looks wrong.

    "Also what is the purpose of Stream Data Field in the Dimesion Lookup transformation" -> if you are doing a lookup, you will get the field as it was at that time; on the insert it's the reverse.
    Pedro Alves
    Meet us on ##pentaho, a FreeNode irc channel

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.