Hitachi Vantara Pentaho Community Forums
Results 1 to 11 of 11

Thread: Problems with Dimension lookup/update and own date

  1. #1

    Default Problems with Dimension lookup/update and own date

    Hi all,

    I'm building a dimension with data of Car types for a Car Rental company. The source is a combination of the current prices per car type, appended with the history table, plus a datefield. This datefield contains the date on which the price listed in that specific row became current.

    I thought: very good! A datefield which I can use for my (technical) start and end date in the dimension. Thus, I used this date for the Stream Datefield in the Dimension lookup/update transformation.

    But my result wasn't what I expected. A screenshot of (a subset) of the result:



    2004-01-01 is the functional date of Version 1, meaning that version 1 should start on 2004-01-01 and should end on 2005-01-01, the startdate of version 2. Version 2 should start on 2005-01-01... etc.

    Obviously, I'm doing something wrong. Does somebody see what the error is?

    Thanks in advance,

    Rick
    Last edited by rickonline; 07-20-2009 at 04:40 AM.

  2. #2

    Default

    Is there no one with an idea how to solve this?

    If my description of the case isn't clear, please tell me, so I can elaborate some more.

    Thanks.

    Rick

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Hi Rick,

    Personally I find this approach to be incorrect since IMHO the dimension date range pertains to the validity of the dimension record, not of the car type itself.

    But since this is indeed just my opinion, we made the behavior of the start date vector configurable in version 3.2.0.

    http://wiki.pentaho.com/display/EAI/...+Lookup-Update

    Matt

  4. #4

    Default

    Thanks Matt.

    First of all, I agree with you that most of the time the dates should indicate the (technical) validity of the record. However, in this case the data was already available in a sort of history type wise, including validity dates.

    Because the source is a historical overview the Business Key (TypeID) isn't unique. The combination TypeID and (functional) Startdate should be unique, but the source can't guarantee that.

    But your comment did set me to think. How was I taught to solve this and how did I solve this in the past?

    My solution: I'm going to make a dimension with technical and functional dates. Technical dates will be handled by Pentaho, the functional ones will simply be filled like 'normal' values. The business key will be TypeID, together with a generated sequence. Disadvantage: when someone 'inserts' a period between two periods (which will most likely never happen) almost all records will have to be changed.

    Gonna see how this works out!

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I'm going to make a dimension with technical and functional dates.
    In the end, that's my preference as well. The same situation happens with deletion of records. Some people set the end-date of the date range. Personally I think that's a similar mistake. Including a deleted flag (Y/N) and a deletion date seems like a more sustainable/resilient approach. I've seen too many cases where records were magically un-deleted even when this was so-called "impossible". The thing is, when enough pressure is put on IT to restore a deleted record, anything is possible

  6. #6
    Join Date
    Feb 2009
    Posts
    4

    Default

    But what if you want the dimension valid_from / valid_to to be in sync with change dates of the source system?


    In my opinion using Stream Datefield was implemented for this case and it worked up until version 3.1. In version 3.2 it does not work any more (see http://jira.pentaho.com/browse/PDI-2492), because Stream Datefield is taken from the first record only (assignement of data.valueDateNow in "if (first)"-block - line no. 188 of DimensionLookup.java). Version 3.1 took Stream Datefield from every record (line no. 223 of DimensionLookup.java).


    Example (times left out for better readability):


    CHANGED

    03.09.2007

    01.12.2007

    01.12.2008

    Dimension (3.1)

    valid_from, valid_to

    01.01.1900, 01.12.2007

    01.12.2007, 01.12.2008

    01.12.2008, 31.12.2199

    Dimension (3.2)

    valid_from, valid_to

    01.01.1900, 03.09.2007

    03.09.2007, 03.09.2007

    03.09.2007, 31.12.2199

    Obviously these date ranges are wrong and assignement of fact data to dimension goes wrong.

    Why was it changed from version 3.1 to 3.2 and undocumented?

    Maybe you should change


    valueDateFrom = data.valueDateNow (Line no. 575 in DimensionLookup.java)


    to


    valueDateFrom = valueDate


    when inserting a new version into the dimension.

  7. #7
    Join Date
    Nov 2008
    Posts
    9

    Default

    That's how I understand it as well. For an initial load, it is often necessary to use the source system change dates as a proxy for the technical validity of the dimension values, since nothing else is available. I'm currently working round this by using the "copy rows to result" step, then retrieving the rows one by one using "Execute for each input row" into a second transformation which updates the SCD. Works as designed, but an absolute performance killer.

  8. #8
    Join Date
    Feb 2009
    Posts
    296

    Default

    uhh... this is bad. I had not noticed it but I would have in a couple of days during QA
    Fabian,
    doing ETL with his hands bound on his back

  9. #9
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Bad enough so I fixed it in 3.2.1 and trunk.

  10. #10
    Join Date
    Feb 2009
    Posts
    296

    Thumbs up

    Thumbs up Matt!

    Is there going to be a release for 3.2.1 or will the next RCs and releases be 4.X?
    Fabian,
    doing ETL with his hands bound on his back

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Anyone can build 3.2.1 at any given time, it's always supposed to be stable (only fixes go in there).
    Our Enterprise Edition customers will be able to find it in a couple of days to a week. (I think :-))

    After 3.2.1, there's going to be 3.2.2 etc, just like we have 3.0.5 and 3.1.5 branched.
    Last edited by MattCasters; 08-07-2009 at 07:15 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.