Hitachi Vantara Pentaho Community Forums
Results 1 to 3 of 3

Thread: DimensionLookup and date_from

  1. #1
    Join Date
    Jan 2010
    Posts
    5

    Default DimensionLookup and date_from

    I'm using PDI 4.0, Windows Vista 64-bit, MySQL 5.1 / InnoDB. I'm running a *.ktr file by running Pan from the command-line (actually spawned from another program). It works great, mostly.

    I have a data table (mytable) with columns (mypk, field1, field2, ...). I ran a PDI DimensionLookup to put this data into a type-2 table, which has columns (PKID, version, date_from, date_to, mypk, field1, field2, ...). This part works great: I end up with records that look like version=1, date_from=1900-01-01, date_to=2199-12-31, and my data. (It's a little weird that date_from is 1900, since my data didn't exist then, but that's no big deal.)

    Then I updated 'mytable' and re-ran the DimensionLookup, and the records that were changed did indeed get updated just fine. The issue I'm seeing is with records that were inserted since last time: I would expect new records in the type-2 table with version=1, date_from=NOW(), date_to=2199-12-31, but I'm seeing new records with date_from=1900-01-01 (and the rest as expected). This means I can't tell what was there the first time, and what was added in this update.)

    I looked at the documentation in the wiki for "Dimension Lookup-Update", and found "Use an alternative start date?". The documentation isn't entirely clear, but it looked like this would do what I want, if I used the "Start date of transformation" option.

    So I tried that, and ended up with it using "1899-12-31 15:00:00" (1900-01-01 minus 9 hours, i.e., in my time zone?) as the boundary. So I think that's closer, but it's still obviously not right.

    Is there some other flag / feature that I'm missing that will do what I want? Is there some issue with "start date of transformation" needing to be explicitly set when running Pan from the command line? Or am I completely misunderstanding how this step is used?

    (Oi, that came out wordier than I thought it would!)

    Thanks!

  2. #2
    Join Date
    Jun 2010
    Posts
    114

    Default

    In the DimensionLookup step,
    In the fields tab,
    Did you check the type of dimention update column?
    It kind of has many options like insert, update ,punch through...and the result depends on the type you have selected.
    Hope that helps to some extent.
    Thanks,
    Om
    ----------------------
    Spoon version -4.4.0

  3. #3
    Join Date
    Jan 2010
    Posts
    5

    Default

    That's a great idea. I think I skipped over those because they seemed field-specific, and I thought what I was looking for was step-specific.

    But I don't think it helps me here. I'd been using "Insert" for everything, which is type-2, which is what I want. (Update and Punch Through update existing records only, and the others appear to only apply to boolean/datetime fields, which is not what I have. I did try Update to confirm that it's not what I'm looking for.)

    In fact, it is type-2'ing my data already. It's just using a funky date_from for new records.

    At this point I'd even be willing to set min_year for each update, except that would limit me to one update per year, which won't work for this project.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.