Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Generate rows based on period

  1. #1

    Default Generate rows based on period

    Hello all,

    I have an interesting case:

    I'm building a DWH for a car rental company.

    Now, in my fact table I have the following simplified row (The foreign key for date_start and date_end is to the time table):

    pk_rental, fk_date_start, fk_date_end, total_price
    1, 38090, 38093, 400

    For a Datamart I'm looking for a method to transform this record into:

    fk_rental, fk_day, price_per_day
    1, 38090, 100
    1, 38091, 100
    1, 38092, 100
    1, 38093, 100

    Does anyone have a suggestion how to create these rows? I already tried to make a join with the time_table, using a <= and >= operator, but as expected the performance for this query was terrible.

    Any help is very appreciated!

    Greetings,
    Rick
    Last edited by rickonline; 08-13-2009 at 07:11 AM.

  2. #2
    Join Date
    Feb 2009
    Posts
    296

    Default

    I've been doing it. With PDI 3.2 it's really straight forward:

    Read input -> Calc Step (DateA - DateB gets you the number of days) -> Clone Rows (number of days) -> Calc Step (Total price/number of days)

    But beware! The prices won't add up after you've done this. Just imagine a price of 100 and a duration of 3. You can't really sum this up to 100 again.

    We've considered creating correction records and stuff like that but it's getting really messy from there on.
    Fabian,
    doing ETL with his hands bound on his back

  3. #3

    Default

    Thanks! Works like a charm!

    I've been playing with this Clone step, but somehow I totally overlooked the option to use a field for the number of clones. I was fooling around with variables etc.

    Now the most important step - creating the rows - is done, I can focus on myself on determining the day of each specific row and calculating the prices etc.

    Btw: thanks for the heads up on the possible rounding differences! It's noted in the risk log

    Greetings,
    Rick

  4. #4
    Join Date
    Feb 2009
    Posts
    296

    Default

    Calculating the number of days should be simple using the Calculator step. (Use DB Lookup or a simple join in your table input step in order to get the real date. Also some DWHs make sure the date dimension has a continuous surrogate key - maybe you can simply substract your IDs.)

    If you find a nice work around for those rounding differences - please share it
    Fabian,
    doing ETL with his hands bound on his back

  5. #5

    Default

    Actually, the primary key of the Time Table is the number of that day since 1-1-1900. Today, for instance, is day 40038.

    Using this method I can generate the full date of a foreign key to the Time Table, without actually having to look it up in the Time Table. Or the other way around: I can generate the fk to the time table without having to look it up on the fulldate field in the Time Table, which is a performance consuming action!

    As for the rounding differences: for the moment it's staying in that Risk log. Minor differences are not that important... yet.

    Greetings,
    Rick

  6. #6
    Join Date
    Jul 2009
    Posts
    24

    Lightbulb Theory to handle rounding issue.

    After reading through the post it seems there may be a simple resolution to handling the rounding issue or lost penny. Thanks to the clone row step there is a way to identify the original stream via the Add Clone flag option. If you use a switch/case to redirect the original row through a different flow you can easily correct that row by determining the amount lost buy rounding (Total Rental - (Calculated Daily Amount * Number of Days in Rental $100 - ($33.33*3) = $.01 ) You can then add the new calculated field to the original day making it $33.34 ($33.33+$33.33+$33.34 = $100.00) or you can use the clone again to create a $.01 row along side teh $33.33 row. Just a thought please let me know if you see any holes.

    Thank you,
    Jeff

  7. #7
    Join Date
    Feb 2009
    Posts
    296

    Default

    Good thought - I like it!

    Due to analytical reasons I'd try to distribute that missing penny over the whole time and thus I'd introduce a new missing percentage and so an and so forth.
    I'd have to build a loop and stop distibuting the pennies on some kind of condition.

    However I still like the idea and I'll test it to see how much it reduces the error introduces.

    Thanks!
    Fabian,
    doing ETL with his hands bound on his back

  8. #8
    Join Date
    Jul 2009
    Posts
    24

    Question Spreading the Pennies

    The spare penny issue could be approached with some allocations rules if the business wont accept it being on the first or last day. Maybe reach a compromise that it will be split to equal parts based on week or month. I'll put some more thought to the possibility of distributing the remainder to see if I can come up with anything. At some point we should be able to get the remainder small enough that it wont cause issues for the analysis accross periods.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.