Hitachi Vantara Pentaho Community Forums
Results 1 to 3 of 3

Thread: Advice needed on Date 'Range' Dimension

  1. #1
    Join Date
    Jul 2011
    Posts
    13

    Default Advice needed on Date 'Range' Dimension

    Apologies for the newbie question, but I really need some advice on Dimension and Fact table structure wrt Date Ranges...

    We have a fact table which stores user sessions. A user session has a start date and an end date. I would like to be able to query an Hour/Quarter-hour granularity Date Dimension to find out the number of 'active' sessions during any quarter hour. A user session may span many hours.

    My first thought was that a user session would result in a single session fact joined to many Quarter-hour Date Dimension records via a join table. But I can't see how to achieve this using Kettle/PDI. I'm also not even sure if this is the OLAP way of handling such a scenario. I can see how to use plain SQL to query such a structure. Is it possible using OLAP?

    I hope someone here can give me advice or point me to some examples/text on how this can be achieved.

    Many thanks,
    Pat

  2. #2
    Join Date
    Nov 2008
    Posts
    271

    Default

    I would suggest two alternatives, but admittedly I have not explored all relevant implications. In the end it depends on your business model design. Anyway, spare a thought around these ideas. Maybe you can come up with a better solution.

    A)
    Use a time-dimension, say at a minute-level granularity, so that it has 60x24=1440 records and store in your fact table the initial and the ending minute, both joined to this dimension (role-playing). A good choice of attributes for this time dimension should allows you to query the fact data quite straightforwardly via sql.

    B)
    As brute as it appears, this scenario is not so different from an inventory model: at the chosen granularity you need a snapshot of a measure (number of active sessions), so you can add a fact table that stores precisely this number, with a primary keys composed, among others, by both date and time id. As any other inventory, this measure would be semi-additive from a time perspective, but you can easily build an olap schema on it. The tough job would be done within your ETL process.

    HTH
    Last edited by Ato; 10-05-2011 at 04:46 AM.
    Andrea Torre
    twitter: @andtorg

    join the community on ##pentaho - a freenode irc channel

  3. #3
    Join Date
    Jul 2011
    Posts
    13

    Default

    Hi Andrea, and many thanks for your ideas.

    I have an added complication in that I have to report on data throughput at these granularities as well.

    A)
    I have to admit I can't work out how to form a SQL query to count the number of sessions occurring in any minute when the facts are joined by start and end to different adjacent minute_dim records :-(
    But as I need to retrieve throughput figures at this granularity as well, I don't think this work - By storing only one session fact, I lose information on throughput. I realise throughput may well be a different fact to session.

    B)
    I think I was heading this way (unless I've completely misunderstood ). My current idea is this:
    Maintain three fact tables, one for each level of granularity:
    - FactDaySessions: Joins to a 'normal' TimeDim consisting of Year/Quarter/Month/Day_of_month
    - FactHourSessions: Joins to an hour Time dimension consisting of up to 24 Hours. The HourDim may join to a parent TimeDim record
    - FactMinuteSessions: Joins to a minute Time dimension consisting of 60 Minutes. The MinuteDim may join to a parent HourDim record

    The ETL would:
    - Create minute facts every minute from raw data
    - Create hour facts every hour from a rollup of minute facts
    - Create day facts every day from a rollup of hour facts

    I should be able to query active sessions from:
    - the previous day backwards at day granularity
    - the previous hour backwards at hour granularity
    - the previous minute backwards at minute granularity
    If they want to know active sessions from 'now', we can provide it, but it would be more complex.

    I can also choose how much history I keep from the higher grain facts.

    Does this make sense? Does it seem sensible? Or have I lost the plot?

    Many thanks again for your time. I really appreciate it.
    Pat
    Last edited by itchyknee; 10-05-2011 at 06:48 AM.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.