Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: Best practices to create OLAP Cubes for historical data

  1. #1
    Join Date
    Oct 2017
    Posts
    7

    Default Best practices to create OLAP Cubes for historical data

    Hello everyone.

    Currently I am facing the problem regarding historical data & performance.

    I have as input two tables: ticket and ticket history. One ticket can have more than 50 entries of history data. History data includes time, status, priority, responsible_person_id, etc.

    As dimensions I have: time, status, employee, priority, etc.
    I have no idea what to make as a fact table.

    So far I have tried the following:

    ticket_history:


    ticket_id time status priority responsible
    1 20170101 new 1 1
    1 20170601 in progress 1 1
    1 20171201 closed 1 1


    As a fact table I create for each ticket_history and for each day the entry like:


    ticket_id time status priority responsible
    1 20170101 new 1 1
    1 20170102 new 1 1
    1 20170103 new 1 1
    ...
    1 20170531 new 1 1
    1 20170601 in progress 1 1
    1 20170602 in progress 1 1
    ...
    1 20171201 closed 1 1

    As you can see I am just extending the state of the ticket for all days. So far it can work but in my case I need the hours as well.
    I want to know how many tickets have the status open between 2017-01-01 09:00 and 2017-01-01 18:00.

    If I extend the fact table with hours I have a big problem with performance.

    Any other suggestions, please?
    Thanks a lot in advance!

  2. #2
    Join Date
    Nov 2013
    Posts
    382

    Default

    If you extend the state for each day, selecting a range of days when accessing the cube data will add the rows.

    In your example, assuming there is a field with count as the aggregate method, selecting a period 1/1/2017 to 3/1/2017 and new as state will give you 3 as an answer ... and I bet this is not at all what you want.

    Just create a cube row for each diferent combination ticket/status. Selecting a date range will give you the desired value.

    So in fact your problem is how to create the time dimension including hours. There is a bunch of literature about that. It's nothing related to PDI, your problem is a cube dimension definition one.

    Just googling ... sure there are better (more recent!) ones
    https://www.codeproject.com/Tips/642...th-Hourplus-Va

  3. #3
    Join Date
    Oct 2017
    Posts
    7

    Default

    Thanks a lot for your respose DepButi.

    ...count as the aggregate method, selecting a period 1/1/2017 to 3/1/2017 and new as state will give you 3 as an answer
    In my case I am using DISTINCT count on ticket_id column as aggregate method. That's why for one ticket it will be counted only once.

    Just create a cube row for each diferent combination ticket/status. Selecting a date range will give you the desired value.
    But, if I understand it correctly, if the changes were made on 01/01/2017 and on 01/06/2017 and if I select the period of time between 01/02/2017 and 01/03/2017 there will be no results, or?

    Thanks.

  4. #4
    Join Date
    Nov 2013
    Posts
    382

    Default

    Quote Originally Posted by laziz View Post
    But, if I understand it correctly, if the changes were made on 01/01/2017 and on 01/06/2017 and if I select the period of time between 01/02/2017 and 01/03/2017 there will be no results, or?
    Usually fact tables collect "facts that happen on a specific date", not "the situation on that specific date". To obtain this you will have to use a calculated member adding to date. YTD for example if you want data from the current year, or specifying all period up to date selected ...

  5. #5
    Join Date
    Oct 2017
    Posts
    7

    Default

    I can slightly understand what you mean. Should I do it for fact table while creating a Mondrian Schema?

    Could you please give an example or to explain a bit more. Or at least give a hit in which direction should I go.

    Thanks a lot in advance.

  6. #6
    Join Date
    Nov 2013
    Posts
    382

    Default

    This should be discused in an OLAP forum, not PDI. It depends on your cube definition, which exact structure you are using, dimensions etc ... but just as an example (SSAS OLAP multidimensional cube)

    create MEMBER CURRENTCUBE.[Measures].[YearSales]
    AS sum(ytd([Time].[Month].currentmember),[Measures].[Sales]) -- Assuming a Time dimension with several hierarchies: Month for normal dates, Weeks for ISO weeks/day of week, etc
    FORMAT_STRING = "#######,##0\ \€",
    VISIBLE = 1;

    Selecting a date, you will have on the Sales column the sales of that specific date, on the YearSales the accumulated sales of the year selected from 1/1 to the date selected.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.