Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Building a data warehouse

  1. #1
    Join Date
    Jun 2007
    Posts
    128

    Default Building a data warehouse

    Hi,

    I need some help in creating dimensions and facts for call data.Actually i have call data which contains callingnumber,called number,time,duration,location etc.
    I have stored all this data in a staging table.
    Even though I went through documents related to building a warehouse and creating dimensions and facts i am confused alot in my scenario.
    I have a doubt whether it makes sense if we create dimension and fact tables from a single table.


    Thanks
    Sreelatha

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    There are no easy answers for that Sreelatha, else all datawarehouse consultants would be out of work.

    Normally people create facts and dimensions out of reporting needs (and I must admit, some just out of habit ). If you don't have "dimensional data" in your 1 table it doesn't make sense of course to make it into facts and dimensions.

    If you're talking CDR data I would recommend a star-schema for sure, but maybe not from 1 table. It's probably your job to figure out which reports the end-users wants, which data you need in the database, which final database setup you need.

    Regards,
    Sven

  3. #3
    Join Date
    Jun 2007
    Posts
    128

    Default

    Quote Originally Posted by sboden View Post
    There are no easy answers for that Sreelatha, else all datawarehouse consultants would be out of work.

    Normally people create facts and dimensions out of reporting needs (and I must admit, some just out of habit ). If you don't have "dimensional data" in your 1 table it doesn't make sense of course to make it into facts and dimensions.

    If you're talking CDR data I would recommend a star-schema for sure, but maybe not from 1 table. It's probably your job to figure out which reports the end-users wants, which data you need in the database, which final database setup you need.

    Regards,
    Sven
    Hi,

    Thanks for the reply.I am talking about CDR data only.Eevn I planned to use star schema only.
    I am getting data in text files and I am loading that data from text files into a single mysql table.
    I am building my dimensions and facts from that single table.
    Here only i got confused.
    R u saying that we should not store it into a single table before creating dimensions and facts?Is it so?


    Thanks
    Sreelatha

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    R u saying that we should not store it into a single table before creating dimensions and facts?Is it so?
    Didn't say that... staging is usually a good thing... You mention you only had one table, from the columns of the table you mentioned above I don't see anything which would fit a real dimension... customer names, accounts, ... I think you need more stuff than your single table to make a warehouse.

    Regards,
    Sven

  5. #5
    Join Date
    Jun 2007
    Posts
    128

    Default

    OK.Thanks
    I have derived some fields from the given data and I have around 45 fields in my table and there is a possibility to create dimensions and facts from those fields.
    Thank you very much for your your quick reply.The only doubt I got here building all dimensions and facts using the single staging table is feasible or not.Now my problem is resolved with your help.

    Thanks
    Sreelatha

  6. #6
    Join Date
    May 2006
    Posts
    4,882

    Default

    You will have some more problems, CDRs are usually not that easy... Most of the problems in CDR are related to volume and proper archiving of data.

    But most important (for any warehousing project) is to find out what your end-users need to be able to get the reports they want.

    Regards,
    Sven

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Actually, I used to think that staging was for the sissies, but I changed my mind a few years ago.
    Think of it as an opportunity to create an archive of the input data.

    Here's the deal: if you create your staging tables with a batch ID in them, you can make a history.
    Keep all the data of all the source system data.

    This will allow you to re-create the complete data warehouse from scratch at any time.
    Especially during development/test/acceptance cycles, this is a very interesting option.

    CDR are at first glance simple, but in certain situations (very long duration calls, etc) they can be a serious pain.
    Not only the sheer volume is a problem, but the absolutely needed aggregation is a problem because it can be very tricky to do that incrementally.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.