US and Worldwide: +1 (866) 660-7555
Page 1 of 4 123 ... LastLast
Results 1 to 10 of 35

Thread: saiku + mondrian using star schema, boolean column showing all results, not unique

  1. #1
    Join Date
    Apr 2011
    Posts
    103

    Default saiku + mondrian using star schema, boolean column showing all results, not unique

    Admittedly, I'm new to dimensional modeling; but I am reading my recently purchased Mondrian in Action book and have built a few fact/dimension tables.

    My fact table has the following dimensions:

    1. external_number_id
    2. internal_number_id

    And the measure:
    1. count (number of calls)

    My exernal_number dimension has columns like:
    1. external_number_id
    2. external_number
    3. city
    4. state
    5. country
    6. isValidNumber (boolean in a postgresql database)

    Using Mondrian 3.61 I built a schema adding isValidNumber as a level to my dimension and when I use it in Saiku (didn't try pivot4J) and place 'isValidNumber' in the filter, it brings back "TRUE" or "FALSE" for every records where I was expecting simply (2) options.

    Any thoughts on what I missed? I see the 'uniqueMembers' check box, but from what I have read, that is not it.

    Many thanks for any leads!

  2. #2
    Join Date
    Apr 2011
    Posts
    103

    Default

    ONE NOTE - if I move the dimension to the ROWS section (not the FILTER section) then it does only show unique values.

    I would still like to understand the difference here, if anyone knows what the heck this rookie is talking about!

  3. #3
    Join Date
    Apr 2011
    Posts
    103

    Default

    Rats..nope...scratch that last comment.

    When isValidNumber is on the Rows by itself, lots and lots of TRUE/FALSE values appear. I had other dimensions added to rows limiting the result set.

    So, in sum, I have an Understanding Fail.

  4. #4
    Join Date
    Nov 2008
    Posts
    738

    Default

    Hmmm...I'm trying to imagine why you want the isValidNumber in another level...and I'm not coming up with anything. Can you help me understand what your plan is there?

    From my experience with levels, I would put things like country, state, and city is successive levels, e.g., going from coarse to fine items that define a location. This helps with grouping and drilling down into your queries.

    Say...would it make sense to put the external_number and isValidNumber fields into the fact table (the external_number would then be known as a degenerate dimension, and isValidNumber would be a measure) and then have a "location" dimension (perhaps with the levels I just described)? In that way your location dimension would potentially be quite a bit smaller than the fact table (in terms of number of rows) because several phone numbers would most likely reference the same location.
    Last edited by darrell.nelson; 12-06-2013 at 05:20 PM. Reason: coarse, of course!
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  5. #5
    Join Date
    Apr 2011
    Posts
    103

    Default

    well, how about that! You just helped me with the ETL job where I was building using the libphonenumber deal. That project was building the dimensions around the numbers and now that I've done that, I've moved onto (trying!) to build the cube to use it.

    i'm new to data-modeling so this 'level' concept (I think) is the learning curve here and the way you worded it makes a good deal of sense to me. It certainly doesn't sound like part of that level. I didn't think about it 'as a level' so much 'as a property of external number.'

    I could do what you said and make isValid part of the fact table; but is there a way to make it work as part of the dimension table? it seems 'very much' like a date-dimension table, to me, where, for example:

    2013-12-06 is a Friday in December.

    So to:

    12024561111 is a VALID FORMATTED number in Washington, DC.

    That said, I may be looking at my dimension table incorrectly. I was (am) definitely thinking about it as 'column properties' not 'levels in a hierarchy.'

    If I wanted to keep it as is (because I really like this external_number dimension. it has lots of uses for lots of other future facts I'll build), would I create a separate hierarchy using isValid as the only level?

    (regarding degenerate dimensions, I was just reading about that in section 3.2.4 of the M.I.A. book and was unsure how to add one using the schema-workbench. I was working with another fact table that had a degenerate dim and i was stuck in this loop where when I add a dimension, you have to add a heirarchy, when you add that, you have to add a level, when you add that, you have to add a table...I couldn't figure it out...totally understand not forking this thread...)

  6. #6
    Join Date
    Nov 2008
    Posts
    738

    Default

    OMG I just had a great idea, knowing what I know about your libphonenumber endeavor!

    However, before I go there please help me understand what the "grain" of your fact table is. In other words, what does each row in your fact table represent? Just one call? If so, is the call count measure always 1 and then aggregated in your queries? If not, what is the grain and where does the count come from?

    The answers to those questions basically boil down to this: What are you getting for source data?
    Last edited by darrell.nelson; 12-06-2013 at 07:17 PM.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  7. #7
    Join Date
    Apr 2011
    Posts
    103

    Default

    For this particular fact, it's callers by day by destination called. So the grain (in this fact table) is one day. (I used Kimball's date dimension for the date_dim table.)

    In other words, to answer your question, specifically for this fact table, the grain is 1 day and so the measures are aggregate values (if the caller called in 15x in a day to the same destination and talked for 30 minutes, then the row would be datekey, external_num_id, internal_num_id, 15, 30).

    There is reason that we may have other facts, though, down to a minute.

    External_Number (in our context) is a very useful independent dimension.

    I'm excited to hear your idea! I am learning a great deal here, just so you know. I really appreciate you helping me "learn to fish" - which I value far more than being handed a fish.

    Again, thank you.

  8. #8
    Join Date
    Nov 2008
    Posts
    738

    Default

    So the grain is one caller to one destination per day. Got it. Is someone aggregating to get that or is that actually what you are given?

    A few more question before I get started (tomorrow):
    1. Could a caller number ever be a destination number?
    2. Vice versa of #1?
    3. Do you basically have identical db tables for the internal and external number dimensions? Are the column names the same in each?
    4. What kind of cardinality are you foreseeing? How many rows in each table, both dimension and fact?
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  9. #9
    Join Date
    Apr 2011
    Posts
    103

    Default

    Q: Is someone aggregating to get that or is that actually what you are given?
    A: I am aggregating it in a transformation. The OLTP db has millions of records and I have a daily job (later, lots of daily jobs) that creates this one fact table.

    Q: Could a caller number ever be a destination number?
    A: It could; but it's unlikely. That is, I do not have any constraint that prevents that from being a possibility in the fact table so it could happen; but unlikely. We could go with no that shouldn't happen.

    Q: Vice versa of above
    A: Same here; highly unlikely but no 'check' to prevent it. I suppose you could, from your desk, call your desk and it would show up in the db.

    Q: Do you basically have identical db tables for the internal and external number dimensions?
    A: Not identical. The internal_number dimension has an extra column for customer (customer_id). Where internal_numbers are unique to a customer, external numbers are not. NOTE - this is not a requirement and, further more, I could always build another table without that column.

    Q: Are the column names the same in each?
    A: No. But same here as above - they could be. I'm the owner of this end to end process so changing things is 'easy' right now in the project (design / testing phase). the external_number dimension is (a) external_num_id, (b) external_num, and (c) external_num_raw (just digits). The internal one has 'internal' in the name and an extra column for customer_id. There is nothing preventing me from making them the same and calling them (a) index (b) num, (c) num_raw and, (d) customer_id - for example.

    Q: What kind of cardinality are you foreseeing? How many rows in each table, both dimension and fact?
    A: Fact table will grow linearly fairly consistently (X per day) and at that daily-aggregation, I'm guessing 100k/day, maybe more. We could get in the millions pretty quickly. The dimension tables, though, will grow exponentially then taper off. They (should) be unique so repeat callers will only be added the first time. It could grow, potentially, to the number of numbers that exist - so it could grow to millions, but i doubt it...100k sounds probably more realistic.

    Hope that helps! Really enjoying the questions and the thought process.

  10. #10
    Join Date
    Nov 2008
    Posts
    738

    Default

    We are both learning something! And here are more follow-up questions:
    Cardinality estimate for the internal_number dimension?
    Same country, state, and city fields for the internals?
    Estimated number of customers?
    Is external_num_raw a string and external_num an integer?
    Do you need both in your cube?
    Are there really that many repeat calls per day?
    Would it matter that much if you put every call in the fact table?
    Won't you have to partition it anyway at some point?
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •