US and Worldwide: +1 (866) 660-7555
Page 3 of 4 FirstFirst 1234 LastLast
Results 21 to 30 of 35

Thread: saiku + mondrian using star schema, boolean column showing all results, not unique

  1. #21
    Join Date
    Apr 2011
    Posts
    122

    Default

    Holy crap, yea, I DO remember the start of this and totally missed that (fail...) 2x4x11 bit - which I DO understand now though! BAH! So sorry I missed that and forced you to repeat yourself there...

    I greatly appreciate the conversation. I feel like I jumped a substantial leap forward from this, measures more than I could quantify from a book reading and some web video tutorials!!

  2. #22
    Join Date
    Nov 2008
    Posts
    777

    Default

    No problem...it's Monday morning...and I don't mind repeating myself, especially after I've led you around in circles. And I will do it again... Now.

    Please remember one of my guiding principles: "Don't optimize until you need to." I repeat this one because I think it makes sense. Basically, it's a specific instance of the "just in time" principle. You've chosen to optimize your fact table by aggregating (because of the expectation of excessive size, which I understand) and I've convinced you to optimize your external number dimension by creating a sub-dimension (again because of the expectation of excessive size). Given those, I wouldn't worry about any further optimization in your application until it's shown that you need to. Optimizations take time to plan and execute so you shouldn't let them stall your project and delay the delivery of some very useful analytics.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  3. #23
    Join Date
    Apr 2008
    Posts
    2,474

    Default

    Hey Darrell,

    Might I also suggest making the customer table be a SCD, rather than a regular dimension?
    Customers have a bad habit of changing names, and addresses, etc...
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

  4. #24
    Join Date
    Nov 2008
    Posts
    777

    Default

    You certainly may and it's a good one at that! I somewhat suggested it in my reply to loganseth's quoting of two cases for a snowflake schema, with the second case being the desire to separate type I and type II properties. However I've rambled on so much here that I'm very glad you chimed in to emphasize the not-so-obvious need for handling changes to customer info. Thanks.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  5. #25
    Join Date
    Apr 2011
    Posts
    122

    Default

    I will be reading the M.I.A. section on SCDs this week...

    So I built the 2x4x11 dim_external_number_flags dim and then was trying to figure out how to insert that column in the dim_external_number table. The only way I 'knew how' was to do a MERGE JOIN to the running columns (the ones built from the javascript step) to the dim_external_number_flags column (on all columns, except the external_id one). This worked and I could then 'select values' on the final output to just insert the external_flags_id value into the dimension;

    but is there a better way to insert the external_flags_id into the external_number dimension (at this point, it is snowflaked)?

    UPDATE ON THE CITY-STATE data...the 'location' value that comes back from the geocoder is 'hit or miss' at this point in the project. Sometimes it returns a STATE (Georgia, for example) and other times a city/state (Atlanta, GA) while internationally it could be city or province or country. It's (currently) something I think is more complicated then I want to manage 'at this time' (effectively breaking up COUNTRY / STATE or PROVINCE / CITY, that is).

    And, @darrell.nelson, thank you for the recent optimization comment - that is, in moving forward (agile development), optimize when required. I think the external_number dimension is going to hit 500k (soon) so I do, also, think that was time well spent in up-front design.

    Good times!!

  6. #26
    Join Date
    Apr 2008
    Posts
    2,474

    Default

    Have a peek at the Combination Lookup/Update step.
    Its role is to take a combination of columns in the stream, look them up in a table, and replace them with the key value from that table.
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

  7. #27
    Join Date
    Nov 2008
    Posts
    777

    Default

    Quote Originally Posted by loganseth View Post
    So I built the 2x4x11 dim_external_number_flags dim...

    but is there a better way to insert the external_flags_id into the external_number dimension (at this point, it is snowflaked)?
    You jumped the gun on this one! There's no need to prebuild the entire 2x4x11 dimension. As gutlez recommended, the Combination Lookup/Update step does exactly that, however, it does it on the fly so it only adds the specific combinations of flags as they are needed. To me that's very efficient.

    Take a look at my sample transformation again because I've expanded it to include the steps to lookup/update the external flags dimension and then lookup/update the external numbers dimension. You'll have to plug in your own database connection and scratch tables to run it of course, but you should be able to see how I pass the external flags key into the external numbers dimension by looking at the step configurations.

    Quote Originally Posted by loganseth View Post
    UPDATE ON THE CITY-STATE data...the 'location' value that comes back from the geocoder is 'hit or miss' at this point in the project. Sometimes it returns a STATE (Georgia, for example) and other times a city/state (Atlanta, GA) while internationally it could be city or province or country. It's (currently) something I think is more complicated then I want to manage 'at this time' (effectively breaking up COUNTRY / STATE or PROVINCE / CITY, that is).
    Yeah, I think I mentioned (in the PDI forum) that this was going to be a hurdle to get over. I've got an idea that might just work but another question first: Do you have a lot of international callers? I'd like to get a feel for what countries are going to be involved.
    Attached Files Attached Files
    Last edited by darrell.nelson; 12-11-2013 at 04:22 PM. Reason: Nice tweeks to the JavaScript and Combo L/U steps in the .ktr file
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  8. #28
    Join Date
    Apr 2011
    Posts
    122

    Default

    Hello. I wanted to provide an update to this; as today I was able to get back to this data model.

    THANK YOU (@darrell.nelson) for so many great tips, leads, and samples! Man! Made SO MUCH sense to me what you meant about 'jumping the gun' on pre-building the dim_external_flags table, now that I understand what the 'combination lookup/update' step does! I build the fk_external_number transformation that uses that step to lookup (and build! so cool...) the dim_external_callers and dim_internal_callers tables; then have a separate transformation using the combination lookup/update step to add the remaining fields to the dim_external_callers table and build the dim_external_flags table (adding the external_flags_id value into the dim_external_callers) table.

    solid!

    I spent a great deal of time trying to get the 'dimension lookup/update' step working but since I'm not using a SCD (I think that was the issue) I wasn't able to get it working. That is, literally, I never able to get it to run trying several different combinations. But the combination lookup/update worked (or, at least, seems to be working!) for all of the transformations.

    The next step, then, is going to be using Mondrian to add the snowflake 'arm' to the cube (for the dim_external_flags dimension!).

    This is really great stuff, Guys. It (obviously) makes a lot of sense (looking back now) that PDI has the ability to maintain a dimension table while looking up key values.

    Brilliant!

    I am going to try and build the Mondria Schema tomorrow (hopefully) so we shall see how that works out!

    Thank you, again.

  9. #29
    Join Date
    Apr 2011
    Posts
    122

    Default

    Good news is I was able to add the snowflake dimension (found another post referencing the foodmart.xml as a an example) but the bad news is that the VALID/INVALID string is showing up multiple times (still!).

    That is, the issue that I originally posted (boolean column showing all results, not unique) is still TRUE (haha...boolean, true...ok, ok, that was lame).

    For brevity, the design is as follows (and the schema is attached):

    1. fk_callers_by_day has external_number_id in a column which references
    2. dim_external_numbers which has external_flags_id which references
    3. dim_external_flags (snowflake)
    4. isvalid (boolean) is a column in this table.

    What I would like to do is to toss this column (isvalid) in the filter (of a Saiku analytics) and it display the only options (VALID/INVALID).

    datamart02.xml

  10. #30
    Join Date
    Nov 2008
    Posts
    777

    Default

    Glad to hear you are making progress! I'm very surprised you can't seem to get the Dimension lookup/update to work. Every dimension table should be built with that step except the flags table. That should be the only one built with the Combination lookup/update step. Did you have a look at my latest sample transformation? It uses one C l/u and one D l/u step. The fact that you are not creating SCDs (type II "Insert" setting) shouldn't matter much. Mine uses the "Update" setting for each attribute field but that can be changed with a "flick of the switch". Specifically, what issues are you having with that? We really need to get that working for you.

    Now on to the cube schema... I can't open your attached file at the moment but I will. To help me understand the valid/invalid issue though, please describe what you expect to see. What is ON ROWS? What is ON COLUMNS? And what measures do you want to see at their intersections? Basically, what business question are you trying to answer with the isValidNumber attribute? Those MDX queries can be hard to get the hang of at the start because they differ so much from SQL queries. I normally like to get some basic queries working in Schema Workbench before I move on to something better but I must admit I haven't used Saiku very much.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •