Hitachi Vantara Pentaho Community Forums
Page 1 of 2 12 LastLast
Results 1 to 10 of 15

Thread: Bugs,bugs... Aggregate tables.

  1. #1
    Join Date
    Oct 2007
    Posts
    9

    Default Bugs,bugs... Aggregate tables.

    I've made something that kinda works using Mondrian and I have MANY objections for how Aggregate tables work.

    1. Mondrian gives nice advices about what aggregate tables should be created. But these advices are very incorrect and require a lot of hand work.
    * if there are TWO aggregates over one column (say SUM and AVG are used in one cube as measures), the generated SQL looks like this :

    CREATE TABLE agg_l_XXX_first_hits (
    id INT NOT NULL,
    my_value INT NOT NULL,
    my_value INT NOT NULL,
    fact_count INTEGER NOT NULL
    ); // duplicate column names

    * INSERT statements uses strange mix of upper and lower case, which is extremely difficult to fix.

    2. I have (somewhere) a dimension on table "allcategories"
    named "Categories". I also have a dimension in cube "orders" on table "models" named "Order category". That's all - aggregate tables on "orders" table does not work. It says that it can not find reference to "allcategories" table. But it is not present in "orders" cube ! When I changed a NAME (not table!!!) of a dimension, everything worked. It looks like Mondrian simply tries to match everything to everything.

    3. If aggregate tables are not mentioned in the schema definition, they are still being used. But if they are mentioned BEFORE cubes, Mondrian crushes.

  2. #2
    Join Date
    Oct 2007
    Posts
    9

    Default

    4. If avg is in measure list, it is treated as follows :
    Mondrian suggests creating aggregate column for it, like
    AVG(my_measure) and then generates the following SQL stmt

    select
    sum(`agg_l_002_first`.`id`) as `m0`,

    sum(`agg_l_002_first`.`my_measure_avg`) / sum(`agg_l_002_first`.`fact_count`) as `m1`,

    sum(`agg_l_002_first`.`my_measure_sum`) as `m2`

    from `agg_l_002_first` as `agg_l_002_first`

    Thus, AVG column in fact contains AVG/COUNT.

  3. #3
    Join Date
    Oct 2007
    Posts
    25

    Default

    1. just wrote a fix for this same problem.. check the Mondrian Mailing list here on Pentaho
    2. also added a fix for using agg_tables.. see same list..

    hope these help...
    Ati Rosselet

  4. #4
    Join Date
    Oct 2007
    Posts
    25

    Default

    3.
    usage of aggregate tables is automatic if the mondiran properties are set to

    mondrian.rolap.aggregates.Use=true
    mondrian.rolap.aggregates.Read=true

    and seeing as you are automatically trying to generate the aggtables, that means that

    mondrian.rolap.aggregates.generateSql=true

    is also set.

    AND.. I was not aware that you could define aggtables outside of a cube.. that sounds like something you shouldn't try to do (correct me if I'm wrong). Check the dtd to see if it is even allowed.. if not.. there's the problem. If it is allowed.. it probably shouldn't be
    Last edited by rosselet; 10-25-2007 at 07:36 AM.

  5. #5
    Join Date
    Oct 2007
    Posts
    9

    Default

    Thank you, rosselet, but it did not help. Field names are in lowercase now, thank you, but some aggregate tables still cause
    Rolap.Column not found (null) for tableAlias=allcategories, factColumnName=id, levelColumnName=id, symbolicName=Category
    error

    while other aggtegate tables on the same cube work fine.

  6. #6
    Join Date
    Oct 2007
    Posts
    9

    Default

    Quote Originally Posted by rosselet View Post
    I was not aware that you could define aggtables outside of a cube..
    My mistake. I wrote "before cubes", should be "before dimensions". The example on Mondrian page made me to beleive that <AggName> should appear just after <Table>, but it is incorrect. It should be placed at the end of the <Cube>, after all <Dimension> and <Measure> tags.
    I do not think that it is a bug, but this should be clearly explained in documentation.

  7. #7
    Join Date
    Oct 2007
    Posts
    9

    Default

    Looking deeper into the source code I found the possible reason of the problem. The 'normal' table involved into agg matching looks like this (table.toString() used):

    alias=cities
    relation=cities
    Columns:
    Region (3): `cities`.`region_name`
    Region (Key) (4): `cities`.`region`
    City (5): `cities`.`name`
    Condition:
    left=`sessions`.`location_id`
    right=`cities`.`id`
    region

    but the parnt-child table has no columns :

    alias=allcategories
    relation=allcategories
    Columns:
    Condition:
    left=`categories_closure`.`parent_id`
    right=`allcategories`.`id`
    id

    That produces an error each time aggreate is used on this dimension.

  8. #8
    Join Date
    Oct 2007
    Posts
    25

    Default

    could you post the schema section for this dimension? Also... where is this errpr being thrown? As for specification of AggTables, Foodmart.xml has <Tables> defind right after the <Cube> start.. and no error is thrown. What error are you getting?
    Cheers
    Ati

  9. #9
    Join Date
    Oct 2007
    Posts
    9

    Default

    <Cube name="First hits">
    <Table name="first_hits"/>
    <Dimension name="Categories" foreignKey="category_id">
    <Hierarchy hasAll="true" allMemberName="All Categories" primaryKey="category_id">
    <Table name="allcategories"/>
    <Level name="Category" uniqueMembers="true" type="Numeric"
    column="id" nameColumn="category_name"
    parentColumn="parent_id" nullParentValue="0">
    <Closure parentColumn="parent_id" childColumn="id">
    <Table name="categories_closure"/>
    </Closure>
    </Level>
    </Hierarchy>
    </Dimension>
    <DimensionUsage name="TimeByWeek" source="TimeByWeek" foreignKey="hit_date_id"/>
    <DimensionUsage name="TimeByMonth" source="TimeByMonth" foreignKey="hit_date_id"/>
    <DimensionUsage name="Agent" source="Agent" foreignKey="agent_id"/>
    <DimensionUsage name="Location" source="Location" foreignKey="location_id"/>
    <DimensionUsage name="Referer" source="Referer" foreignKey="referer_id"/>
    <Measure name="Sessions count" column="id" aggregator="count" datatype="Integer" formatString="#,###"/>
    <Measure name="Average hits" column="hits_number" aggregator="avg" datatype="Numeric" formatString="#,###"/>
    <Measure name="Total hits" column="hits_number" aggregator="sum" datatype="Integer" formatString="#,###"/>
    </Cube>


    the aggregate table layout is :

    CREATE TABLE `003stat`.`gg_c_102_entry_hits` (
    `parent_id` int(10) unsigned NOT NULL,
    `referer_type` enum('mail','blog','search','typein','partner') NOT NULL,
    `id` int(11) NOT NULL,
    `hits_number_avg` int(11) NOT NULL,
    `hits_number_sum` int(11) NOT NULL,
    `fact_count` int(11) NOT NULL
    );

    If this table is present, the system crashes with error
    Rolap.Column not found (null) for tableAlias=allcategories, factColumnName=id, levelColumnName=id, symbolicName=Category
    regardless of which cube is being used.

  10. #10
    Join Date
    Oct 2007
    Posts
    25

    Default

    Your table "allcategories" should have a column category_id. this error is being thrown because the fact table is being linked to the dimension via category_id, then you are referencing an "id" column in the aggregate table, which is not part of the dimension/aggregate... (the error means - "I've looked everywhere and can't find the column called "id" that you are referencing!"... now... if I read this correctly, you should replace "id" with "category_id" in the LevelName entry.

    Hope this helps (also hope I'm getting this right)
    Cheers

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.