Hitachi Vantara Pentaho Community Forums
Results 1 to 4 of 4

Thread: Re: [Mondrian] Eigenbase perforce change 14675 for review

  1. #1
    Luc Boudreau Guest

    Default Re: [Mondrian] Eigenbase perforce change 14675 for review

    The problem appears when doing a count(distinct x) operation. All platforms
    return the null as a distinct value while Greenplum ignores it. I'll double
    check everything and revert if necessary. Thanks for the warning.

    Luc
    On Oct 6, 2011 9:12 PM, "Julian Hyde" <julian (AT) hydromatic (DOT) net> wrote:
    > It's standard behavior that databases don't count nulls. For example,

    given table t with one column c and values null, 1, 2, then 'select count(*)
    from t' will return 3 and 'select count(c) from t' will return 2.
    >
    > Sounds like Greenplum complies with the standard, so you shouldn't need to

    do anything special in the dialect. I think there might be some other
    problem.
    >
    > Julian
    >
    > On Oct 6, 2011, at 5:58 PM, Luc Boudreau wrote:
    >
    >> http://p4web.eigenbase.org/@md=d&c=6PU@/14675?ac=10
    >>
    >> Change 14675 by lucboudreau (AT) luc-mondrian-3 (DOT) 2 on 2011/10/06 17:56:36
    >>
    >> MONDRIAN: Fixes an issue with Greenplum. A null value can be returned by

    the driver, thus creating an NPE when comparing values. Also fixes an issue
    where COUNT() in Greenplum doesn't count null values.
    >>
    >> Affected files ...
    >>
    >> ... //open/mondrian/src/main/mondrian/rolap/SqlMemberSource.java#122 edit
    >> ... //open/mondrian/src/main/mondrian/spi/Dialect.java#27 edit
    >> ... //open/mondrian/src/main/mondrian/spi/impl/GreenplumDialect.java#5

    edit
    >> ... //open/mondrian/src/main/mondrian/spi/impl/JdbcDialectImpl.java#34

    edit
    >>
    >> Differences ...
    >>
    >> ==== //open/mondrian/src/main/mondrian/rolap/SqlMemberSource.java#122

    (ktext) ====
    >>
    >> 2c2
    >> < // $Id:

    //open/mondrian/src/main/mondrian/rolap/SqlMemberSource.java#121 $
    >> ---
    >>> // $Id: //open/mondrian/src/main/mondrian/rolap/SqlMemberSource.java#122

    $
    >> 41c41
    >> < * @version $Id:

    //open/mondrian/src/main/mondrian/rolap/SqlMemberSource.java#121 $
    >> ---
    >>> * @version $Id:

    //open/mondrian/src/main/mondrian/rolap/SqlMemberSource.java#122 $
    >> 136c136
    >> < if (!colStr.equals(colStrings[i])) {
    >> ---
    >>> if (!Util.equals(colStr, colStrings[i])) {

    >> 230,231c230,233
    >> < sqlQuery.addSelect(colDef, null);
    >> < sqlQuery.addOrderBy(colDef, true, false, true);
    >> ---
    >>> final String exp =
    >>> sqlQuery.getDialect().generateCountExpression(colDef);
    >>> sqlQuery.addSelect(exp, null);
    >>> sqlQuery.addOrderBy(exp, true, false, true);

    >> 240c242,244
    >> < sb.append(colDef);
    >> ---
    >>> sb.append(
    >>> sqlQuery.getDialect()
    >>> .generateCountExpression(colDef));

    >>
    >> ==== //open/mondrian/src/main/mondrian/spi/Dialect.java#27 (ktext) ====
    >>
    >> 86c86
    >> < * @version $Id: //open/mondrian/src/main/mondrian/spi/Dialect.java#26 $
    >> ---
    >>> * @version $Id: //open/mondrian/src/main/mondrian/spi/Dialect.java#27 $

    >> 718a719,728
    >>> * Some databases, like Greenplum, don't include nulls as part
    >>> * of the results of a COUNT sql call. This allows dialects
    >>> * to wrap the count expression in something before it is used
    >>> * in the query.
    >>> * @param exp The expression to wrap.
    >>> * @return A valid expression to use for a count operation.
    >>> */
    >>> String generateCountExpression(String exp);
    >>>
    >>> /**

    >>
    >> ==== //open/mondrian/src/main/mondrian/spi/impl/GreenplumDialect.java#5

    (ktext) ====
    >>
    >> 20c20
    >> < * @version $Id:

    //open/mondrian/src/main/mondrian/spi/impl/GreenplumDialect.java#4 $
    >> ---
    >>> * @version $Id:

    //open/mondrian/src/main/mondrian/spi/impl/GreenplumDialect.java#5 $
    >> 76a77,80
    >>> public String generateCountExpression(String exp) {
    >>> return caseWhenElse( exp + " ISNULL", "'0'", "TEXT(" + exp + ")");
    >>> }
    >>>

    >>
    >> ==== //open/mondrian/src/main/mondrian/spi/impl/JdbcDialectImpl.java#34

    (ktext) ====
    >>
    >> 32c32
    >> < * @version $Id:

    //open/mondrian/src/main/mondrian/spi/impl/JdbcDialectImpl.java#33 $
    >> ---
    >>> * @version $Id:

    //open/mondrian/src/main/mondrian/spi/impl/JdbcDialectImpl.java#34 $
    >> 871a872,875
    >>> public String generateCountExpression(String exp) {
    >>> return exp;
    >>> }
    >>>

    >>

    >


    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

  2. #2
    Julian Hyde Guest

    Default Re: [Mondrian] Eigenbase perforce change 14675 for review

    On Oct 6, 2011, at 6:17 PM, Luc Boudreau wrote:

    > The problem appears when doing a count(distinct x) operation. All platforms return the null as a distinct value while Greenplum ignores it. I'll double check everything and revert if necessary. Thanks for the warning.
    >

    count(distinct x) should eliminate the duplicates but still not count the null values. Thus, if the table has values {null, 1, 2, 2, 2} it should return 2. If it just has value {null} is should return 0.

    (I've implemented the rewrite rule
    select count(distinct x) from t
    --> select count(dx) from (select distinct x as dx from t)
    more than once in my RDBMS-implementing career.)

    I just checked, and MySQL does this. I'm 99.9% certain Oracle does too.

    So... Greenplum is standards-compliant here.

    If there's a specific query where Greenplum is giving a dubious result, send me the SQL and I'll check it out.

    Julian
    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

  3. #3
    Luc Boudreau Guest

    Default Re: [Mondrian] Eigenbase perforce change 14675 for review

    On MySQL, Mondrian executes this query, which returns 21.

    select count(*) as `c0` from (select distinct `store`.`store_sqft`
    as `c0` from `foodmart`.`store` as `store`) as `init`

    While on Greenplum, Mondrian executes this, which returns 20.

    select count(DISTINCT "store"."store_sqft") as "c0" from "store" as "store"

    Luc


    On Fri, Oct 7, 2011 at 12:23 AM, Julian Hyde <jhyde (AT) pentaho (DOT) com> wrote:
    > On Oct 6, 2011, at 6:17 PM, Luc Boudreau wrote:
    >
    > The problem appears when doing a count(distinct x) operation. All platforms
    > return the null as a distinct value while Greenplum ignores it. I'll double
    > check everything and revert if necessary. Thanks for the warning.
    >
    > count(distinct x) should eliminate the duplicates but still not count the
    > null values. Thus, if the table has values {null, 1, 2, 2, 2} it should
    > return 2. If it just has value {null} is should return 0.
    > (I've implemented the rewrite rule
    > select count(distinct x) from t
    > --> select count(dx) from (select distinct x as dx from t)
    > more than once in my RDBMS-implementing career.)
    > I just checked, and MySQL does this. I'm 99.9% certain Oracle does too.
    > So... Greenplum is standards-compliant here.
    > If there's a specific query where Greenplum is giving a dubious result, send
    > me the SQL and I'll check it out.
    > Julian
    > _______________________________________________
    > Mondrian mailing list
    > Mondrian (AT) pentaho (DOT) org
    > http://lists.pentaho.org/mailman/listinfo/mondrian
    >
    >

    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

  4. #4
    Julian Hyde Guest

    Default Re: [Mondrian] Eigenbase perforce change 14675 for review

    > Luc wrote:

    > On MySQL, Mondrian executes this query, which returns 21.
    >
    > select count(*) as `c0` from (select distinct `store`.`store_sqft`
    > as `c0` from `foodmart`.`store` as `store`) as `init`
    >
    > While on Greenplum, Mondrian executes this, which returns 20.
    >
    > select count(DISTINCT "store"."store_sqft") as "c0" from "store" as "store"


    Ha!

    Both databases are giving the right response. The queries are not equivalent. I ran

    > select count(`c0`) as `c1` from (select distinct `store`.`store_sqft`
    > as `c0` from `foodmart`.`store` as `store`) as `init`


    in MySQL and it gives 20 as expected.

    So, why is Mondrian generating different queries? The problem seems to be in SqlMemberSource.makeLevelMemberCountSql. In the name of consistency, it should generate 'count(c)' rather than 'count(*)'. Null values will be ignored, but it will perform better and mondrian will be consistent across databases.

    That method kind of goes about things backwards. It will generate a 'select count ... from subquery' style query even on databases that support compound distinct-count, like MySQL. It could be rewritten to first try to generate a distinct-count query, failing if the database doesn't support distinct-count or if the query is composite and the database doesn't support composite-distinct-count. Then it could try to use the subquery method. Lastly, it could fail.

    How much of this should you do? Depends on how much time you have and how long you think it would take. What you don't implement, please document in a jira case.

    Julian

    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.