Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Kettle (PDI) Group by

  1. #1

    Default Kettle (PDI) Group by

    Hi every body,
    I couldn't succeed to do a group by with kettle. I have the following error:

    Group by 3.0 - ERROR (version 2.5.0, build 25002 from 2007/05/04 00:20:04) : Aggregate subject field [] couldn't be found!

    but the problem is that i don't know what the field "subject" means and i let it empty.

    can any one help me

    thanks in advance
    hamma

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    There are examples using goup by in the example directory of your kettle installation... look at those first. Subject are the fields you wan to do your group by operation on.

    Regards,
    Sven

  3. #3

    Default

    I tried to understand the samples but i couldn't suceed. I want just to do a simple count(*) and generate a new field (cnt_ammissioni_pres) for each group. So the calculation is not made on fields but the number of rows for each group. my query which does the same is the following:

    /********* Load staging table ammissioni_pres *************/
    select avce_d_comune.desc_comune, avce_d_provincia.desc_provincia, avce_t_atcca_scheda.comune, avce_t_atcca_scheda.sigla_prov, stagione, count(*) as cnt_ammissioni_pres
    from avce_t_atcca_ammesso, avce_t_anagrafica, avce_t_atcca_scheda, avce_d_comune,avce_t_atcca, avce_d_provincia
    where avce_t_atcca.atcca=avce_t_atcca_ammesso.atcca
    and avce_t_atcca.atcca=avce_t_atcca_scheda.atcca
    and avce_t_anagrafica.id_cacciatore=avce_t_atcca_ammesso.id_cacciatore
    and avce_t_anagrafica.res_comune=avce_d_comune.istat_comune
    and avce_d_comune.istat_provincia=avce_d_provincia.istat_provincia
    and (stato_ammissione='1' or stato_ammissione='0')
    group by avce_t_atcca_ammesso.stagione,avce_t_atcca_scheda.comune, avce_d_comune.desc_comune,
    avce_d_provincia.desc_provincia, avce_t_atcca_scheda.sigla_prov

    I'm confused, what i did is:
    - put the group by fields in "Group field"
    - put the new field (name:cnt_ammissioni_pres, subject: i let it empty, type: number of values).
    what i thought is that "subject" is useful in the case we will do calculations like (sum, max) based on one field from the stream input. but in my case just count(*)

    Thanks in advance
    hamma

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    The fields that make up the group:
    - avce_d_comune.desc_comune
    - avce_d_provincia.desc_provincia
    - avce_t_atcca_scheda.comune
    - avce_t_atcca_scheda.sigla_prov
    - stagione

    Aggregate:
    Name : cnt_ammissioni_pres
    Subject : any field, for example "avce_d_comune.desc_comune"
    Type: Number of Values

    HTH,

    Matt

  5. #5

    Default

    Thanks for the quick response,
    I did exactly what you said but i still have a problem. It seems that the aggregation is not done correctly because it sends some duplicated rows for the group by fields although that before the group by i do a sort on the group fields.

    The error is the following:

    Table output 3.0 - ERROR (version 2.5.0, build 25002 from 2007/05/04 00:20:04) : Because of an error, this step can't continue:
    2007/09/26 10:55:12 - Table output 3.0 - ERROR (version 2.5.0, build 25002 from 2007/05/04 00:20:04) : Error batch inserting rows into table [stagingtableammissioni_pres]
    2007/09/26 10:55:12 - Table output 3.0 - ERROR (version 2.5.0, build 25002 from 2007/05/04 00:20:04) :
    2007/09/26 10:55:12 - Table output 3.0 - ERROR (version 2.5.0, build 25002 from 2007/05/04 00:20:04) : Error updating batch
    2007/09/26 10:55:12 - Table output 3.0 - ERROR (version 2.5.0, build 25002 from 2007/05/04 00:20:04) : Duplicate entry 'ACQUI TERME-ALESSANDRIA-ACQUI TERME-AL-2004' for key 1
    2007/09/26 10:55:12 - Sort rows 3.0 - Finished processing (I=0, O=0, R=194244, W=556, U=0, E=0)
    2007/09/26 10:55:12 - Group by 3.0 - Finished processing (I=0, O=0, R=550, W=321, U=0, E=0)
    2007/09/26 10:55:12 - Table output 3.0 - Finished processing (I=0, O=99, R=100, W=0, U=0, E=1)
    2007/09/26 10:55:12 - Spoon - The transformation has finished!!
    Last edited by hammamr; 09-26-2007 at 05:08 AM.

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Obviously, unless your unique index on the target table contains all the field of the Group, it's still possible to get duplicates on a single field.
    Make sure you didn't enable the "pass all rows" function.

    Matt

  7. #7

    Default

    Thanks a lot i got it, now my transformation is working well. The problem was that "Include all rows was enabled" i disabled it as you said and now it's working well.

    hamma

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.