Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Question on Analytic query step

  1. #1

    Default Purpose of group fields in Analytic query step?

    I'm trying to figure out what the "group field" setting in the Analytic query step does?

    Doc says this:
    "Specify the fields you want to group. Click Get Fields to add all fields from the input stream(s). The step will do no additional sorting, so in addition to the grouping identified (for example CUSTOMER_ID) here you must also have the data sorted (for example ORDER_DATE)."

    However, if I sort my input by order_date (but not customer_id), and then do a group field on customer_id, nothing happens - customer_ids still come in randomly. If I sort by both order and customer, and then skip the grouping, or turn it on, it doesn't seem to influence the output rows.

    Could someone give more details on what this group field is supposed to do?

    Thanks!

    ps: another random query - why is it called analytic query?
    Last edited by kwb; 05-28-2015 at 09:48 AM. Reason: fixing title

  2. #2
    Join Date
    Sep 2007
    Posts
    834

    Default

    This step serves for bringing field values from rows above or below the current row.
    It's not mandatory to specify a group. But if you define a group (made of one or more fields), then the "lead forward / lag backward" operations are made within each group.
    The Analytic Query step doesn't sort the rows. So you have to sort the dataset before they arrive to this Analytic Query step, unless your business rules don't require the data sorted.

  3. #3

    Default

    Quote Originally Posted by Maria Roldan View Post
    This step serves for bringing field values from rows above or below the current row.
    It's not mandatory to specify a group. But if you define a group (made of one or more fields), then the "lead forward / lag backward" operations are made within each group.
    The Analytic Query step doesn't sort the rows. So you have to sort the dataset before they arrive to this Analytic Query step, unless your business rules don't require the data sorted.
    Thanks Maria. I'm afraid I still don't understand what you mean. Could you give a more specific example, of where the "group" is needed (and where it's not)? thanks so much! For example, if I don't group, but I ask for a particular field in the row above me, what happens? How is this different if I group that same field?

  4. #4
    Join Date
    Sep 2007
    Posts
    834

    Default

    Suppose you have this:
    X,Y
    aaa,1
    aaa,2
    aaa,3
    bbb,4
    bbb,5
    bbb,6
    And you want to create a field named Z, with the Y value in the previous row.


    If you only care about the Y field, you don't need to group. And you will have the following result:


    X,Y,Z
    aaa,1,<null>
    aaa,2,1
    aaa,3,2
    bbb,4,3
    bbb,5,4
    bbb,6,5


    But if you don't want to mix the values for aaa and bbb, you can group by the X field, and you will have this:


    X,Y,Z
    aaa,1,<null>
    aaa,2,1
    aaa,3,2
    bbb,4,<null>
    bbb,5,4
    bbb,6,5

  5. #5

    Default

    Ah - ok, I think I understand.

    So the group field in a way groups the results you can access through Lag? So if I ask for a lag, but a new group has just started being sent, I will get NULL. Similarly, if I ask for a lead, and I'm at the end of my current group, I will get a null.

    OTOH, if I don't group, I will get a result from the previous group that came through? Thus, in a way, grouping makes it easy to check if you're in a "new" group - you just check if the lag value is null. Is that close to correct? (Provided of course things are sorted in the proposed order)

    I must admit this was not obvious, but it now makes a bit more sense. Where would I go to submit a patch to the documentation, to explain this a bit better?

    thanks!

  6. #6
    Join Date
    Apr 2008
    Posts
    4,690

    Default

    Quote Originally Posted by kwb View Post
    Where would I go to submit a patch to the documentation, to explain this a bit better?
    http://wiki.pentaho.com/display/EAI/Analytic+Query

    It's a Wiki, so create an account and update it!

  7. #7

    Default

    Quote Originally Posted by gutlez View Post
    http://wiki.pentaho.com/display/EAI/Analytic+Query

    It's a Wiki, so create an account and update it!
    Done. --> http://wiki.pentaho.com/display/EAI/Analytic+Query I used Maria's excellent example above. @mariacroldan, please take a look and comment/update it accordingly!

  8. #8
    Join Date
    Apr 2008
    Posts
    4,690

    Default

    Quote Originally Posted by kwb View Post
    Awesome, Thanks!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.