Hitachi Vantara Pentaho Community Forums

Thread: 'boiling down' data

  1. #1
    Join Date
    Aug 2006
    Posts
    14

    Question 'boiling down' data

    I have MANY rows of data that I wish to reduce in the following way...

    Assume I have 100 rows of X,Y data and need to reduce it down to 10.

    + Take the first row's X value as the resultant X value
    + The resultant Y value should be calculated as the average of this row's Y value and the following 9 rows' Y values
    + output the result as the X,Y

    Repeat until the stream of data is no more.
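
To pin down the expected output, the reduction described above can be sketched in plain Python. This is not Kettle code; the function name `boil_down`, the `size` parameter, and the sample data are made up for illustration, and the sketch assumes a trailing block shorter than ten rows is simply averaged over the rows it has.

```python
from statistics import mean

def boil_down(rows, size=10):
    """Reduce (x, y) rows: one output row per block of `size` input rows."""
    reduced = []
    for start in range(0, len(rows), size):
        block = rows[start:start + size]
        first_x = block[0][0]               # X of the block's first row
        avg_y = mean(y for _, y in block)   # average Y over the whole block
        reduced.append((first_x, avg_y))
    return reduced

# Hypothetical sample data: 100 rows where Y equals X.
rows = [(i, float(i)) for i in range(100)]
result = boil_down(rows)
```

With this sample input, `result` holds 10 rows, one per block of ten.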

    The question is: how to do this with Kettle/Spoon?

    The idea is that I then feed this reduced stream into Pentaho for reporting/charting.

    Looking for suggestions...everything gratefully accepted.

    Cheers,

    Alph
    PS: like the way the title "goes with the food/kitchen flow"? :-)

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    The only way to get exactly what you want is with a JavaScript step, or by writing your own step.

    Regards,
    Sven

  3. #3
    Join Date
    Aug 2006
    Posts
    14

    Default thanks

    I had anticipated the need for a Javascript step...something like:

    if (current row is one of the 'extra' rows)
    {
        store it away for later
        'ignore' the row...throw it away
    }
    else
    {
        process the stored rows and the current row
        create and issue a new row based on the processing
    }

    The two issues I had identified were:

    + how to delete/ignore a row
    + how to create a row
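
Outside Kettle, the same buffering idea can be sketched as a Python generator, where "ignoring" a row means consuming it without yielding and "creating" a row means yielding a new tuple. This is only an illustration and does not use the JavaScript step's API; `reduce_stream` and its `size` parameter are hypothetical names. Unlike the pseudocode above, it emits on the last row of each batch, and it assumes a short final batch should still be emitted.

```python
def reduce_stream(rows, size=10):
    """Yield one (first_x, mean_y) row per `size` input rows as they arrive."""
    first_x = None
    y_sum = 0.0
    count = 0
    for x, y in rows:
        if count == 0:
            first_x = x          # remember X of the batch's first row
        y_sum += y               # absorb the row; nothing is emitted yet
        count += 1
        if count == size:
            yield (first_x, y_sum / count)   # "create" the reduced row
            y_sum, count = 0.0, 0
    if count:
        # Assumption: a short final batch is averaged over what it has.
        yield (first_x, y_sum / count)

# Hypothetical input: 25 rows, so the last batch holds only 5 rows.
stream = ((i, float(i % 10)) for i in range(25))
out = list(reduce_stream(stream))
```

Only running totals are kept, so memory use stays constant however long the stream is.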

    Apologies...I should have been clearer in my original post.

    Alph

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    Ignoring a row in JavaScript is not possible right now (3.0.0-RC1)... http://jira.pentaho.org/browse/PDI-229

    Making rows... for 2.5 it's in the FAQ; for 3.0 we may still need to change it and possibly make an example transformation. It's also one of those things that keeps coming back.

    Regards,
    Sven

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Actually, let's forget about JS for a while.

    Suppose you would number the rows and split them in groups of 10.
    Then you can simply perform a Group By step and you're done.
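
The numbering-and-grouping idea can be sketched in Python as follows. The steps used in the attached transformation are not shown in the thread, so this only illustrates the principle: the group key is assumed to be the zero-based row number integer-divided by the group size, and `group_by_reduce` is a hypothetical name.

```python
from itertools import groupby
from statistics import mean

def group_by_reduce(rows, size=10):
    """Number the rows, derive a group key, then aggregate per group."""
    numbered = enumerate(rows)   # (row_number, (x, y)), numbering from 0
    out = []
    # Rows 0-9 get key 0, rows 10-19 get key 1, and so on.
    for _, members in groupby(numbered, key=lambda pair: pair[0] // size):
        block = [row for _, row in members]
        first_x = block[0][0]               # "first value" aggregate on X
        avg_y = mean(y for _, y in block)   # "average" aggregate on Y
        out.append((first_x, avg_y))
    return out

# Hypothetical sample data: 30 rows, X starting at 100.
rows = [(100 + i, float(i)) for i in range(30)]
grouped = group_by_reduce(rows)
```

`itertools.groupby` only merges adjacent rows with equal keys, which is enough here because the row numbers are already in order.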

    General sample - first and average coordinates calculation.ktr

    [Attached image: xy-group-by-sample.png]

    HTH,

    Matt

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    "Making rows... for 2.5 it's in the FAQ, for 3.0 we may still need to change and possibly make an example transformation. It's also one of those things that keep coming back"

    Sven, look no further:

    http://wiki.pentaho.org/display/EAI/...2.5.x+to+3.0.0

    Personally I think that the 3.0 way of doing it is a lot better.

    ;-)

    Matt

  7. #7
    Join Date
    Aug 2006
    Posts
    14

    Default

    MANY thanks for that example.

    I am amazed and horrified both at the same time!

    Amazed: both by your speed of reply and the evident power of the tool
    Horrified: by the weirdness of the solution...NEVER would have thought of anything even remotely like it 8-(

    Again thanks...it IS much appreciated.

    Cheers,


    Alph

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    It's more of an analytical example, not very often found in your average ETL solution.
    I would love to see more possibilities in the future for allowing us to create/tag groups of records.
    Evidently that's what the "Group by" step runs on.

    Rewording your question:

    Assume I have 100 rows of X,Y data and need to reduce it down to 10.

    + Take the first row's X value as the resultant X value
    + The resultant Y value should be calculated as the average of this row's Y value and the following 9 rows' Y values
    + output the result as the X,Y

    Into:

    Assume I have 100 rows of X,Y data and need to reduce it down to 10.

    + Take the first row's X value of each group of ten records as the resultant X value
    + The resultant Y value should be calculated as the average of the Y values in this group of ten records
    + output the result as the X,Y

    That should give you insight into why I picked the group by solution.

    All the best,

    Matt
