Hitachi Vantara Pentaho Community Forums
Results 1 to 4 of 4

Thread: Database input using grouped rows

  1. #1

    Default Database input using grouped rows

    I would like to get a list of measurements that match a list of tests I have already processed.

    I want something Equivalent to:
    select * from measurementresponses where testResponse in (select objid from tblworkingtestdata)
    I would like to avoid the sub query as I have already extracted the tests, instead I would like to use the group by transformation to create a comma separated list of id's and pass in the list to database input step to get the measurements.

    Unfortunately it isn't working as I would hope, using the group step I only get the measurements that match the first test. The transformation works as expected if i don't use the grouping and execute the db input transformation for each input row but that seems inefficient because i'm making multiple db calls.

    Can what I want to do be done, or am i stuck with sub queries or executing the db input for each row?
    Attached Files Attached Files

  2. #2
    DEinspanjer Guest


    If your measurementresponses and tblworkingtestdata tables are on the same DB, you should use a normal Table Input step that has a JOIN clause in it (instead of the subquery). This is the most efficient way to do this.

    If you were reading the values from tblworkingtestdata in from a file or you were performing other manipulations on the data before it is ready to be used as lookup, I see two options:

    1. Use a Stream Lookup step. The downside here is that you will be returning *all* records from measurementresponses and then filtering out the ones that don't match. If you have many millions of rows in that table, this option wouldn't work well.

    2. To get your non-working code to function, you need a Kettle variable rather than a JDBC parameter. Standard JDBC parameters can only be scalar values which is why your IN (?) part isn't working. Move the lookup and group by steps into a new transformation and save the results of the group by step to a variable using the "Set Variables" step. Then have a job that executes this new transformation first and then your main transformation using that variable inside your IN clause "where testResponse IN (${MYLIST})" (don't forget to enable variables in the Table Input step).

  3. #3


    Using a variable is a viable option (as I am processing the tests in a different transformation anyway), I should have tried it but I was worried that the list of values would be too big to store in a variable.

    Is there a limit to the size of a string that can be stored as a variable?


  4. #4
    DEinspanjer Guest


    Not really. The only limits would be the amount of memory you have and any possible limitations the JDBC driver might have for the size of a SQL statement.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.