Hitachi Vantara Pentaho Community Forums
Results 1 to 3 of 3

Thread: building a loop to compute frequencies on table_column values

  1. #1

    Default building a loop to compute frequencies on table_column values

    I'm to trying to build a process that insert the frequencies (distinct counts) of a series of database columns of a table into a frequency table.

    Step 1 (transformation1) : Read datacolumn names from admin view to select the set of columns of datatype integer

    Step 2 (tranformation 2) : I execute transformation 2 for each incoming row from transformation 1. The value fed into transformation 1 is just the name of the column.

    T2 uses a get rows from result and sends the result to a SQL script:

    insert into reg_frequencies
    select ? ,count(distinct pk_id) from frequency group by ?;

    So, if the column would be 'c1', I would expect Kettle to build the SQL :

    insert into reg_frequencies
    select c1 ,count(pk_id) from frequency group by c1;

    when I use the field_name in 'Fieldname to be used as an argument'

    The error log shows that ? is not substituted by anything..

    What am I missing?

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    The question mark is a placeholder for a value expression in a SQL statement.
    You must use PDI variables (textual substitution) for identifiers.
    So long, and thanks for all the fish.

  3. #3

    Default

    Do I set the variable value then in the first transformation? I cannot define it in the transformation itself (where the SQL is executed)

    edit :

    Found solution:

    The main job consist of :
    1. Transformation that reads the information_schema view and send output to result
    2. Second component is a subjob that contains:
    a. transformation that sets the variable and sets output to result
    b. transformation that performs the insert with the variable (builds the SQL using the variable
    The subjob is set to fire with every input row from the first transformation

    The SQL reads as:

    insert into reg_frequencies
    select
    '${column}',
    ${column},count(pk_id)
    from frequency group by ${column}

    with settings:

    - execute as single statement
    - variable substitution

    The insert statement writes the column_name, the value and the calculated count to the frequencies table

    It is also amazingly fast..
    Last edited by blom0344; 12-01-2013 at 12:30 PM. Reason: solution found

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.