Hitachi Vantara Pentaho Community Forums

Thread: Setting variable to run only certain number of rows

  1. #1
    Join Date
    Jan 2009



    I run Kitchen jobs from Unix shell scripts. Because the database we run Kettle against is very large, a job takes a long time to finish, so it also takes a long time to find out whether one of its transformations has an error. I know we can preview any transformation before kicking off the job to check for errors, but preview is slow and manual, and I am trying to automate this. Basically, I want to set a variable somewhere so that when I call the Kitchen job from a shell script, it only processes the first n rows. Is this possible in Kettle?


  2. #2
    DEinspanjer Guest


    If you are using Text File Input steps, you could try modifying the XML to set the limit element to the number of rows you want to test with. If you are using Table Input steps, you could try adding a variable to the bottom of your SQL that is empty normally but could be set to something like "LIMIT 10".

    The only other thing I can think of might work but feels dangerous, so it would require a lot of testing: have a JS step count the rows that pass through it and, once it decides enough rows have been processed, try to invoke the stopAll() method on the step class to signal the transformation to halt without flagging it as an error.
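
The Text File Input suggestion above could be scripted roughly as follows. This is only a sketch: it assumes the transformation is saved as a .ktr file rather than in a repository, that the row limit is stored as a `<limit>N</limit>` element, and the function and file names are made up for illustration. Any other step that stores a `<limit>` element would be changed as well, so check the result before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read a transformation's XML on stdin and rewrite
# every <limit>N</limit> element to the requested row count.
set_row_limit() {
    rows="$1"
    sed "s|<limit>[0-9]*</limit>|<limit>${rows}</limit>|g"
}

# Example (file names are placeholders): write a test copy and leave the
# original transformation untouched.
# set_row_limit 10 < my_transformation.ktr > my_transformation_test.ktr
```

Writing to a separate copy keeps the production transformation unchanged, which matters if the test run is automated.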

  3. #3


    I think a limit in the SQL is sufficient, or you might use a small data source for testing purposes.

    I think preview helps a lot, though it costs some time.

  4. #4
    Join Date
    Jan 2009


    Thanks for replying, but both suggested options require me to modify the SQL query just for testing purposes. That would mean changing every transformation I want to test so that it limits the number of rows, even though it otherwise doesn't need a limit. Is there an option in Kettle that I can set from the command line when I call the Kitchen job that would limit the number of rows in the result set?

    For example, when I call the Kitchen job J_MGS_RAW as shown below, is there another variable I can add that would help me achieve this limit on the number of rows dynamically?

    Code: -user=$USERNAME -pass=$PASSWORD -rep=$REPOSITORY -job=$JNAME -dir=/J_Release_MGS -level=$LOGLEVEL 'J_MGS_RAW' 'MGS' > ${LOG_DIR}/${LOGFILE}

  5. #5
    DEinspanjer Guest


    You can have a variable in your SQL field that is set to an empty string during production but can be set to a LIMIT clause when you want to do testing.

    SELECT *
    FROM foo
    WHERE blah
    ${DEBUG_LIMIT}

    If DEBUG_LIMIT is not set when the transformation runs then it is an empty string.
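
A shell wrapper could build that value for a test run. The sketch below makes several assumptions: the Kitchen version in use supports named parameters passed as `-param:NAME=value`, the job declares a DEBUG_LIMIT parameter with an empty default, the Table Input step has variable replacement enabled, and the database accepts a LIMIT clause (not all do). TEST_ROWS and the function name are hypothetical.

```shell
#!/bin/sh
# Build the optional LIMIT clause from a row count. Prints nothing when
# no count (or zero) is given, which is the production case.
debug_limit_clause() {
    rows="${1:-}"
    case "$rows" in
        ''|0) printf '' ;;
        *[!0-9]*) echo "row count must be a number: $rows" >&2; return 1 ;;
        *) printf 'LIMIT %s' "$rows" ;;
    esac
}

# TEST_ROWS is a hypothetical environment variable for this sketch;
# leave it unset for a normal run.
DEBUG_LIMIT="$(debug_limit_clause "${TEST_ROWS:-}")"

# Extra argument to append to the existing Kitchen call (assumes the
# job declares a DEBUG_LIMIT named parameter).
echo "-param:DEBUG_LIMIT=${DEBUG_LIMIT}"
```

Running the script with TEST_ROWS=10 would print the argument with a "LIMIT 10" value; quote the argument when adding it to the real command line, since the value contains a space.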


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.