Hitachi Vantara Pentaho Community Forums

Thread: Read file one by one and execute some steps depending if end of each file

  1. #1
    Join Date
    Jun 2016
    Posts
    181

    Read file one by one and execute some steps depending if end of each file

    Hi,

    I need to read files, insert their content into a temp table, and then run another transformation that does some calculation.
    The problem is that the files contain too many rows, so I need to insert the content of only one file, do the calculation, delete the temp data (loaded from the file), and then repeat the operation for each file.

    I imagine it as follows:

    GetFileNames -> TextFileInput -> TableOutput {then if end of file do the rest} -> ExecuteSQLscript -> Delete

    The problem is that I can tell when a file starts, but I do not know when it ends :-(
    Do you have any idea how I can achieve that?

    UPDATE:

    I see there is a step called "Get Files Rows Count", so maybe I could compare "rowscount" with the row number on each pass...
    No, it will not work. The problem is that Pentaho runs all steps in parallel, so even if there is a decision point, the beginning of the transformation will keep running anyway without waiting for ExecuteSQLscript.
    Last edited by Gosforth; 12-09-2017 at 01:28 PM.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534


    You want to process a SQL script "after" all rows of a file are processed, not "while".
    The natural way to do sequential processing in Kettle is a job.
    So, create a sub job with the filename as a parameter, containing a transformation (Text-File-Input => Table-Output) followed by an SQL job entry.
    In your main job you'll have a first transformation (Get-File-Names => Copy-Rows-To-Result), followed by your sub job, which iterates over the filenames with each filename copied to the parameter.
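
For readers who want to see the control flow outside of Kettle, here is a minimal Python sketch of the same "one file at a time" pattern. It is not Kettle code and not taken from the thread: the table names `temp_rows` and `totals`, the two-column CSV layout, the SUM used as the "calculation", and all function names are assumptions made only for illustration. The point is the ordering: the calculation and the delete run only after a file has been loaded completely, and the next file is not touched until they finish.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_file(conn, path):
    """Rough stand-in for the transformation (Text-File-Input => Table-Output)."""
    with open(path, newline="") as handle:
        rows = [(row[0], float(row[1])) for row in csv.reader(handle)]
    conn.executemany("INSERT INTO temp_rows (name, amount) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def calculate_and_clear(conn, path):
    """Rough stand-in for the SQL job entry: runs after the whole file is loaded."""
    # Hypothetical calculation: store the sum of the loaded amounts per file.
    conn.execute(
        "INSERT INTO totals (file_name, total) "
        "SELECT ?, COALESCE(SUM(amount), 0) FROM temp_rows",
        (Path(path).name,),
    )
    # Delete the temp data so the next file starts from an empty table.
    conn.execute("DELETE FROM temp_rows")
    conn.commit()


def process_files(conn, paths):
    """Rough stand-in for the main job: iterate over the file names in sequence."""
    for path in sorted(paths):
        load_file(conn, path)
        calculate_and_clear(conn, path)


def demo():
    """Build two small sample files and process them one after the other."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temp_rows (name TEXT, amount REAL)")
    conn.execute("CREATE TABLE totals (file_name TEXT, total REAL)")
    with tempfile.TemporaryDirectory() as folder:
        first = Path(folder) / "a.csv"
        second = Path(folder) / "b.csv"
        first.write_text("x,1.5\ny,2.5\n")
        second.write_text("z,10\n")
        process_files(conn, [first, second])
    totals = conn.execute(
        "SELECT file_name, total FROM totals ORDER BY file_name"
    ).fetchall()
    leftover = conn.execute("SELECT COUNT(*) FROM temp_rows").fetchone()[0]
    return totals, leftover
```

Calling `demo()` returns the per-file totals together with the number of rows left in `temp_rows`, which should be zero because the table is cleared after every file. In Kettle the loop itself comes from the job executing the sub job once per result row, rather than from an explicit `for` statement.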
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Jun 2016
    Posts
    181


    OK, I've created a job:

    Transformation1[Get-File-Names => Copy-Rows-To-Result] => Job[Job[Transformation[Get-Rows-From-Result-Text-File-Input => Table-Output]]]

    Where should I place "Execute-SQL-script(delete data)"? Should it be a job entry after the transformation "Transformation[Get-Rows-From-Result-Text-File-Input => Table-Output]"?

    UPDATE:
    I put it there and it works. Thank you for your help!
    Last edited by Gosforth; 12-11-2017 at 03:09 PM.