Hitachi Vantara Pentaho Community Forums

Thread: multi-file processing

  1. #1

    multi-file processing

    I am still pretty new to Kettle, so please bear with me. To better understand how to use one transformation to retrieve a list of files to load, and a second transformation to read that list as input, I prepared the job shown in the attachment. The job runs and reports "Success", but it does not do what I intended.

    This is pretty simple: the "get_PlacementRequest_files" transformation contains Get File Names and passes its results to Copy rows to result. The "get_files_list" transformation contains Get files from result and Text file output. I see my output file written to my file system (hooray!), but it is empty. The Copy rows to result step in Transform #2 displays 5 entries. Can anyone help me resolve this? Thank you!

    [Attachment: KettleJob_capture.png]

  2. #2
    Join Date
    Apr 2008
    Posts
    4,690

    Originally posted by cnv_Ben:
    Transform contains Get file from result and Text file output. I see my Output file written on my file system (hooray!) but it is empty. The Copy

    Try the "Get rows from result" step rather than "Get files from result".

  3. #3

    multi-file processing

    Changing to Get rows from result did not make any difference. Essentially, what I want to do is retrieve a list of files to use as input to an XML Input Stream step. My understanding is that I need to embed my transformation(s) in a job: one transformation to prepare the list of files, and a second to read that list and send the files one by one to XML Input Stream.

  4. #4
    Join Date
    Apr 2008
    Posts
    4,690

    Well, there are two different ways of looking at this:
    1) All rows of XML can be processed together (regardless of which file they were in)
    2) Each XML file must be processed independently

    The workflow you describe is needed if you are looking at option 2.

    If option 1 applies, you can build a RegEx in the filenames box of the "Get Data from XML" step and process them all together.
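
A wildcard like this can be sanity-checked outside PDI before it is pasted into the step. The sketch below is only an illustration in Python: the file names and the pattern are hypothetical, and it assumes the wildcard has to match the whole file name. PDI evaluates Java regular expressions, which agree with Python's `re` for a simple pattern like this one but are not identical in general.

```python
import re

# Hypothetical file names, for illustration only.
candidates = [
    "PlacementRequest_001.xml",
    "PlacementRequest_002.xml",
    "PlacementRequest_002.xml.bak",
    "readme.txt",
]

# Hypothetical wildcard of the kind one might type into the RegEx field.
wildcard = r"PlacementRequest_.*\.xml"


def matching_files(names, pattern):
    """Return the names that the pattern matches in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


selected = matching_files(candidates, wildcard)
print(selected)
```

Previewing the step inside PDI is still the authoritative check; this only helps catch an unescaped dot or a missing `.*` early.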

    Streaming XML Input and XML Input are deprecated in current versions of PDI...

    For your proof of workflow, you should be able to do the following:

    Job
    - Start
    - Transform 1
    - - Get File Names (do a preview on this step to make sure your RegEx wildcard is correct)
    - - Copy Rows to Result
    - Transform 2 (Optional: Set "Execute for each row")
    - - Get rows from result (Make sure this is fully configured or your files will be empty!)
    - - Text File Output (If using "Execute for each row", set "Append" here for your proof-of-workflow; otherwise each run overwrites the file and you end up with only 1 filename in the output.)
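
The reason "Append" matters when Transform 2 runs once per row can be shown with a small simulation. This is not PDI code, just a minimal Python sketch of the same file-writing behaviour; the row values and file names are hypothetical.

```python
import os
import tempfile


def run_for_each_row(rows, out_path, append):
    """Simulate one 'Transform 2' run per result row, each writing one line."""
    for row in rows:
        mode = "a" if append else "w"  # "w" truncates the file on every run
        with open(out_path, mode, encoding="utf-8") as handle:
            handle.write(row + "\n")


def read_lines(path):
    """Read the output file back as a list of lines."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


# Hypothetical rows standing in for the file names passed between transformations.
rows = ["file_1.xml", "file_2.xml", "file_3.xml"]

with tempfile.TemporaryDirectory() as workdir:
    overwrite_path = os.path.join(workdir, "overwrite.txt")
    append_path = os.path.join(workdir, "append.txt")
    run_for_each_row(rows, overwrite_path, append=False)
    run_for_each_row(rows, append_path, append=True)
    overwritten = read_lines(overwrite_path)
    appended = read_lines(append_path)

print(overwritten)  # only the row from the last run survives
print(appended)     # every row is kept
```

Without "Execute for each row", the second transformation runs once and receives all rows together, so the overwrite question does not come up.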


    *** Take a look at process flow step 1.kjb in your samples directory! It covers almost exactly what you are doing.
    Last edited by gutlez; 04-19-2012 at 04:10 PM.
