Hitachi Vantara Pentaho Community Forums
Thread: List or array data structure

  1. #1

    List or array data structure

    Hi,

    I have a job that processes XML files and loads them into stage and fact tables. In the first transformation of the job I generate a list of the files that should be processed. At the end of the job I would like to move those files to another location, so I am looking for some kind of global data structure that can be populated in the first transformation and referenced in the last. The file names are currently written to the Pentaho resultSet.

    Thank you.

  2. #2

    There is a job step for copying/moving files:
    http://wiki.pentaho.com/display/EAI/Copy+Files
    -- Mick --

  3. #3

    If I use this component on the source folder, I may end up moving files that arrive while the job is executing. That is why I was thinking of first making a list of the files that are currently being processed and moving them to another directory only after a successful execution.

    Maybe a better way would be to first move the files to a temporary directory and, at the end, move everything in that directory to the final one.
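To make the staging-directory idea concrete, here is a minimal sketch in Python rather than in Pentaho job entries. The function names, the directory arguments and the `*.xml` pattern are assumptions made for illustration only; they are not part of the job described in this thread.

```python
import shutil
from pathlib import Path


def stage_files(incoming, staging, pattern="*.xml"):
    """Move the files that exist right now from incoming to staging."""
    staging.mkdir(parents=True, exist_ok=True)
    staged = []
    # sorted() materialises the listing once, so files that arrive after
    # this point stay in the incoming directory for the next run.
    for src in sorted(incoming.glob(pattern)):
        if src.is_file():
            dest = staging / src.name
            shutil.move(str(src), str(dest))
            staged.append(dest)
    return staged


def archive_staged(staging, archive):
    """After a successful load, move everything in staging to archive."""
    archive.mkdir(parents=True, exist_ok=True)
    moved = 0
    for src in sorted(staging.iterdir()):
        if src.is_file():
            shutil.move(str(src), str(archive / src.name))
            moved += 1
    return moved
```

In this sketch the load would read from the staging directory, so the final move can safely take the whole directory: nothing new can land there while the job runs.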

    For now I have solved the problem by writing the file names to a temp file, which I read at the end of the job to move the files.
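The temp-file workaround can be sketched roughly as follows. This is Python used purely as an illustration of the logic, not the actual Pentaho steps; `write_manifest`, `move_listed_files` and the one-path-per-line manifest format are hypothetical choices.

```python
import shutil
from pathlib import Path


def write_manifest(incoming, manifest, pattern="*.xml"):
    """Record the files present at the start of the run, one path per line."""
    files = sorted(p.resolve() for p in incoming.glob(pattern) if p.is_file())
    manifest.write_text("".join(f"{p}\n" for p in files), encoding="utf-8")
    return files


def move_listed_files(manifest, archive):
    """Move only the files named in the manifest, then delete the manifest."""
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        src = Path(line)
        if src.exists():  # skip anything that disappeared in the meantime
            dest = archive / src.name
            shutil.move(str(src), str(dest))
            moved.append(dest)
    manifest.unlink()  # avoid reusing a stale list on the next run
    return moved
```

Because the move step reads only the manifest, files that arrive in the source folder during the run are left untouched.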

  4. #4

    Or, another option would be to pass one file at a time through your job and transformations and move it at the end.
    -- Mick --

  5. #5

    Yes, that would be a good idea, but in my case there is a mix of iterative loading (each file has to be loaded into stage individually) and batch loading (load fact from stage), so it is quite tricky to do properly.

    My first thought was to write the whole file list into a variable and parse it in the last step, but I encountered a strange error when running the job multiple times: getVariable("list","") returned the value from the previous job run despite the call to setVariable("list","r"). In the end I solved the issue by writing the file list to a temp file and handling the move operation at the end of the job.
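For reference, the variable-based idea amounts to packing the list into one string and unpacking it later. The sketch below is plain Python, not Kettle JavaScript: the `variables` dict is only a hypothetical stand-in for the job's variable space, and the JSON encoding is an assumption rather than what was actually tried. It also resets the variable at the start of every run, which is one way to guard against a value left over from a previous run.

```python
import json


def encode_file_list(paths):
    """Pack a list of file names into a single string value."""
    return json.dumps([str(p) for p in paths])


def decode_file_list(value):
    """Unpack the string back into a list; empty or missing means no files."""
    return json.loads(value) if value else []


def start_run(variables):
    """Overwrite any value left behind by an earlier run."""
    variables["list"] = encode_file_list([])


# Hypothetical stand-in for the job-level variable space.
variables = {"list": encode_file_list(["old_run.xml"])}
start_run(variables)
variables["list"] = encode_file_list(["a.xml", "b,with comma.xml"])
files_to_move = decode_file_list(variables["list"])
```

JSON is used instead of a plain delimiter so that file names containing commas or semicolons survive the round trip.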
