Question about Execute a Process
I have built a KJB which includes 1 Transform to get a series of files and a 2nd Transform to parse / process / load these files into DB. I added an "Execute a shell script" Step to my Job to run a .bat script to move the already-processed files to an archive directory. However, at Job level, this only runs once and therefore only moves one file. I need a way to archive processed files from within the Transform itself rather than "outside" in the job.
I have been trying to understand the "Execute a process" Step but am lost. Do I use one of the output fields from my Table output Step as the "Process field"? How do I tell Execute a process to run an external batch file? Thank you in advance for any help or guidance!
---> running Kettle CE 4x on Win7
I would alter your workflow slightly:
- Transform 1 (Get Files)
- Job 2 (Process File) - Execute for each row
- - Move Files (Source -> Processing)
- - Transform 2 (Parse / Process / Load File)
- - Move Files (Processing -> Archive)
You could use Move Files directly after your Transform 2; however, it won't be reached unless Transform 2 runs successfully. The nested Job, by contrast, will show you exactly which files were processed correctly and which ones had problems, which makes long-term maintenance easier.
gutlez, you are saying create Job that includes existing Transform 1, then embed secondary Job that runs Transform 2. Have I got that right?
Also, my Source files are already "dropped" into the "Processing" directory by an external process. I configured the "Get File" Transform to include "File exists" checking. My intent is to schedule this to run hourly: look for files, stop if none are found, otherwise return the filenames. So, if I follow your recommendation, I should not need the "Move files (Source --> Processing)" Step, correct?
Correct on all counts!
If you are not worried about partial runs, then you could just use the move files step after your Transform 2, but it isn't as clear, and could be more trouble if you ever get more than one file.
- Transform 1 (existing)
- Transform 2 (existing)
- Move Files (Dir: Processing, Wildcard: .*\..* to Dir: Archived)
The issue I see with this setup is that Move Files should only happen when Transform 2 is successful - so if you have two files in the directory, and only one of them can be processed correctly, the good one will not be moved to Archived.
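To see what the wildcard in that Move Files entry would pick up, here is a rough Python sketch. It assumes the wildcard is a regular expression tested against the whole file name; the file names are made up for illustration.

```python
import re

# The wildcard from the Move Files entry above, treated as a regex.
# Assumption: the pattern must match the entire file name.
WILDCARD = re.compile(r".*\..*")


def matches(name):
    """Return True if the whole file name matches the wildcard."""
    return WILDCARD.fullmatch(name) is not None


# Hypothetical file names, for illustration only.
names = ["orders_01.csv", "orders_02.txt", "README", ".hidden"]
selected = [n for n in names if matches(n)]
```

Under that assumption, any name containing a dot is selected, so the entry moves everything in Processing regardless of which file Transform 2 actually handled.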
If you think of the overall workflow that you want to accomplish:
- Look in Processing folder for files
- For each file found:
- - Process File
- - Move File to Archived Directory
This workflow is more accurately modeled with a nested job.
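Outside of Kettle, the per-file logic that the nested job models could be sketched roughly like this in Python. This is only an illustration of the control flow, not how Kettle implements it: `process_and_archive` and the `process_file` callback (standing in for Transform 2) are hypothetical names.

```python
import shutil
from pathlib import Path


def process_and_archive(processing_dir, archive_dir, process_file):
    """Process each file; move it to the archive only if processing succeeded.

    process_file is a hypothetical callback standing in for Transform 2.
    It should raise an exception when a file cannot be processed.
    """
    processing_dir = Path(processing_dir)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    archived, failed = [], []
    for path in sorted(processing_dir.iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        try:
            process_file(path)
        except Exception:
            # Leave the failed file where it is so it can be inspected later.
            failed.append(path.name)
            continue
        shutil.move(str(path), str(archive_dir / path.name))
        archived.append(path.name)
    return archived, failed
```

Because the move happens inside the loop, one bad file does not stop the good ones from being archived, which is the behaviour the nested job gives you.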
Last edited by gutlez; 04-30-2012 at 03:36 PM.
OK, I have this working.... sort of. I don't seem to have the Move files Step configured correctly. My Job runs and correctly loads 2 (sample) data files into my DB, but Move files does not actually move anything.
Any advice on how I need to configure this Step?
Which way did you configure it?
As a Nested job, or as a hanging step?