Hitachi Vantara Pentaho Community Forums

Thread: Get one or more files based on logic from FTP site:

  1. #1


    Hello everyone and thanks for looking. (1st post here)

    Not sure if I need to write a custom Transformation or Job to pull this off, but here is the problem:

    We need a way to look in a folder on an FTP site, determine the latest subfolder, and then get the latest file within that subfolder. We have some Java code that does this; however, we would prefer to stick to Kettle as much as possible, since we are evaluating it and trying to convince our company that Kettle is the way to go.

    The folder names represent the date in yyyy_mm_dd format, e.g. 2007_10_01, and the filenames within each folder are coded with a similar format, FileType_yyyy_mm_dd_hh_mm_ss_nn, i.e.
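
    A minimal sketch of the selection logic described above, written in Python purely for illustration (it is not Kettle configuration). It assumes the folder and file names have already been listed from the FTP site, that any file extension can be ignored, and that the trailing nn part is a number that can serve as a tiebreaker. The function names and sample names are hypothetical.

```python
from datetime import datetime


def latest_folder(names):
    """Return the name with the most recent yyyy_mm_dd date, or None."""
    dated = []
    for name in names:
        try:
            dated.append((datetime.strptime(name, "%Y_%m_%d"), name))
        except ValueError:
            continue  # skip entries that are not date-named folders
    return max(dated)[1] if dated else None


def file_sort_key(filename):
    """Turn FileType_yyyy_mm_dd_hh_mm_ss_nn[.ext] into a sortable tuple."""
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    parts = stem.split("_")
    if len(parts) < 8:
        return None  # needs a FileType plus seven numeric parts
    try:
        # The last seven parts are year, month, day, hour, minute, second, nn.
        return tuple(int(p) for p in parts[-7:])
    except ValueError:
        return None


def latest_file(names):
    """Return the filename with the greatest timestamp, or None."""
    keyed = [(file_sort_key(n), n) for n in names]
    keyed = [(k, n) for k, n in keyed if k is not None]
    return max(keyed)[1] if keyed else None
```

    Comparing the numeric parts as a tuple avoids depending on zero padding in the names.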

    What is the best way to do this? I am leaning towards writing a custom Transformation that gets one or more of these files (since we have 3 parent folders to look through) and creating a Job to wrap around it.

    Another issue: can the [Get a file with FTP] Job component be parameterized somehow to get the file, or should the suggested custom Transformation plugin be responsible for downloading as well? We are trying to use the Kettle framework as much as possible, so letting Kettle do the heavy lifting will help sell this product to our team.

    Thanks in advance and good day!

    -Marc Pike
    Professional Services Group
    ROME Corporation
    +1 713-965-0505 | office
    +1 713-965-9567 | fax
    +1 713-292-4278 | cell

    Turning Risk into Opportunity

  2. #2


    Since version 2.5, you can use VFS in the "Get File Names" step to access files and directories on an FTP site directly. So it should be easy to determine the latest file with a couple of simple "Get File Names" steps.

  3. #3


    Thanks, I just figured this out 2 mins before this post, man that makes life so much easier. <double-pumping fists>

  4. #4


    Hmmmm, is it possible that some FTP sites do not support VFS? I can get to the files and folders as long as I go at it any other way but I am unable to connect via VFS:

    Examples: /MASTER/GISFEntity_XML/2007_11_01 /MASTER/GISFEntity_XML/2007_11_01 /MASTER/GISFEntity_XML/2007_11_01/

    I am trying to get all the zip files but it seems to keep failing in Kettle and in Windows Explorer, however I can connect to the root using Windows Explorer as long as I provide the user/pass and it seems to work fine.

    Damn, this would be so much easier if I had VFS working...

    Thanks in advance, any help appreciated.
    Last edited by MattCasters; 11-01-2007 at 01:26 PM.

  5. #5


    No, all FTP sites support it.
    I do detect a space smack in the middle of the URLs.

    Also verify the case of the zip file.
    Perhaps the wildcard needs to be
    or something similar
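
    The wildcard suggested in this reply is not preserved in the thread. As an illustration only, the two checks mentioned here (stray whitespace in the URL, and the case of the zip extension) can be sketched in Python. This assumes the wildcard is treated as a regular expression; the pattern and function names are hypothetical and are not the pattern from the original post.

```python
import re

# Hypothetical pattern: any name ending in .zip, regardless of letter case.
ZIP_PATTERN = re.compile(r".*\.zip$", re.IGNORECASE)


def has_whitespace(url):
    """Report whether a URL contains a space or other whitespace."""
    return any(ch.isspace() for ch in url)


def matching_zips(names):
    """Keep only the names that match the case-insensitive zip pattern."""
    return [n for n in names if ZIP_PATTERN.match(n)]
```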


  6. #6


    Thanks Matt, got it working!

  7. #7


    Spoke too soon I guess, how do I parameterize the VFS string?

    In a previous Transformation I loaded the FTP site, user, pass and abstract path of the folder that contains the folders I am looking for into variables.

    I would prefer to not hard-code this, but I am not sure how to parameterize it.

    Any help is appreciated guys/gals.

  8. #8


    There are input fields in the GUI that have a gray-red icon to the right like this <$>.
    In all those fields you can use variables.
    You can define them in any of these ways:
    • put fixed variables in $HOME/.kettle/ with format VARIABLE = value
    • define extra JVM environment variables in the shell/bat scripts using option -DVARIABLE=value
    • define the variables dynamically in the first transformation job entry of a job using the "Set Variables" step. Note that you can't set and re-use variables in the same transformation, because all steps run in parallel.
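
    To make the "VARIABLE = value" format from the first option concrete, here is a rough Python sketch of reading such lines into a dictionary. It is not Kettle's own parser. Treating lines that start with # as comments is an assumption, and the variable names and values in the usage note are hypothetical.

```python
def parse_properties(text):
    """Parse 'VARIABLE = value' lines into a dict of strings."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and # comments are ignored
        key, sep, value = line.partition("=")
        if sep:  # only keep lines that actually contain '='
            props[key.strip()] = value.strip()
    return props
```

    For example, the text "FTP_HOST = ftp.example.com" followed by a line "FTP_USER=marc" would yield two entries keyed FTP_HOST and FTP_USER.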

    All the best,


  9. #9


    Thanks Matt, your answer addresses the question that has been on my mind since I left work yesterday.

    In my main job I have a Load S&P Properties transformation that gets the ftpSite, userName, userPwd, masterFolder, etc. using a Table Input.

    I then use a Copy rows to result step, followed by a Set Variables step.

    In the next Transformation I am trying to make use of these variables.

    Should I build a new variable to contain the entire VFS value or can I do something like the following:


    My guess is that this is not possible, since it does not seem to work. Also, I am not seeing my variables in that Transformation when I hit CTRL+SPACEBAR. Should I?

    Thanks Matt, you have been so helpful!


    Marc Pike

  10. #10


    Another issue: I need to call Get File Names to retrieve folders and files, but it does not seem to support variables. Any ideas on how to get them?

    My goal is to get only files after a certain lastDownloadDate, using the filename structure to determine the date.
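
    The filtering goal above can be sketched as follows, in Python for illustration only. It assumes the filename list is already available, that names follow FileType_yyyy_mm_dd_hh_mm_ss_nn with an optional extension, and that the nn part can be ignored for the date comparison. The names stamp_of and newer_than and the sample filenames are hypothetical.

```python
import re
from datetime import datetime

# Seven underscore-separated numeric parts at the end of the name,
# optionally followed by a file extension.
STAMP = re.compile(
    r"_(\d{4})_(\d{2})_(\d{2})_(\d{2})_(\d{2})_(\d{2})_(\d+)(?:\.\w+)?$"
)


def stamp_of(filename):
    """Return the datetime encoded in the filename, or None if absent."""
    match = STAMP.search(filename)
    if not match:
        return None
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    try:
        return datetime(year, month, day, hour, minute, second)
    except ValueError:
        return None  # digits present but not a valid date/time


def newer_than(filenames, last_download):
    """Return files stamped strictly after last_download, oldest first."""
    stamped = [(stamp_of(f), f) for f in filenames]
    newer = [(t, f) for t, f in stamped if t is not None and t > last_download]
    return [f for t, f in sorted(newer)]
```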

    Thanks again.

  11. #11



    If you don't see your variables because they get defined at runtime, you can always define them using the Edit/Set variables menu. That way you can test if they work.
    You can indeed use as many variables as you like in a field and you can mix it with other content.
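
    As a rough illustration of mixing several variables with fixed text in one field, the Python sketch below expands ${name} references in a template string. It uses the variable names mentioned earlier in the thread (ftpSite, userName, userPwd, masterFolder). The ${...} reference syntax, the ftp://user:password@host/path layout, the sample values, and leaving unknown variables untouched are all assumptions of this sketch, not a statement of how Kettle resolves variables internally.

```python
import re

VAR_REF = re.compile(r"\$\{([^}]+)\}")


def substitute(template, variables):
    """Replace each ${name} with its value; leave unknown names as-is."""
    def replace(match):
        name = match.group(1)
        return variables.get(name, match.group(0))
    return VAR_REF.sub(replace, template)


# Hypothetical values for the variables named in the thread.
variables = {
    "ftpSite": "ftp.example.com",
    "userName": "marc",
    "userPwd": "secret",
    "masterFolder": "/MASTER/GISFEntity_XML",
}
url = substitute("ftp://${userName}:${userPwd}@${ftpSite}${masterFolder}/", variables)
```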




Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.