Hitachi Vantara Pentaho Community Forums

Thread: Using wildcard to load multiple files in 'Text File Input' step

  1. #1

    I have multiple files with different dates in the filename:

    1. data_load_20070624.csv
    2. data_load_20070625.csv
    3. data_load_20070626.csv

    I want to load all of these files into a database table using the 'Text File Input' step.

    file1 + file2 + file3 ----> dbtable1

    I can 'Add' the files one by one in the 'Text File Input' step by entering the filenames in the 'File/Directory' field. The step accepts the filenames and loads them into the database table.

    The problem is that I will not know in advance how many files I will get as input. That means I need to use a wildcard to add all the files to the 'Text File Input' step without specifying individual filenames.

    data_load_*${date}\.csv$

    I used the above wildcard expression in the 'Text File Input' step to get all the files for loading into the database.

    It does not work, so I am evidently not using the wildcard expression as it is intended to be used.

    I am using Kettle 2.3.0 on Windows.

    I would appreciate it if someone could help.

    Thanks in advance.

  2. #2

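    Try data_load_.*\.csv as the wildcard. The wildcard field takes a regular expression, so .* matches any run of characters and \. matches a literal dot.
    This will read the files that a shell-style pattern would describe as data_load_*.csv.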
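
    To illustrate the difference, here is a minimal Python sketch. Kettle itself is Java, so Python's re module is only a stand-in, and the sketch assumes the wildcard has to match the whole file name. GLOB_STYLE is a simplified, hypothetical version of the expression from the question with ${date} left out, since variable substitution is not modelled here.

```python
import re

# The wildcard suggested above: a regular expression, not a shell glob.
WILDCARD = re.compile(r"data_load_.*\.csv")

# Simplified stand-in for the expression in the question (assumption: ${date}
# is dropped). In a regular expression "_*" means "zero or more underscores",
# not "an underscore followed by anything".
GLOB_STYLE = re.compile(r"data_load_*\.csv$")

candidates = [
    "data_load_20070624.csv",
    "data_load_20070625.csv",
    "data_load_20070626.csv",
    "data_load_notes.txt",
    "other_20070624.csv",
]

# fullmatch() requires the whole name to match (assumed to mirror the step).
matched = [name for name in candidates if WILDCARD.fullmatch(name)]
glob_style_matched = [name for name in candidates if GLOB_STYLE.fullmatch(name)]

print(matched)             # the three data_load_*.csv files
print(glob_style_matched)  # empty: the digits after "_" are never matched
```

    The glob-style expression fails because, after "data_load" and the underscores, it expects a literal dot immediately, while the real file names continue with the date digits.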

    Regards
    Bijugv

  3. #3

    Thanks, it worked when I used the wildcard expression as indicated.

    Here is another issue I ran into. When I try to use 'File Exists' with a wildcard, it does not work. In the filename field I entered "data_load_.*\.csv" to check whether files with matching names exist.

    It works for a single file, e.g. data_load_20070626, but not for multiple files, e.g. data_load_.*\.csv.

    I am on Kettle 2.3.0 on Windows.

    Any suggestions?

    Thanks,

  4. #4

    Wildcards are not built into every piece of functionality. 'File Exists' only checks for a single file and does not support wildcards.

    Regards,
    Sven

  5. #5

    I appreciate your quick reply.

    Since the 'File Exists' step checks for a single file, if I want to check whether a set of files exists, the only option seems to be to extract all the filenames manually and then check whether each one exists.

    Might a JavaScript step do the job for me?

    I read somewhere in the forum that 'File Exists' may support wildcards in future versions. Is that true?

    Poor me, we are still using 2.3.0

    Thanks.

  6. #6

    There are other ways around it. Use the 'Get File Names' step, then sort, group by, and pass the rows to the result. In the parent job, use an evaluation to determine how many files there are. One of the "Weekly Tips" is similar: it checks whether a query will return output before starting the real extract.
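
    The idea behind that workaround (list the file names, count the matches, then decide) can be sketched outside Kettle as follows. This is a rough Python analogue, not Kettle code; count_matching_files and demo are hypothetical helper names, and the sketch assumes the wildcard must match the whole file name.

```python
import os
import re
import tempfile


def count_matching_files(directory, wildcard):
    """Return (count, sorted names) of regular files whose name matches wildcard."""
    pattern = re.compile(wildcard)
    names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and pattern.fullmatch(name)
    )
    return len(names), names


def demo():
    """Create sample files in a temporary directory and run both kinds of check."""
    wildcard = r"data_load_.*\.csv"
    with tempfile.TemporaryDirectory() as directory:
        for name in ("data_load_20070624.csv",
                     "data_load_20070625.csv",
                     "data_load_20070626.csv",
                     "readme.txt"):
            open(os.path.join(directory, name), "w").close()

        # Pattern-based check: list names, filter, count.
        count, names = count_matching_files(directory, wildcard)

        # Single-file check: the pattern is treated as a literal file name,
        # so it finds nothing.
        literal_exists = os.path.exists(os.path.join(directory, wildcard))
        return count, names, literal_exists


count, names, literal_exists = demo()
print(count, names, literal_exists)
if count > 0:
    print("files found, continue with the load")
else:
    print("no files found, skip the load")
```

    The final branch plays the role of the evaluation in the parent job: the load only proceeds when the count is greater than zero.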

    Regards,
    Sven

  7. #7

    Thanks for pointing me in the right direction! I was able to make it work with your help.
