Hitachi Vantara Pentaho Community Forums

Thread: reading all files in a folder

  1. #1
    Join Date
    Jun 2007
    Posts
    128

    Default reading all files in a folder

    Hi,

    I would like to know whether there is any way in Kettle to read the data in all files in a directory. I would like to specify the name of a directory from which to read files instead of a single text file.
    Any help would be of great use to me.

    Thanks
    Sreelatha

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    Just use regular expressions in the text file input step.

    Or a get filenames step linked to a text file input step (which has accept files from previous step switched on).

    The files need to contain data which is in the same format of course.

    Regards,
    Sven

  3. #3
    Join Date
    Jun 2007
    Posts
    128

    Default

    Hi,

    Thanks for the immediate response. Can you please elaborate on how we can do this? I tried using the Get Filenames step, but I didn't understand how I need to use it.

    Thanks
    Sreelatha

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    Forget get filenames for the moment.

    Use a text file input step. Instead of specifying one text file, enter a wildcard for the filename, e.g. .*txt means process all files ending in txt.

    Regards,
    Sven

  5. #5
    Join Date
    Jun 2007
    Posts
    128

    Default

    Hi,

    Thanks a lot. I got it.
    I would like to know about clustering and partitioning.
    I have seen some examples of clustering.
    As per my understanding, in a clustering schema we specify the master and slave servers,
    so our process will be distributed among the different slaves. Is that so?
    Then what is database partitioning, and how can we use a clustering schema and database partitioning together?

    Thanks
    Sreelatha

  6. #6
    Join Date
    Jun 2007
    Posts
    138

    Default Reading all files

    Hi Sven,

    I am trying the same thing.

    My directory is D:\kedar\discount\
    I want all the text files: *.txt

    I am using "D:\kedar\discount\*.txt"

    I tried giving the same string in the
    File Name tab,
    the Regex tab,

    and combined across both tabs (like path and wildcard).

    But it is not able to find any such file. What should be done?
    Regards,
    kedar.mehta@tcs.com ,
    Tata consultancies Ltd

  7. #7
    Join Date
    Jun 2007
    Posts
    128

    Default

    Hi,

    You should give the directory path in the file/directory field and, in the regex field, give .*txt instead of *.txt. Then add a Get Filenames step to check whether you are getting all the files.


    Thanks
    Sreelatha

  8. #8
    Join Date
    Jun 2007
    Posts
    138

    Default Working...

    Eureka...

    It's working.

    Once again, Sven, you are great. But may I know why it didn't work with "*.txt"?

    One more thing:
    I got all the file names located in the directory.

    Now I want to process each file differently, based on its name.

    So I will use JavaScript.

    Query:

    Please tell me how I can do the following.
    1. I want the number of files.
    2. I want to process each of those files separately, so I can use the Filter tool. (I can store each file name in some string.)
    Last edited by kedar mehta; 09-20-2007 at 02:50 AM. Reason: a
    Regards,
    kedar.mehta@tcs.com ,
    Tata consultancies Ltd

  9. #9
    Join Date
    May 2006
    Posts
    4,882

    Default

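    What you enter as the spec is a regular expression: .*txt means any file ending in txt (. matches any single character, and * means the preceding item repeated 0 or more times). A leading * as in *.txt has nothing before it to repeat, so it is not a valid regular expression.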
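    A minimal sketch of the difference, for illustration only. Kettle uses Java regular expressions; Python's `re` module is used here as a stand-in, and for these simple patterns the two behave the same way. It assumes the whole file name must match the pattern, which is not confirmed anywhere in this thread. The file names and helper functions are made up.

```python
import re


def matches(pattern: str, filename: str) -> bool:
    """Return True if the whole file name matches the regular expression."""
    return re.fullmatch(pattern, filename) is not None


def is_valid_regex(pattern: str) -> bool:
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Hypothetical file names, chosen to show the edge cases.
names = ["sales.txt", "notes.TXT", "report.csv", "oldtxt"]

# .*txt : any characters, followed by the literal letters "txt"
loose = [n for n in names if matches(r".*txt", n)]

# .*\.txt : same, but the character before "txt" must be a literal dot
strict = [n for n in names if matches(r".*\.txt", n)]

# *.txt is a shell-style glob, not a regex: the "*" has nothing to repeat
glob_is_valid_regex = is_valid_regex(r"*.txt")

print(loose)                # ['sales.txt', 'oldtxt']
print(strict)               # ['sales.txt']
print(glob_is_valid_regex)  # False
```

    Note that .*txt also accepts a name like "oldtxt", since the dot before "txt" is not required; .*\.txt is the stricter form.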

    To get the number of files there's a new step in 3.0... not in 2.5.1. If you want to process the files differently based on their names you have to do it differently:
    - First use a "get filenames" step (wildcards work in a similar way)
    - Put a filter behind it
    - Put a text file input step behind that, which receives its filenames from a hop.
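    The three steps above can be sketched outside Kettle as follows. This is only an analogy in Python, not how the steps are implemented: the function names, the temporary folder, the file names and the "a.*" filter rule are all hypothetical, and full-name regex matching is an assumption.

```python
import re
import tempfile
from pathlib import Path


def get_filenames(directory: Path, wildcard: str) -> list:
    """Analogue of a "get filenames" step: files whose name matches the regex."""
    pattern = re.compile(wildcard)
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and pattern.fullmatch(p.name)
    )


def filter_by_name(paths: list, name_regex: str) -> list:
    """Analogue of the filter: keep only paths whose file name matches."""
    pattern = re.compile(name_regex)
    return [p for p in paths if pattern.fullmatch(p.name)]


def read_rows(paths: list) -> list:
    """Analogue of a text file input fed by a hop: read every selected file."""
    rows = []
    for path in paths:
        rows.extend(path.read_text().splitlines())
    return rows


def demo():
    """Build a throwaway folder with made-up files and run the three stages."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a_discount.txt").write_text("10\n20\n")
        (folder / "b_discount.txt").write_text("30\n")
        (folder / "readme.md").write_text("ignore\n")

        found = get_filenames(folder, r".*txt")
        selected = filter_by_name(found, r"a.*")  # hypothetical rule
        return len(found), [p.name for p in selected], read_rows(selected)


file_count, selected_names, rows = demo()
print(file_count, selected_names, rows)
```

    The length of the list returned by the first stage gives the number of files, whatever that number happens to be.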

    Regards,
    Sven

  10. #10
    Join Date
    Jun 2007
    Posts
    138

    Default error loading Spoon 3

    Thanks, Sven.

    I already tried that.

    But I am not able to start Spoon.bat of Kettle-3.0.0-M2 (zip, downloaded from the Pentaho site).

    The error is:

    "Main class not found...Spoon will exit now"

    I am using Windows XP.
    Regards,
    kedar.mehta@tcs.com ,
    Tata consultancies Ltd

  11. #11
    Join Date
    Jul 2007
    Posts
    247

    Default

    You will at least need Java 1.5

    Type java -version at command line to check which java version you are using.


    Regards,
    Ben

  12. #12
    Join Date
    Oct 2007
    Posts
    1

    Default Regular Expression Parser

    Sven,

    Since it seems like you are the guy in the know, exactly what kind of regexps can be parsed by the Get Filenames step? Perl, posix extended, etc.

    I could determine the answer by testing all the variants, but I figure it's smarter to ask the expert.

    Regards,
    Aaron

  13. #13
    Join Date
    May 2006
    Posts
    4,882

    Default

    lol ... the Java regular expression kind ... One URL e.g. is http://www.wellho.net/regex/javare.html ... but just look on the internet for Java regular expressions, you'll find tons. We try to reuse as much functionality as possible.

    Regards,
    Sven

  14. #14
    Join Date
    Jun 2007
    Posts
    138

    Default load all the files in the folder & process each as per their name

    I have a few files in a folder. The number of files is not fixed. I want to load all the files in that folder and process each one separately, according to its name.

    Can I do it with Kettle 2.5? How?

    I can load all the files by giving the input file name as .*txt (regex), and now I want to apply filtering according to their names. But how can I do it when the number of files is not fixed?
    The cases are fixed, though, i.e. the filtering criteria are fixed.

    Thanks in advance.
    Last edited by kedar mehta; 01-07-2008 at 10:17 AM.
    Regards,
    kedar.mehta@tcs.com ,
    Tata consultancies Ltd

  15. #15
    Join Date
    Jun 2007
    Posts
    138

    Default process a file as per its name

    To be more specific:
    I have a single file. I am loading the file with a file input step.
    I want to take some decisions based on the file name.

    The first decision is based on the first character of the file name;
    the other characters then classify other things.

    The flow will look like:

    file input - get file name -->
    (which outputs the exact file name in a variable)
    --> filter
    (which takes the file name as input)
    and depending on that, the else part of the filter will be executed.

    Please guide me on how I can use the filter here.
    = ,
    isnull,
    Regex ,
    etc.

    are there. I should use Regex, but I don't know how, because it does not allow me to give the pattern.
    Regards,
    kedar.mehta@tcs.com ,
    Tata consultancies Ltd

  16. #16
    Join Date
    May 2006
    Posts
    4,882

    Default

    Switch on the "include filename" option in the text file input step and the filename becomes a regular field, with which you can do whatever you want.
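    As a rough illustration of the idea (plain Python, not Kettle): once the file name travels with every row as an ordinary field, a decision on its first character works the same way for one file or for any number of files. The field names, file names and the D/S routing rule below are hypothetical.

```python
from pathlib import PurePath


def add_filename_field(filename: str, lines: list) -> list:
    """Attach the source file name to every row, like an 'include filename' option."""
    return [{"filename": filename, "line": line} for line in lines]


def route(row: dict) -> str:
    """Pick a branch from the first character of the file name (made-up rule)."""
    first_char = PurePath(row["filename"]).name[0].upper()
    branches = {"D": "discount", "S": "sales"}
    return branches.get(first_char, "other")


# Rows from three hypothetical files; the number of files does not matter.
rows = (
    add_filename_field("data/d_2007.txt", ["10", "20"])
    + add_filename_field("data/s_2007.txt", ["5"])
    + add_filename_field("data/x.txt", ["1"])
)

routed = [route(row) for row in rows]
print(routed)  # ['discount', 'discount', 'sales', 'other']
```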

    Regards,
    Sven

  17. #17
    Join Date
    Jun 2007
    Posts
    138

    Default Multiple files

    Thanks, Sven.

    I got it working for a single file.

    But is it possible for multiple files, especially when the number of input files is not fixed?

    Same story: for each file in the folder being loaded, I want to process it according to its name.
    Regards,
    kedar.mehta@tcs.com ,
    Tata consultancies Ltd

  18. #18
    Join Date
    May 2006
    Posts
    4,882

    Default

    3.0M2 is ancient... use 3.0.1 for now.

    You have to unzip it right (keeping folder structure) and you need at least JDK 5 (preferably the Sun one).

    Regards,
    Sven

  19. #19
    Join Date
    Nov 2006
    Posts
    4

    Default

    Originally posted by sboden:
    What you enter as spec is a regular expression .*txt ... means any file ending on txt (. is a wildcard character, * means 0 or more times).

    ...
    Hi Sboden,
    I've been working with regex in Perl under *nix platforms for a long time. Do you know why the "*\.txt" regex doesn't work?

    I'm on an NT box, maybe that's the reason :-/

    Anyway, I tried the .*txt and it worked so thanks anyway!

    Guillermo.

  20. #20
    Join Date
    May 2006
    Posts
    4,882

    Default

    In Unix it would normally be .*\.txt ... but it also depends on which application, which OS, ... Something like http://www.tcl.tk/man/tcl8.5/tutorial/Tcl20.html e.g.

    *\.txt works e.g. in 'ls' on a *nix box, but that's shell expansion, not regular expressions.
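    To make the shell-expansion versus regular-expression distinction concrete, here is a small sketch in Python (used only as a stand-in; Kettle itself uses Java regular expressions). `fnmatch` does shell-style matching and `re` does regex matching. The file names are made up.

```python
import fnmatch
import re

# Hypothetical file names.
names = ["a.txt", "b.txt", "atxt", "c.csv"]

# Shell-style glob: "*" alone means "any run of characters".
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regular expression: "." is any character and "*" repeats it,
# so the glob *.txt corresponds to the regex .*\.txt
regex_hits = [n for n in names if re.fullmatch(r".*\.txt", n)]

# fnmatch.translate converts a glob into an equivalent regex string.
translated = fnmatch.translate("*.txt")
translated_hits = [n for n in names if re.match(translated, n)]

print(glob_hits)        # ['a.txt', 'b.txt']
print(regex_hits)       # ['a.txt', 'b.txt']
print(translated_hits)  # ['a.txt', 'b.txt']
```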

    Regards,
    Sven
    Last edited by sboden; 03-11-2008 at 03:14 PM.

  21. #21
    Join Date
    Apr 2008
    Posts
    9

    Default Zipped files

    Hi All

    I am retrieving a list of zipped files from a directory using the Get File Names component. I then want to pass the file names to the Unzip at job level. I have managed to get the file names back into the job, but I cannot see a way of getting the unzip to take a list, parse a list of files, or even take a directory. Is this possible?

    Thanks
    Kimy

  22. #22
    Join Date
    May 2006
    Posts
    4,882

    Default

    No........
