Hitachi Vantara Pentaho Community Forums

Thread: TextFileInput and Regular Expressions

  1. #1
    Join Date
    Sep 2005
    Posts
    1,403

    Default TextFileInput and Regular Expressions

    Hi guys,


    maybe somebody could help me out a bit:



    I'd like to read a bunch of text files from a directory, where each text file contains the same data structure but has a slightly different name. Let's say the files have names like these:



    FILE_A_123.txt
    FILE_A_456.txt
    FILE_A_789.txt



    I tested a regex like



    FILE\_A\_\d+\.txt



    but it never works. Even if I put in only * just for testing, it won't work.



    Could someone give me a hint how to set the parameters in this case ...



    Thanks in advance,
    Bastian

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: TextFileInput and Regular Expressions

    Yeah, I've been planning to make this feature easier to use, but the solution is indeed a regular expression:

    Directory: /Some/Directory
    Possible Wildcard expressions:

    FILE_A_.*\.txt
    FILE_[ABC]_[0-9][0-9][0-9]\.txt

    Meaning:
    . Any character
    \. dot
    * previous character repeated zero or more times
    [0-9] any digit from 0 to 9
    [ABC] A, B or C
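
    As a rough illustration of the notation above, here is a small Java sketch that checks a few file names against these expressions using java.util.regex. It assumes the wildcard has to match the whole file name, which is my assumption about how the step applies it; the class and method names are made up for this example and are not part of Pentaho. It also shows that a bare * does not compile as a Java regular expression, while .* does and matches any name.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class for illustration only; not part of Pentaho.
class WildcardRegexDemo {

    // True when the WHOLE file name matches the regex (full match, not a substring search).
    static boolean matchesName(String regex, String fileName) {
        return Pattern.matches(regex, fileName);
    }

    // True when the string compiles as a Java regular expression.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"FILE_A_123.txt", "FILE_B_789.txt", "FILE_A_123_txt"};
        String[] patterns = {
            "FILE_A_.*\\.txt",                     // escaped dot: needs a literal "." before "txt"
            "FILE_A_.*.txt",                       // unescaped dot: any character before "txt"
            "FILE_[ABC]_[0-9][0-9][0-9]\\.txt",    // A, B or C, then exactly three digits
            "FILE_A_\\d+\\.txt"                    // one or more digits
        };
        for (String regex : patterns) {
            for (String name : names) {
                System.out.println(regex + "  vs  " + name + "  ->  " + matchesName(regex, name));
            }
        }
        // A shell-style "*" on its own is not a regular expression; ".*" is the match-all form.
        System.out.println("\"*\" compiles:  " + isValidRegex("*"));
        System.out.println("\".*\" compiles: " + isValidRegex(".*"));
    }
}
```

    Note that the doubled backslashes are only needed inside Java string literals. In a plain text field you would type a single backslash, as in the expressions listed above.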

    All the best.

    Matt

  3. #3
    Join Date
    Dec 2005
    Posts
    1

    Default RE: TextFileInput and Regular Expressions

    Thanks for your really fast reaction!

    Meanwhile I played around with a little Java file myself and found the problem just this minute.
    I must admit that I'm really used to Perl's regex style, and that simply differs a little from Java's regex style.

    So thank you a lot!

    BR
    Bastian

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: TextFileInput and Regular Expressions

    No problem!

    I don't know about Perl, but all other Unix tools basically work like Java regexp: vi, sed, awk, ...
    Maybe there are small differences, but for me so far it has worked great.
    I once loaded several *thousand* text files in one go. (don't ask)

    Cheers,

    Matt

  5. #5
    Join Date
    Feb 2006
    Posts
    3

    Default RE: TextFileInput and Regular Expressions

    What about allowing the Text File Input to take its input from a "variable" value instead of a wildcard?

    Then we could use JavaScript to determine the "file path" and "file name" strings.

    Doing the same with the "Text File Output" would mean we can use the same variable and
    create loops to run through multiple text files.

    If it's possible to do this already can you please tell me how?

    I need to convert 1000 text files to 1000 different text files for a render engine to work.

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: TextFileInput and Regular Expressions

    Please see this reply:
    HOW: multiple text files output to multiple text files?



    Thanks,



    Matt

  7. #7

    Question How can I use variables in the path?

    Quote, originally posted by MattCasters:
    Directory: /Some/Directory
    Possible Wildcard expressions:

    FILE_A_.*\.txt
    FILE_[ABC]_[0-9][0-9][0-9]\.txt

    Meaning:
    . Any character
    \. dot
    * previous character repeated zero or more times
    [0-9] any digit from 0 to 9
    [ABC] A, B or C

    The wildcard expressions work well for files. But when I tried to set variables for a directory, it didn't work! I think the problem is that the \ isn't allowed.

    Attachments:
    text-file_input.jpg shows the input step with the variables
    error.jpg shows the error message that appears when previewing the transformation

    Can somebody tell me how I can use variables in the path?

    Thanks.

  8. #8
    Join Date
    May 2006
    Posts
    4,882

    Default

    The problem is that wildcards are allowed for files, not for directories. For directories this is not going to be easy (read "hard to impossible") to support, because the APIs don't support it: you open a directory and then read the list of files in it; you can't open a "wildcarded" directory.

    You can use variables for directories, but not wildcards. That means you can make a job that gets its directory from the environment, but it will still always read from only one directory, which you can then change between executions.

    What you seem to want to do is use a regular path and then squeeze the last parts of the path in with the filename using wildcards. That's not the way it works.
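
    To make the "open a directory, then read the list of files" point concrete, here is a minimal Java sketch of that pattern using java.nio.file. It is not the actual Kettle code, only an illustration under my own assumptions: the class name, the method name and the "input.dir" system property are all hypothetical. The directory is one fixed path (which could come from a variable), and the regular expression is applied only to the file names found inside it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not the actual Pentaho implementation.
class DirectoryRegexListing {

    // Lists the regular files directly inside ONE directory whose names fully match the regex.
    static List<String> listMatching(Path directory, String fileNameRegex) throws IOException {
        Pattern pattern = Pattern.compile(fileNameRegex);
        try (Stream<Path> entries = Files.list(directory)) {
            return entries
                    .filter(Files::isRegularFile)                     // skip subdirectories
                    .map(p -> p.getFileName().toString())             // match the name only, not the path
                    .filter(name -> pattern.matcher(name).matches())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The directory comes from a variable (here a made-up system property), not from a wildcard.
        String dir = System.getProperty("input.dir", ".");
        for (String name : listMatching(Paths.get(dir), "FILE_A_\\d+\\.txt")) {
            System.out.println(name);
        }
    }
}
```

    Run with something like -Dinput.dir=/Some/Directory to switch the directory between executions. The regex never sees the directory part of the path, which is why path fragments cannot be squeezed into the wildcard.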

    Regards,
    Sven
