TextFileInput and Regular Expressions



kettle_anonymous
12-22-2005, 07:28 AM
Hi guys,


Could somebody help me out a bit?



I'd like to read a bunch of text files from a directory. Each text file contains the same data structure but has a slightly different name. Let's say the files have names like these:



FILE_A_123.txt
FILE_A_456.txt
FILE_A_789.txt



I tested a regex like



FILE\_A\_\d+\.txt



but it never works. Even if I put only * just for testing, it won't work.



Could someone give me a hint on how to set the parameters in this case?



Thanks in advance,
Bastian

MattCasters
12-22-2005, 08:45 AM
Yeah, I've been planning to make this feature easier to use, but the solution is indeed a regular expression:

Directory: /Some/Directory
Possible Wildcard expressions:

FILE_A_.*.txt
FILE_[ABC]_[0-9][0-9][0-9]\.txt

Meaning:
. Any character
\. A literal dot
* The previous character repeated zero or more times
[0-9] Any digit from 0 through 9
[ABC] A, B or C
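To try patterns like these outside Kettle, here is a small sketch using plain java.util.regex (not Kettle's own code). It assumes the file-name filter tests each name against the whole expression, as Matcher.matches() does. That is an assumption about Kettle's internals and is not stated in this thread. The backslashes are doubled only because the patterns are written as Java string literals. In the step's dialog you would type them once, e.g. FILE_[ABC]_[0-9][0-9][0-9]\.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class WildcardDemo {

    // True if the whole file name matches the regex (Matcher.matches() is anchored at both ends).
    static boolean matches(String regex, String fileName) {
        return Pattern.compile(regex).matcher(fileName).matches();
    }

    // True if the string compiles as a java.util.regex pattern.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "FILE_A_123.txt", "FILE_A_456.txt", "FILE_B_12.txt", "FILE_A_.txt", "other.csv");

        // Patterns from this thread as Java string literals (backslashes doubled).
        List<String> patterns = Arrays.asList(
            "FILE_A_.*.txt",                     // unescaped '.' matches any character
            "FILE_[ABC]_[0-9][0-9][0-9]\\.txt",  // exactly three digits, then a literal dot
            "FILE\\_A\\_\\d+\\.txt");            // \d+ = one or more digits

        for (String regex : patterns) {
            System.out.println(regex);
            for (String name : names) {
                System.out.println("  " + name + " -> " + matches(regex, name));
            }
        }

        // A lone '*' has nothing to repeat, so Java rejects it when compiling.
        System.out.println("'*' valid? " + isValidRegex("*"));
    }
}
```

Running main prints true or false for each name/pattern pair. Note that FILE_A_.txt passes the first pattern because * also allows zero repetitions. A lone * (as tried in the first post) does not compile at all, because it has nothing to repeat.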

All the best.

Matt

bger77
12-22-2005, 08:53 AM
Thanks for your really fast reaction!

Meanwhile I played around with a little Java file myself and found the problem just this minute.
I must admit that I'm really used to Perl's regex style, and that simply differs a little from Java's regex style.

So thanks a lot!
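The post doesn't say which difference was the culprit. One frequently encountered difference is anchoring. In Perl, `$name =~ /\d+/` succeeds if the pattern occurs anywhere in the string, which corresponds to Java's Matcher.find(). Matcher.matches() and String.matches() succeed only when the pattern covers the entire string. The sketch below shows this with plain java.util.regex. If a file-name filter uses whole-string matching (an assumption, not confirmed here), a Perl-style partial pattern would simply match nothing.

```java
import java.util.regex.Pattern;

final class AnchoringDemo {

    // Perl-style test: succeeds if the pattern occurs anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Whole-string test: the pattern must consume the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String name = "FILE_A_123.txt";
        // "\\d+" occurs inside the name but does not cover all of it.
        System.out.println("find:    " + containsMatch("\\d+", name));     // prints true
        System.out.println("matches: " + fullMatch("\\d+", name));         // prints false
        // Wrapping the pattern in ".*" makes the whole-string test succeed as well.
        System.out.println("matches: " + fullMatch(".*\\d+.*", name));     // prints true
    }
}
```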

BR
Bastian

MattCasters
12-22-2005, 09:48 AM
No problem!

I don't know about Perl, but all the other Unix tools basically work like Java regexps: vi, sed, awk, ...
Maybe there are small differences, but so far it has worked great for me.
I once loaded several *thousand* text files in one go. (Don't ask.)

Cheers,

Matt

dangerahead
02-04-2006, 06:09 PM
What about allowing Text File Input to take its input from a "variable" value instead of a wildcard?

That way we could use JavaScript to determine the "file path" and "file name" strings.

Doing the same with "Text File Output" would let us use the same variable and create loops that step through multiple text files.

If it's already possible to do this, can you please tell me how?

I need to convert 1000 text files to 1000 different text files for a render engine to work.

MattCasters
02-05-2006, 12:02 AM
Please see this reply:
HOW: multiple text files output to multiple text files? (http://forums.pentaho.org/showthread.php?t=48279)



Thanks,



Matt

mlu
01-30-2007, 04:14 AM

The wildcard expressions work well for files. But if I try to use variables for a directory, it doesn't work! I think the problem is that the \ isn't allowed.

Attachments:
text-file_input.jpg shows the input step with the variables
error.jpg shows the error message that appears when previewing the transformation

Can somebody tell me how I can use variables in the path?

Thanks.

sboden
01-30-2007, 07:30 AM
The problem is that wildcards are allowed for file names, not for directories. Supporting them for directories won't be easy (read "hard to impossible") because the APIs don't support it: you open a directory and then read the list of files in it. You can't open a "wildcarded" directory.

You can use variables for directories, but not wildcards. That means you can make a job that gets the directory to use from the environment, but each run will still read from only one directory, which you can then change between executions.

What you seem to want is to use a regular path and then squeeze the last parts of the path into the file name wildcard. That's not how it works.
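Sven's explanation can be sketched with plain java.io (this is not Kettle's code). One directory is opened and listed as a unit, and the regular expression is applied only to each entry's file name. The environment variable INPUT_DIR is a hypothetical name chosen for illustration. It stands in for whatever variable a job would supply, and the method name listMatching is likewise made up for this sketch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class DirectoryScan {

    // Returns the names of regular files in dir whose *name* fully matches regex.
    // The regex never sees the directory part, so path separators in it cannot match.
    static List<String> listMatching(File dir, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        File[] entries = dir.listFiles();   // null if dir is missing or not a directory
        if (entries == null) {
            return result;
        }
        for (File f : entries) {
            if (f.isFile() && p.matcher(f.getName()).matches()) {
                result.add(f.getName());
            }
        }
        Collections.sort(result);   // listFiles() order is unspecified; sort for stable output
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical variable: the directory can change between runs,
        // but each run still reads exactly one directory.
        String dirName = System.getenv().getOrDefault("INPUT_DIR", ".");
        System.out.println(listMatching(new File(dirName), "FILE_A_\\d+\\.txt"));
    }
}
```

Because the pattern is tested against f.getName() only, a pattern such as sub/FILE_A_\d+\.txt can never match anything. The "sub" part belongs in the directory setting instead.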

Regards,
Sven