Hitachi Vantara Pentaho Community Forums

Thread: Using regular expression to get proper files

  1. #1

    Hello:

    I am trying to grab a series of text files in a directory for parsing. The file names are in this format:

    .20111022_175756_623.log

    Here 175756_623 means 17:57:56.623, the time of the first entry in the log. A new log is created when the old one reaches a certain size. Not surprisingly, the modified date of the file is about the same as the time of the last record in the log file.

    In the Text File Input I specify these parameters:

    File/Directory: \\servername\sharename\logdir
    Wildcard(reg exp): ..*.log
    Exclude wildcard: .*otherlogtext.*.log (there are sometimes these other log files)

    My problem: I want to cut down on the number of log files I'm reading at a time (currently 5 or 6 days' worth), because that creates too much data for me to deal with in the transformation process. How do I read in only, say, the last day or two of logs? I could create a variable from data I have in my database, but I would then need some sort of "greater than" comparison on the filename, and I'm not sure how to do that with regular expressions. Alternatively, is there some way for the Text File Input step to get only files with a modified date in a certain datetime range?

    I don't have to be exact; if I get a bit too much data, that is dealt with later in the job.

    Thanks!
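
A regular expression has no direct "greater than", but for a window of a day or two the comparison can be approximated by listing the allowed date stamps as alternatives. The following is only a sketch of that idea in Python; the function name `recent_log_pattern` and the sample file names are made up for illustration, and whether the resulting string can be passed into the Wildcard field through a variable is an assumption that should be checked against the PDI version in use.

```python
import re
from datetime import date, timedelta


def recent_log_pattern(today: date, days: int) -> str:
    """Build a regex that matches log names stamped with one of the last `days` dates."""
    stamps = [(today - timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]
    # Any prefix, then one of the allowed dates, then _HHMMSS_mmm.log
    return r".*(" + "|".join(stamps) + r")_\d{6}_\d{3}\.log"


# Hypothetical run date and file names, following the format described in the post.
pattern = recent_log_pattern(date(2011, 10, 22), 2)
names = [
    ".20111022_175756_623.log",
    ".20111021_090000_001.log",
    ".20111019_120000_500.log",
]
recent = [n for n in names if re.fullmatch(pattern, n)]
```

Because the stamp in the name is the time of the first entry, a file stamped on an earlier day may still hold recent records, so it is probably safer to ask for one day more than strictly needed.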

  2. #2

    The Get File Names step returns not only the filenames but also a lot of file metadata, so you can filter out unwanted files based on modified datetime, file size, etc. You can then feed the remaining filenames into the Text File Input step, which has a checkbox to accept filenames from a previous step.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)
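
The filtering logic described in the reply (list the files, keep only those modified after a cutoff, hand the survivors to the reader) can be sketched outside PDI as follows. This is plain Python, not PDI code; the helper `files_modified_since`, the two-day cutoff, and the demo file names are all assumptions chosen for illustration. In a transformation the same comparison would be done on the modification-time field that Get File Names outputs.

```python
import os
import tempfile
import time
from pathlib import Path


def files_modified_since(directory, cutoff_epoch, suffix=".log"):
    """Return sorted names of files in `directory` modified at or after `cutoff_epoch`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file()
        and p.name.endswith(suffix)
        and p.stat().st_mtime >= cutoff_epoch
    )


# Demo on a throwaway directory: one fresh file and one five-day-old file.
now = time.time()
with tempfile.TemporaryDirectory() as tmp:
    for name, age_days in [("new.log", 0), ("old.log", 5)]:
        path = Path(tmp) / name
        path.write_text("sample\n")
        stamp = now - age_days * 86400
        os.utime(path, (stamp, stamp))  # set access and modification times
    # Keep only files modified within the last two days.
    kept = files_modified_since(tmp, now - 2 * 86400)
```

Filtering on the modified time rather than the name has the advantage that it tracks the last record in each log, which is what the original question cares about.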
