Hitachi Vantara Pentaho Community Forums

Thread: Need to identify filenames in a file with regexp and move them to HDFS

  1. #1

    Hi,

    I am relatively new to Pentaho. Here is what I am supposed to do.

    I have a file, "Filenames.txt", that contains the names of all the files in a directory. I also have a config file with two fields, "REGEX" and "DIRECTORY". I need to check each file name in "Filenames.txt" against the "REGEX" values: if a name matches, I need to move that file to the corresponding "DIRECTORY" in the HDFS location. If no match is found, I need to move the file to the Error location, which is a separate directory.
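The routing rule described above can be sketched in plain Python (this is not Pentaho code). It is a minimal sketch under stated assumptions: each regex must match the whole file name, the first matching rule wins, and the fallback directory name `error` and the function name `route_files` are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical fallback directory; the post only calls it "the Error location".
ERROR_DIR = "error"


def route_files(
    file_names: Iterable[str],
    rules: List[Tuple[str, str]],
    error_dir: str = ERROR_DIR,
) -> Dict[str, str]:
    """Map each file name to the DIRECTORY of the first REGEX that matches it,
    or to error_dir when no rule matches."""
    compiled = [(re.compile(pattern), directory) for pattern, directory in rules]
    routes: Dict[str, str] = {}
    for name in file_names:
        target = error_dir
        for pattern, directory in compiled:
            # Assumption: the regex must match the whole name (fullmatch),
            # not just a substring of it.
            if pattern.fullmatch(name):
                target = directory
                break  # assumption: the first matching rule wins
        routes[name] = target
    return routes


if __name__ == "__main__":
    rules = [(r"sales_.*\.csv", "/data/sales"), (r"log_\d+\.txt", "/data/logs")]
    names = ["sales_2019.csv", "log_42.txt", "notes.doc"]
    for name, target in route_files(names, rules).items():
        print(f"{name} -> {target}")
```

With the sample rules, `sales_2019.csv` goes to `/data/sales`, `log_42.txt` to `/data/logs`, and `notes.doc`, which matches nothing, to `error`.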

    Here is what I have done so far.

    Transformation 1 (name: FilenamesfromDirectory)

    This transformation writes the names of the files in the source directory into a file. I used the "Get File Names" step with the regular expression ".*\.*" and sent its output to a "TextFileOutput" step, naming the output file Filenames.txt. Now Filenames.txt contains the names of all the files in the directory.
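What Transformation 1 does can be approximated in plain Python, which also makes it easy to check what a pattern selects. This is a sketch, not the Pentaho step itself: it assumes whole-name matching and lists only regular files, and `write_file_names` is a hypothetical name. Note that `.*\.*` selects every name, because `.*` already consumes the whole name and `\.*` also allows zero dots.

```python
import os
import re
import tempfile
from typing import List


def write_file_names(source_dir: str, pattern: str, out_path: str) -> List[str]:
    """Write the names of the regular files in source_dir whose names fully match
    pattern to out_path, one per line, and return them sorted."""
    regex = re.compile(pattern)
    with os.scandir(source_dir) as entries:
        names = sorted(
            entry.name
            for entry in entries
            if entry.is_file() and regex.fullmatch(entry.name)
        )
    # The listing is finished before out_path is created, so a new output
    # file inside source_dir is not listed by this call.
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        for name in ["README", "a.csv", "b.tar.gz"]:
            open(os.path.join(src, name), "w").close()
        # ".*\.*" matches every name, including "README", which has no dot.
        print(write_file_names(src, r".*\.*", os.path.join(src, "Filenames.txt")))
```

Running it prints `['README', 'a.csv', 'b.tar.gz']`; swapping in a narrower pattern such as `.*\.csv` is a quick way to see which names a regex really selects.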

    Transformation 2 (name: Filenamesusingregex)

    I used configfile.txt as the source for a "CSV file Input" step, read the regex and target_directory fields from it, and used them to populate two variables:
    REGEX and TARGET_VARIABLES
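A variable holds a single value at a time, so if the config file has several REGEX/DIRECTORY rows, it may be simpler to keep them together as an ordered list of pairs. The plain-Python sketch below does that; it assumes the config file is comma-delimited with a `REGEX,DIRECTORY` header row, and `parse_rules`/`load_rules` are hypothetical names. A pattern that itself contains the delimiter (for example `\d{2,4}`) has to be quoted in the file.

```python
import csv
from typing import Iterable, List, Tuple


def parse_rules(lines: Iterable[str], delimiter: str = ",") -> List[Tuple[str, str]]:
    """Turn config lines with a REGEX,DIRECTORY header into ordered
    (regex, directory) pairs, one per data row."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    return [(row["REGEX"], row["DIRECTORY"]) for row in reader]


def load_rules(config_path: str, delimiter: str = ",") -> List[Tuple[str, str]]:
    """Read the rules from a config file such as configfile.txt."""
    with open(config_path, newline="", encoding="utf-8") as f:
        return parse_rules(f, delimiter)


if __name__ == "__main__":
    sample = [
        "REGEX,DIRECTORY\n",
        "sales_.*\\.csv,/data/sales\n",
        # This pattern contains the delimiter, so it is quoted.
        '"log_\\d{2,4}\\.txt",/data/logs\n',
    ]
    for regex, directory in parse_rules(sample):
        print(regex, "->", directory)
```

Keeping every row, rather than one value per variable, lets each file name be checked against all of the patterns.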

    I then created a job (JOB1) that runs Transformation 1 and passes its output to "ADDROWSRESULTS", and created another job that runs JOB1 followed by a "write to file" step.

    My intention is to get the file names that match the regexp, together with the directory each one has to be moved to.
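One way to produce "file name plus target directory" rows is to pair every file name with every config row and keep only the pairs whose regex matches (a cartesian product followed by a filter). The sketch below does this in plain Python, not with Pentaho steps; it assumes whole-name matching, and `matched_pairs`, `write_pairs` and the `FILENAME,DIRECTORY` header are hypothetical names.

```python
import csv
import itertools
import re
from typing import List, Sequence, Tuple


def matched_pairs(
    file_names: Sequence[str], rules: Sequence[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Pair every file name with every (regex, directory) rule and keep the
    (file_name, directory) pairs whose regex fully matches the name."""
    compiled = [(re.compile(pattern), directory) for pattern, directory in rules]
    return [
        (name, directory)
        for name, (pattern, directory) in itertools.product(file_names, compiled)
        if pattern.fullmatch(name)
    ]


def write_pairs(pairs: Sequence[Tuple[str, str]], out_path: str) -> None:
    """Write the pairs as CSV under a header row (hypothetical column names)."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["FILENAME", "DIRECTORY"])
        writer.writerows(pairs)


if __name__ == "__main__":
    names = ["sales_2019.csv", "log_42.txt", "notes.doc"]
    rules = [(r"sales_.*\.csv", "/data/sales"), (r".*\.txt", "/data/text")]
    for pair in matched_pairs(names, rules):
        print(pair)
```

A name that matches several patterns appears once per match, and a name that matches none is absent from the output; those absent names are the ones that would go to the Error location.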

    For now I am doing this on my local machine, not on HDFS yet.
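For the local-only stage, moving each file once its target directory is known could look like the plain-Python sketch below. It is local file moves only; copying into HDFS would need a different mechanism, such as the "Hadoop Copy Files" job entry, which is not shown. The `routes` mapping (file name to target directory) and the name `move_files` are hypothetical.

```python
import os
import shutil
import tempfile
from typing import Dict


def move_files(source_dir: str, routes: Dict[str, str]) -> None:
    """Move each file named in routes from source_dir into its target directory,
    creating the target directory if it does not exist yet."""
    for name, target_dir in routes.items():
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(os.path.join(source_dir, name), os.path.join(target_dir, name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "incoming")
        os.makedirs(src)
        for name in ["sales_2019.csv", "notes.doc"]:
            open(os.path.join(src, name), "w").close()
        # Hypothetical routing: the matched file to its directory, the rest to "error".
        move_files(src, {
            "sales_2019.csv": os.path.join(root, "sales"),
            "notes.doc": os.path.join(root, "error"),
        })
        print(os.listdir(os.path.join(root, "sales")), os.listdir(os.path.join(root, "error")))
```

Creating missing target directories up front keeps a first run from failing just because a DIRECTORY from the config file does not exist yet.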

    I am sure there must be something I am missing.


    Thanks.

  2. #2
    Join Date: Jun 2012, Posts: 5,534

    You didn't describe where exactly you failed, did you?
    BTW: Have you found the "Hadoop Copy Files" job entry yet?
    So long, and thanks for all the fish.

  3. #3

    Hi,

    Thanks for the response, and sorry about that. I am getting the two expected lines in the output file, but they are blank. I should be getting each file name and its corresponding directory in the output file. Yes, I did find the "Hadoop Copy Files" job entry, but as I mentioned, I wanted to get this working locally first.

  4. #4
    Join Date: Jun 2012, Posts: 5,534

    Hint: If you attach what you did, we can try to correct it.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.