Hitachi Vantara Pentaho Community Forums

Thread: Regex in Get File Name

  1. #1

    Regex in Get File Name

    Hi folks,
    I am trying to get a list of the names of all the CSV files in a folder so that I can write some regex to pull the specific files I want every day for manipulation. Pretty simple. For testing, I have a transformation with only the Get File Names step in it. It pulls the folder name and details when only the "File/Directory" field is populated, but once I add something to the "Wildcard (RegExp)" field, the transformation sits in "Idle" status and doesn't run.

    I am trying to use .*\.csv as the regex in this field to pull the names of all the CSV files in this folder. As far as I can tell I'm doing this correctly, but when I use the regex, I get "Dispatching started for transformation" in the log, and the whole thing just sits there in "Idle" status. Take the regex out and it works, but it doesn't give me what I want. Put it back in and I'm waiting around for nothing.
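    To rule the pattern itself in or out, it can be checked outside PDI. The Python sketch below assumes the step tests the regex against the bare file name and requires the whole name to match (Java `matches()`-style); `re.fullmatch` stands in for that, and for a simple pattern like this one the Java and Python syntax is the same. The helper name and the sample file names are made up for illustration.

```python
import re

# Pattern from the post; "\." escapes the dot so it matches a literal "."
CSV_PATTERN = re.compile(r".*\.csv")


def matching_names(names, pattern=CSV_PATTERN):
    """Return the names the pattern matches in full (assumed full-match semantics)."""
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical folder contents, for illustration only.
sample = ["daily_report.csv", "daily_report.csv.bak", "summary.xlsx", "EXPORT.CSV"]

print(matching_names(sample))
# ['daily_report.csv']

# "EXPORT.CSV" is skipped because matching is case-sensitive;
# an inline (?i) flag at the start of the pattern makes it case-insensitive.
print(matching_names(sample, re.compile(r"(?i).*\.csv")))
# ['daily_report.csv', 'EXPORT.CSV']
```

    If the pattern picks out the expected names here, the hang is more likely to come from how the folder is being read than from the regex.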

    I've tried restarting, creating a new transformation, passing the file path and regex from a previous step, and changing the regex, but the result is always the same. Is there another way to do what I'm trying to do, or am I doing something wrong?
    ---
    Data Integration version 6.1.0.1-198
    Report Designer version 6.1.0.1-196

  2. #2

    On further investigation, this might have more to do with my connecting to a shared network location than with the regex itself. I do have the right file path to return information on a folder or a specific document, but whenever I use the Get File Names or Get SubFolder names step, it works fine with a local folder, gets very slow with network folders, and doesn't work at all with a network folder plus a regex. Maybe that's an issue on my side with the network itself?

  3. #3
    Join Date
    Apr 2008
    Posts
    4,696

    Quote Originally Posted by katieldouglass
    more to me connecting to a shared network location
    Is it a UNC path (\\server\path) or a mapped drive (X:\)? PDI doesn't 100% support UNC paths, but it usually behaves itself on mapped drives.
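    The question above comes down to which kind of path is in the File/Directory field. As a quick illustration, Python's `pathlib.PureWindowsPath` splits off the drive part of a Windows path: `\\server\share` for a UNC path, or a letter such as `X:` for a drive path. The helper name `path_kind` is hypothetical, and a drive letter on its own cannot tell a mapped network drive from a local disk.

```python
from pathlib import PureWindowsPath


def path_kind(path: str) -> str:
    """Classify a Windows path by its drive part (hypothetical helper)."""
    drive = PureWindowsPath(path).drive
    if drive.startswith("\\\\"):
        return "unc"           # e.g. \\server\share (forward slashes are normalized too)
    if drive.endswith(":"):
        return "drive letter"  # e.g. X: (mapped network drive or local disk)
    return "no drive"          # relative path, or rooted without a drive


for p in [r"\\server\path\file.csv", "//server/path", "X:\\reports", "reports\\daily"]:
    print(p, "->", path_kind(p))
```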

  4. #4

    Yeah, it's a UNC path. It seems to work as long as I don't ask it for anything more complicated than one folder or file at a time.

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    Technically, it's Apache Commons VFS, not PDI. Just saying.
    Since CIFS is out of the picture and UNC support is part of the FILE protocol, we should have no trouble.

    I have no difficulties at all using the regex (.+\.csv) with a UNC path (//server/path) over SMB 2.1 (Windows 7 client, PDI 4.4 and PDI 5.3).
    I'll try with PDI 6.1 later, but I'm confident 6.1 won't let me down either.
    So long, and thanks for all the fish.
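    Posts #1 and #5 use slightly different patterns, .*\.csv and .+\.csv. A quick Python check (again assuming full-match semantics, with `re.fullmatch` standing in for Java's `matches()`) shows they agree on ordinary file names and differ only for a name that is literally ".csv", which is consistent with the pattern not being the cause of the hang.

```python
import re

# Made-up names; only the bare ".csv" separates the two patterns.
names = [".csv", "a.csv", "report.csv"]

star = [n for n in names if re.fullmatch(r".*\.csv", n)]  # zero or more chars before ".csv"
plus = [n for n in names if re.fullmatch(r".+\.csv", n)]  # at least one char before ".csv"

print(star)  # ['.csv', 'a.csv', 'report.csv']
print(plus)  # ['a.csv', 'report.csv']
```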

  6. #6

    Yep, according to what I've found online I should be fine with a UNC path, but here I am :-(. I let one of the processes run overnight just for giggles, and it's been running for 10 hours now. Everything else finished in under 10 seconds except the step referencing the shared drive. No errors, it's just still running. Maybe the problem isn't with PDI itself but with the network or something. Any ideas?
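    One way to separate PDI from the network is to time a plain directory listing of the same share outside PDI. The Python sketch below runs the listing in a daemon thread and gives up after a timeout; the function name `timed_listing` and the 60-second default are illustrative. A plain `os.listdir` is not what Commons VFS does internally, so a fast result here only suggests, rather than proves, that the share itself is responsive.

```python
import os
import sys
import threading
import time


def timed_listing(folder, timeout_s=60.0):
    """List `folder` in a daemon thread (hypothetical diagnostic helper).

    Returns (entry_count, seconds) if the listing finishes within timeout_s,
    or None if it is still running (a likely hang). Listing errors are re-raised.
    """
    result = {}

    def worker():
        try:
            result["entries"] = os.listdir(folder)
        except OSError as exc:
            result["error"] = exc

    start = time.monotonic()
    # Daemon thread: a hung listing will not keep the process alive at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    elapsed = time.monotonic() - start
    if thread.is_alive():
        return None
    if "error" in result:
        raise result["error"]
    return len(result["entries"]), elapsed


if __name__ == "__main__":
    # Pass the share path as the first argument, e.g. a UNC path on Windows.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(timed_listing(target))
```

    With a few thousand entries in a report drop-off folder, comparing this timing against the same listing on a local copy gives a rough idea of how much of the delay is the network.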

  7. #7
    Join Date
    Aug 2011
    Posts
    360

    How many files do you have in your folder?
    How did you write your filepath exactly?

  8. #8

    There are around 3,500 files in the folder; it's a report drop-off location, so I want to pull out specific files. The file path is something like "\\\\companyname.bnm.net\\vendorname\\Reports\\My Learning". I've tried a couple of variations in "Get File Names" and "Get SubFolder names". They work with local folders/files but run forever with network ones. I did create a workaround, a bat file that I call from Pentaho, so thankfully my immediate need is solved :-)
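    The doubled backslashes in the quoted path look like string-literal escaping. If that is only how the path was written in the post, nothing is wrong; if the doubled separators were typed literally into the step's File/Directory field, PDI would be handed a different path. This small Python check uses the values from the post and shows that the escaped form and the single-separator UNC path are the same string once unescaped.

```python
# The path exactly as quoted in the post, read as a string literal.
as_literal = "\\\\companyname.bnm.net\\vendorname\\Reports\\My Learning"

# The UNC path the operating system should actually receive: single separators.
intended = r"\\companyname.bnm.net\vendorname\Reports\My Learning"

print(as_literal == intended)  # True
print(as_literal)              # \\companyname.bnm.net\vendorname\Reports\My Learning
```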

  9. #9
    Join Date
    Aug 2011
    Posts
    360

    Try writing the path with (single) forward slashes instead of backslashes.
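    The suggested rewrite matches the //server/path form used in post #5. In Python, `PureWindowsPath.as_posix()` performs exactly this backslash-to-forward-slash conversion, keeping the leading double slash of a UNC path; the wrapper name below is hypothetical.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Rewrite a Windows or UNC path using forward slashes (hypothetical helper)."""
    return PureWindowsPath(path).as_posix()


print(to_forward_slashes(r"\\companyname.bnm.net\vendorname\Reports\My Learning"))
# //companyname.bnm.net/vendorname/Reports/My Learning
```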

  10. #10

    No luck, but good idea. Thanks!

  11. #11
    Join Date
    Jun 2012
    Posts
    5,534

    I'm still interested in which SMB version you are using...

  12. #12

    I'll try to figure it out, though I wasn't able to work anything out from the link you provided. We're running Pentaho as part of a business unit, so I'll ask someone in IT if they know.
    ---
  13. #13
    Join Date
    Dec 2016
    Posts
    1

    I am having the same issue with the "Get SubFolder names" step. It works great on the local filesystem, but as soon as I try to connect via a mapped drive or a UNC path, the step just runs forever. I have also tried the "Get File Names" step, with the same results.

    Using Windows 10 (laptop) for testing this code.
    Pentaho 7
    Java 1.8.0
    Connecting to a CentOS 7 box, which is where all the files are located.

    Does anyone have any ideas?

  14. #14
    Join Date
    Jun 2012
    Posts
    5,534

    Quote Originally Posted by katieldouglass
    I'll ask someone in IT if they know.
    This might help: Which version of SMB protocol are you using?

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.