Hitachi Vantara Pentaho Community Forums

Thread: Unzip file - Add extracted file to result

  1. #1


    I saw a few people asking about this in the forum, but no answers. I want to unzip a few files selected by a regex and then load those files into a database in the following step. I have a job that does the unzipping, with "Add extracted file to result" checked. The job then calls a transformation to load the extracted files. The first step of the transformation is "Get rows from result" and the second is "Text file input", with "Accept filenames from previous step" checked, but it is not working. Has anyone made this work in the past? How are the extracted file names passed to later steps? I would appreciate any help.
    The Pentaho wiki says "add the extracted file names to the list of result files of this job entry for use in the next job entries", but that doesn't seem sufficient to make this work. Can anyone suggest more detailed documentation or a book covering these features, or am I out of luck? Thanks.

  2. #2

    Data results and file results are two different lists, which are unfortunately named similarly.

    You would need "Get files from result" rather than "Get rows from result". However, this doesn't loop through the files individually either.
    Do the files need to be run one at a time, or can they be processed as a group?

    If they can be processed as a group, you can use VFS and wildcards...
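
    To make the distinction between the two lists concrete, here is a toy model in Python. It is only an illustration and not the Kettle API: the JobResult class and both helper functions are hypothetical names, and the only detail taken from this thread is that each file row carries a "path" field.

```python
from dataclasses import dataclass, field


@dataclass
class JobResult:
    """Toy stand-in for what one job entry hands to the next (not the Kettle API)."""
    rows: list = field(default_factory=list)          # data results
    result_files: list = field(default_factory=list)  # file results


def get_rows_from_result(result):
    """Reads only the data-row list."""
    return list(result.rows)


def get_files_from_result(result):
    """Reads only the file list and emits one row per file."""
    return [{"path": p} for p in result.result_files]


# An unzip entry with "Add extracted file to result" fills the file list only.
result = JobResult(result_files=["/tmp/in/a.csv", "/tmp/in/b.csv"])

print(get_rows_from_result(result))   # nothing was ever put in the row list
print(get_files_from_result(result))  # one row per extracted file
```

    Reading the row list after an unzip therefore yields nothing, which matches the symptom in the first post.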

  3. #3

    EDIT: I finally got this working. Thanks a lot, gutlez, for your help. The first step in the transformation is indeed "Get files from result"; the "path" field it outputs is then used as the filename field in the "Text file input" step. I am adding a screenshot for others' reference and updating the attached transformation with the corrected version.

    Thanks for the reply. The files can be processed as a group, and that's how I am hoping to do it (giving a wildcard/regex in the unzip job entry and then loading all unzipped files in one shot using the results).
    I changed "Get rows from result" to "Get files from result" as you suggested, but it is still not working for me. It says the filename field cannot be null. I am not sure how to pass a filename field, since "Get files from result" doesn't let me specify a field name. I am attaching my sample transformation and job in case they help with troubleshooting. Thanks for the help.
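
    For readers who want to see the same flow outside PDI, here is a minimal Python sketch: extract only the archive members whose names match a regex and keep the list of extracted paths for a later loading step. The function name unzip_matching, the demo file names, and the pattern are all made up for illustration.

```python
import re
import tempfile
import zipfile
from pathlib import Path


def unzip_matching(archive, pattern, target_dir):
    """Extract members whose full name matches `pattern`; return the extracted paths."""
    regex = re.compile(pattern)
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not regex.fullmatch(info.filename):
                continue
            # extract() returns the path of the file it created
            extracted.append(zf.extract(info, target_dir))
    return extracted


def demo():
    """Build a small archive, unzip by pattern, and return the extracted file names."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "batch.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("orders_01.csv", "id\n1\n")
            zf.writestr("orders_02.csv", "id\n2\n")
            zf.writestr("readme.txt", "ignore me\n")
        paths = unzip_matching(archive, r"orders_\d+\.csv", Path(tmp) / "out")
        return sorted(Path(p).name for p in paths)


print(demo())
```

    The returned list plays the role of the result files: the loader only needs each path, which is what the "path" field supplies in the transformation.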
    Last edited by Inder; 08-27-2012 at 06:57 PM.

  4. #4

    Trying again with the screenshot...
    [Attached screenshot: forum_0827.jpg]

  5. #5

    Another possibility is to use the zip:file:// VFS URL scheme. That way you wouldn't need a separate unzip step before the load.
    I'm just not 100% sure of the syntax, or of how to get it to handle multiple zip files. Reading multiple files from one zip is explained in the documentation.
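
    As a rough guide to the syntax, Apache Commons VFS writes zip URLs as zip:<archive URI>!/<path inside the archive>. The Python sketch below only builds such strings. The helper name zip_vfs_url and the example paths are hypothetical, and whether a given PDI version accepts these URLs in "Text file input" is an assumption to verify.

```python
from pathlib import Path


def zip_vfs_url(archive, inner_path=""):
    """Build a Commons VFS style URL: zip:<archive-uri>!/<path inside archive>."""
    archive_uri = Path(archive).resolve().as_uri()  # file:///... form of the archive path
    url = f"zip:{archive_uri}"
    if inner_path:
        url += "!/" + inner_path.lstrip("/")
    return url


# One URL per archive is one way to cover several zip files (paths are made up).
archives = ["/data/in/batch_01.zip", "/data/in/batch_02.zip"]
urls = [zip_vfs_url(a, "orders.csv") for a in archives]

for u in urls:
    print(u)
```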

  6. #6

    Cool, I'll definitely explore that more. I need to count the number of lines in each file for validation, so there is no way around unzipping the files for now (unless there's a way to count records without unzipping as well). But I hope to have a use case soon where I load zipped files directly. Thanks for pointing out the feature.
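
    On counting records without unzipping: outside PDI this is possible by streaming each archive member. Below is a minimal Python sketch using the standard zipfile module. It is not a PDI feature, and the function name and sample data are made up for illustration.

```python
import io
import zipfile


def count_lines_per_entry(archive):
    """Count lines in each member by streaming it, without extracting to disk."""
    counts = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as member:  # decompresses on the fly
                counts[info.filename] = sum(1 for _ in member)
    return counts


# In-memory archive so the sketch needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.csv", "id\n1\n2\n")
    zf.writestr("b.csv", "id\n9\n")
buf.seek(0)

counts = count_lines_per_entry(buf)
print(counts)
```

    The counts include any header line, so subtract one per file if only data records matter.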


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.