Hitachi Vantara Pentaho Community Forums

Thread: Reading multiple tar.gz files

    I am trying to read the contents of multiple tar.gz files. My first approach was to unzip the files with the "Unzip file" step, but that did not work. After reading through forum posts (http://forums.pentaho.com/showthread...-Gzipped-files) I now have a better understanding of why.

    Based on that, I read through the examples in the Advanced Users FAQ (http://wiki.pentaho.com/display/EAI/...ressedfiles%3F) and was able to read one of the many files inside one of my tar.gz files via the "Text file input" step. Now I would like to know how to read multiple tar.gz files that each contain identically named files. Is this possible?

    For example, I might have the following source tar.gz files to read:

    bla_2014-12-15.tar.gz
    bla_2014-12-14.tar.gz
    foo_2014-12-15.tar.gz
    foo_2014-12-14.tar.gz

    Within each of these tar.gz files there is a bar.txt file. I tried reading the files from the "Text file input" step with wildcards like so...

    tar:gz:/path/to/files/.*.tar.gz!/.*.tar!/bar.txt

    ...but I was not able to get this to work.
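
    To make the intent of that wildcard concrete: it is meant to select every archive in the directory, plus the inner .tar that shares the archive's base name. The sketch below shows that matching in Python. It is illustrative only and does not reproduce how PDI or the underlying VFS layer resolves wildcards; the `inner_tar_names` helper and the `notes.txt` entry are made up for the example.

```python
import re

# Archive names taken from the example list in the post.
archives = [
    "bla_2014-12-15.tar.gz",
    "bla_2014-12-14.tar.gz",
    "foo_2014-12-15.tar.gz",
    "foo_2014-12-14.tar.gz",
    "notes.txt",  # hypothetical non-archive file, added to show filtering
]

# Regex form of the ".*.tar.gz" wildcard, with the literal dots escaped.
ARCHIVE_RE = re.compile(r"(?P<stem>.*)\.tar\.gz")


def inner_tar_names(names):
    """Map each matching archive name to the inner .tar it is expected to hold."""
    result = {}
    for name in names:
        m = ARCHIVE_RE.fullmatch(name)
        if m:
            result[name] = m.group("stem") + ".tar"
    return result


matches = inner_tar_names(archives)
for archive, inner in matches.items():
    print(archive, "->", inner)
```

    Note that the inner name differs for every archive, so a single path would need two wildcards that resolve in step with each other.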

    As mentioned above, I am able to read the bar.txt file if I specify the full path like so (tar:gz:/path/to/files/bla_2014-12-15.tar.gz!/bla_2014-12-15.tar!/bar.txt), but I need to handle the possibility of multiple tar.gz files. So I tried another approach: building the full VFS string using "Get File Names" and a few other steps. I am able to build the full VFS string correctly and pass it into the "Text file input" step with "Accept file names from previous step" turned on. When I run this, though, it fails, saying it can't open the following file:

    tar:gz:file:////path/to/files/bla_2014-12-15.tar.gz!/bla_2014-12-15.tar!/bar.txt

    It looks like the "file://" is causing the issue. Is there a way to tell it to not include "file://"? Does anyone have any ideas of how to get around all of this? Or a better way to handle this situation?
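
    As a rough illustration of the string-building described above, here is a Python sketch (not PDI steps) that drops a leading "file:" scheme and assembles the layered path in the hand-typed form reported as working. The function names are hypothetical, POSIX-style paths as in the example are assumed (Windows drive letters are not handled), and whether the step accepts the result when it arrives from a previous step has not been verified.

```python
from pathlib import PurePosixPath


def strip_file_scheme(uri: str) -> str:
    """Drop a leading 'file:' scheme and collapse the slashes after it to one."""
    if uri.startswith("file:"):
        return "/" + uri[len("file:"):].lstrip("/")
    return uri


def build_vfs_url(archive: str, member: str = "bar.txt") -> str:
    """Build 'tar:gz:<archive>!/<inner tar>!/<member>' for one .tar.gz file.

    Assumes the inner tar is named after the archive minus '.gz',
    as in the example from the post.
    """
    path = strip_file_scheme(archive)
    name = PurePosixPath(path).name  # e.g. bla_2014-12-15.tar.gz
    if not name.endswith(".tar.gz"):
        raise ValueError(f"not a .tar.gz archive: {archive}")
    inner_tar = name[: -len(".gz")]  # e.g. bla_2014-12-15.tar
    return f"tar:gz:{path}!/{inner_tar}!/{member}"


url = build_vfs_url("file:////path/to/files/bla_2014-12-15.tar.gz")
print(url)
```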

    EDIT: I also want to mention that this is being developed on a Windows computer, and the Pentaho process is intended to run on a Windows server. I know I could install gzip packages on both, but I'm trying to avoid that path if possible.
    Last edited by simon; 12-19-2014 at 01:43 PM. Reason: additional info
    -Simon
    Pentaho Version: 5.2.0.0

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.