Hitachi Vantara Pentaho Community Forums

Thread: Uncompressing bzip files

  1. #1
    Join Date: Mar 2014 | Posts: 181

    Uncompressing bzip files

    Hi all,

    I have a compressed zip file which produces a folder when unzipped manually. This folder contains many bzip files, which I also need to decompress.

    Below is the flow:

    Compressed zip file -> folder (once unzipped) -> bzip files (around 18 of them)

    I need to implement a transformation that decompresses all these files before they are loaded into a PostgreSQL database.

    I need some ideas on how to implement this extraction before the data is staged in the database, since the unzip job entry on its own might not be enough.

    Thanks for your support.

    Thanks,

    Ron

  2. #2
    Join Date: Apr 2008 | Posts: 1,771

    I had to do something similar, and I used an "Execute a process" step pointing to a batch file that launches 7-Zip.
    So I would create a transformation with something like:
    - get zip files
    - execute process (easy if file names/locations are always the same)
    - text file input
    - table output
    -- Mick --
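A rough Python sketch of the "call 7-Zip from a process" idea, for illustration only. It stands in for the batch file mentioned above and is not part of the original transformation. The function names are hypothetical, and the executable name `7z` assumes that 7-Zip is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_7zip_command(archive, out_dir, exe="7z"):
    """Return the argument list for extracting `archive` into `out_dir`."""
    # "x" extracts with full paths, "-o<dir>" sets the output directory
    # (no space after -o), and "-y" answers yes to any overwrite prompt.
    return [exe, "x", str(archive), f"-o{out_dir}", "-y"]


def extract_with_7zip(archive, out_dir, exe="7z"):
    """Run 7-Zip; raises CalledProcessError if it exits with an error code."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_7zip_command(archive, out_dir, exe), check=True)
```

Keeping the command builder separate from the call makes it easy to log or inspect the exact command line before anything is executed.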

  3. #3
    Join Date: Jun 2012 | Posts: 5,534

    No need to decompress bzip files since Kettle supports them directly via Kettle VFS.
    So long, and thanks for all the fish.
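As an analogy only (this is plain Python, not Kettle VFS), reading a bzip2 file "directly" means decompressing on the fly while reading, with no intermediate uncompressed file on disk. The function name, the `;` delimiter, and the UTF-8 encoding below are assumptions made for the sketch.

```python
import bz2
import csv


def read_rows(bz2_path, delimiter=";"):
    """Yield rows from a bzip2-compressed delimited text file."""
    # Mode "rt" opens the compressed file as text; the data is
    # decompressed as it is read, so nothing is written to disk.
    with bz2.open(bz2_path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle, delimiter=delimiter)
```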

  4. #4
    Join Date: Mar 2014 | Posts: 181

    Originally posted by marabu:
    "No need to decompress bzip files since Kettle supports them directly via Kettle VFS."

    Hello Marabu,

    I went through the VFS documentation and did as instructed using the Text File Input step, but I still can't get it right: the transformation does not work.

    I have attached the transformation and some screenshots for you to look at.

    https://www.dropbox.com/s/iiua8d6ox0..._data.ktr?dl=0
    https://www.dropbox.com/s/o3qvydhit2...0file.PNG?dl=0
    https://www.dropbox.com/s/bx2jatqh5k..._file.PNG?dl=0

    In the meantime I created a shell script job to do the unzipping and read the files, but if VFS can be used, it will save me some processing time.

    Thanks for your support.

  5. #5
    Join Date: Mar 2014 | Posts: 181

    Mick, this looks like a viable solution for now.

    Thank you,

    Ron

  6. #6
    Join Date: Jun 2012 | Posts: 5,534

    With Kettle VFS, try to use the correct prefix (bz2: not gz:).
    And don't use DOS wildcards where regular expressions are expected; use something like .*\.txt\.bz2 instead.
    Better luck next time
    Last edited by marabu; 11-25-2014 at 01:07 PM.
    So long, and thanks for all the fish.
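To see why a DOS wildcard does not work where a regular expression is expected, here is a small Python sketch. Kettle uses Java regular expressions, but for this simple pattern Python behaves the same way. The file names are made up, and the sketch assumes the whole file name has to match the expression.

```python
import re


def is_valid_regex(pattern):
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def select_by_regex(names, pattern):
    """Keep only the names that match `pattern` from start to end."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Hypothetical file names, for illustration only.
names = ["part01.txt.bz2", "part02.txt.bz2", "part01.txt.gz", "readme.txt"]

dos_wildcard = "*.txt.bz2"   # a leading "*" has nothing to repeat in a regex
regex = r".*\.txt\.bz2"      # ".*" means any characters, "\." a literal dot

wildcard_ok = is_valid_regex(dos_wildcard)
selected = select_by_regex(names, regex)
```

Here `wildcard_ok` is False because the wildcard is not even a valid expression, while `selected` contains only the two `.txt.bz2` names.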

  7. #7
    Join Date: Mar 2014 | Posts: 181

    Thanks for the correction.

    This is what I specified in the "File or directory" field of the Text File Input step:

    bz2:file:///C:/Installs/internal_files/90341216_DigitalData_20141123.zip/90341216_DigitalData_20141123

    When I click "Show filename(s)", I get the error shown in this screenshot:

    https://www.dropbox.com/s/d3hzm9ieox...Error.PNG?dl=0

    Thanks,

    Ron

  8. #8
    Join Date: Jun 2012 | Posts: 5,534

    I've just realised that you are trying to process bzip files that are still inside a zip file.
    I don't think this will ever work, since the outer prefix must indicate a directory if you want to use a wildcard regex.
    An archive like zip or jar is acceptable; bz2 indicates a single compressed file and simply doesn't qualify.
    So why don't you just use the Unzip file job entry to create a directory with your bzipped files first?
    In a subsequent transformation you can then fetch the list of filenames and prepend the bz2: prefix.
    Finally, let Text File Input accept the filename from a field and you are almost home.
    So long, and thanks for all the fish.
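The three steps above (unzip to a directory, list the bzipped files, prepend the bz2: prefix) can be sketched outside Kettle like this. It is a minimal Python illustration, not the job and transformation themselves; the function names are hypothetical, and the resulting strings only mimic the shape of the URIs that Text File Input would receive from a field.

```python
import re
import zipfile
from pathlib import Path


def unzip_and_list_bz2(zip_path, target_dir):
    """Extract the zip archive and return the extracted .bz2 files, sorted."""
    target = Path(target_dir).resolve()
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Same idea as a Get File Names step with the regex .*\.bz2
    return sorted(
        path for path in target.rglob("*")
        if path.is_file() and re.fullmatch(r".*\.bz2", path.name)
    )


def to_bz2_uri(path):
    """Prepend the bz2: prefix to the file: URI of an extracted file."""
    return "bz2:" + Path(path).resolve().as_uri()
```

Each returned path is an ordinary file inside a real directory, which is why the wildcard listing works here while it could not work against the bz2: prefix itself.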

  9. #9
    Join Date: Apr 2008 | Posts: 4,689

    Just a thought....

    What about a Get File Names step pointing at zip:file://path/to/zip with the wildcard .*\.bz2
    Then the file input would be bz2:zip:file://path/to/zip/!bz2/file.txt ?
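Whether Kettle VFS accepts this layered form is not confirmed in the thread. The idea itself (list the .bz2 entries inside the zip, then decompress each entry without extracting it to disk first) looks like this in plain Python. The function name and the UTF-8 encoding are assumptions.

```python
import bz2
import re
import zipfile


def read_bz2_members(zip_source, pattern=r".*\.bz2"):
    """Yield (member name, list of text lines) for each matching zip entry."""
    name_filter = re.compile(pattern)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Member names include the folder part, e.g. "folder/file.txt.bz2"
            if name_filter.fullmatch(name):
                # Read the compressed entry into memory and decompress it
                text = bz2.decompress(archive.read(name)).decode("utf-8")
                yield name, text.splitlines()
```

This holds each decompressed file fully in memory, which is fine for a handful of modest files but would need a streaming approach for very large ones.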

  10. #10
    Join Date: Mar 2014 | Posts: 181

    Originally posted by marabu:
    "I've just realised that you are trying to process bzip files that are still inside a zip file.
    I don't think this will ever work, since the outer prefix must indicate a directory if you want to use a wildcard regex.
    An archive like zip or jar is acceptable; bz2 indicates a single compressed file and simply doesn't qualify.
    So why don't you just use the Unzip file job entry to create a directory with your bzipped files first?
    In a subsequent transformation you can then fetch the list of filenames and prepend the bz2: prefix.
    Finally, let Text File Input accept the filename from a field and you are almost home."

    Hi Marabu,

    I have tried to unzip the file using the Unzip file job entry, but I am getting an error that I don't really understand, even with debug logging enabled to get detailed error messages.

    This is the error, 2014/11/25 14:24:54 - Unzip file 2 - ERROR (version 5.0.1-stable, build 1 from 2013-11-15_16-08-58 by buildguy) : Error

    I have also uploaded my job to Dropbox.

    https://www.dropbox.com/s/a5lk1xkf96...files.kjb?dl=0

    Thanks,

    Ron

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.