Hitachi Vantara Pentaho Community Forums

Thread: Uncompressing bzip files

  1. #1
    Join Date
    Mar 2014
    Posts
    181

    Uncompressing bzip files

    Hi all,

    I have a zip file which, when unzipped manually, produces a folder containing many bzip files that I also need to decompress.

    Below is the flow:

    Zip file -> folder (after unzipping) -> many bzip files (around 18)

    I need to implement a transformation that decompresses all these files before they are loaded into a PostgreSQL database.

    I need some ideas on how to do the extraction before the data is staged in the database, since the unzip job on its own might not be sufficient.

    Thanks for your support.

    Thanks,

    Ron

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    I had to do something similar and used an "Execute a process" step pointing to a batch file which launches 7-Zip.
    So I would create a transformation with something like:
    - get zip files
    - execute process (easy if file names/locations are always the same)
    - text file input
    - table output
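
    As a rough illustration of the unpack-then-read flow listed above, here is a minimal Python sketch that does the extraction with the standard library instead of a batch file calling 7-Zip. This is a swapped-in technique, not what was actually run in this thread; the function name extract_bzip_members and the "unpacked" subfolder are hypothetical.

```python
import bz2
import shutil
import zipfile
from pathlib import Path
from typing import List


def extract_bzip_members(zip_path: Path, work_dir: Path) -> List[Path]:
    """Unpack the outer zip, then decompress every .bz2 file found inside it."""
    unpacked = work_dir / "unpacked"  # hypothetical staging folder
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(unpacked)

    outputs = []
    # rglob also finds files nested in the folder the zip creates
    for compressed in sorted(unpacked.rglob("*.bz2")):
        target = compressed.with_suffix("")  # data.txt.bz2 -> data.txt
        with bz2.open(compressed, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files are fine
        outputs.append(target)
    return outputs
```

    The returned paths are plain text files that a Text File Input step could then read; the script itself could be launched from a shell job entry.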
    -- Mick --

  3. #3
    Join Date
    Jun 2012
    Posts
    5,534

    No need to decompress bzip files since Kettle supports them directly via Kettle VFS.
    So long, and thanks for all the fish.

  4. #4
    Join Date
    Mar 2014
    Posts
    181

    Originally posted by marabu:
    No need to decompress bzip files since Kettle supports them directly via Kettle VFS.

    Hello Marabu,

    I went through the VFS documentation and did as instructed using the Text File Input step, but the transformation still does not work.

    I have attached the transformation and some screenshots for you to look at.

    https://www.dropbox.com/s/iiua8d6ox0..._data.ktr?dl=0
    https://www.dropbox.com/s/o3qvydhit2...0file.PNG?dl=0
    https://www.dropbox.com/s/bx2jatqh5k..._file.PNG?dl=0

    In the meantime I created a shell script job to do the unzipping and read the files, but if VFS can be used, it will save me some processing time.

    Thanks for your support.

  5. #5
    Join Date
    Mar 2014
    Posts
    181

    Mick, this looks like a viable solution for now.

    Thank you,

    Ron

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    With Kettle VFS, try to use the correct prefix (bz2:, not gz:).
    And don't use DOS wildcards where regular expressions are expected (for example .*\.txt\.bz2).
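
    To make the wildcard point concrete, here is a small Python sketch contrasting a regular expression with a DOS/shell-style wildcard. Kettle uses Java regular expressions rather than Python's, so this is only an analogy; the file names are made up, and it assumes the pattern has to match the whole file name.

```python
import fnmatch
import re

# Hypothetical file names, for illustration only
names = ["sales.txt.bz2", "sales.txt.gz", "notes.txt", "salestxtbz2"]

# Regular expression: "\." is a literal dot, ".*" means "any characters"
regex = re.compile(r".*\.txt\.bz2")
regex_hits = [n for n in names if regex.fullmatch(n)]

# DOS/shell-style wildcard, shown only for contrast
glob_hits = fnmatch.filter(names, "*.txt.bz2")

# The wildcard is not a valid regex: a leading "*" has nothing to repeat
try:
    re.compile("*.txt.bz2")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False
```

    Both forms select only sales.txt.bz2 here, but they are different languages: handing the wildcard form to a regex engine fails outright.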
    Better luck next time
    Last edited by marabu; 11-25-2014 at 01:07 PM.
    So long, and thanks for all the fish.

  7. #7
    Join Date
    Mar 2014
    Posts
    181

    Thanks for the correction:

    This is what I specified in the File Directory within the Text File Input step:

    bz2:file:///C:/Installs/internal_files/90341216_DigitalData_20141123.zip/90341216_DigitalData_20141123

    When I click on show file names, I get the error shown in the following screenshot.

    https://www.dropbox.com/s/d3hzm9ieox...Error.PNG?dl=0

    Thanks,

    Ron

  8. #8
    Join Date
    Jun 2012
    Posts
    5,534

    I'm just realising that you are trying to process bzip files that are still inside a zip file.
    I don't think this will ever work, since the outer prefix must indicate a directory if you want to use a wildcard regex.
    An archive like zip or jar is acceptable; bz2, which indicates a single compressed file, simply doesn't qualify for this.
    So why don't you just use the unzip job entry to create a directory with your bzipped files first?
    In a subsequent transformation you can fetch the list of filenames then and prepend the bz2: prefix.
    Finally let Text File Input accept the filename from a field and you are almost home.
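
    The "fetch the filenames and prepend bz2:" step can be sketched outside Kettle as follows. This is a minimal Python illustration, not a Kettle transformation; the function name bz2_vfs_uris and the default pattern are hypothetical, and it assumes the bzipped files have already been unzipped into a plain directory.

```python
import re
from pathlib import Path
from typing import List


def bz2_vfs_uris(directory: Path, pattern: str = r".*\.bz2") -> List[str]:
    """List files whose names match the regex and prefix each file URI with bz2:."""
    regex = re.compile(pattern)
    uris = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and regex.fullmatch(path.name):
            # as_uri() yields file:///... for an absolute path
            uris.append("bz2:" + path.resolve().as_uri())
    return uris
```

    Each resulting string has the bz2:file:///... shape discussed in this thread and corresponds to the filename field that Text File Input would accept.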
    So long, and thanks for all the fish.

  9. #9
    Join Date
    Apr 2008
    Posts
    4,696

    Just a thought....

    What about a Get File Names step: zip:file://path/to/zip wildcard .*\.bz2
    Then the file input would be bz2:zip:file://path/to/zip/!bz2/file.txt ?

  10. #10
    Join Date
    Mar 2014
    Posts
    181

    Originally posted by marabu:
    I'm just realising that you are trying to process bzip files that are still inside a zip file.
    I don't think this will ever work, since the outer prefix must indicate a directory if you want to use a wildcard regex.
    An archive like zip or jar is acceptable; bz2, which indicates a single compressed file, simply doesn't qualify for this.
    So why don't you just use the unzip job entry to create a directory with your bzipped files first?
    In a subsequent transformation you can fetch the list of filenames then and prepend the bz2: prefix.
    Finally let Text File Input accept the filename from a field and you are almost home.

    Hi Marabu,

    I have tried to unzip the file using the unzip file job entry, but I am getting an error which I don't really understand, even when I enable debug logging to get detailed error messages.

    This is the error:

    2014/11/25 14:24:54 - Unzip file 2 - ERROR (version 5.0.1-stable, build 1 from 2013-11-15_16-08-58 by buildguy) : Error

    I have also put my job on Dropbox.

    https://www.dropbox.com/s/a5lk1xkf96...files.kjb?dl=0

    Thanks,

    Ron

  11. #11
    Join Date
    Mar 2014
    Posts
    181

    Hello Gutlez,

    I tried this option, but it did not work for me either.

    Thanks,

    Ron

  12. #12
    Join Date
    Apr 2008
    Posts
    4,696

    Can you attach a really simple example file (of the Zip containing BZip files)?

  13. #13
    Join Date
    Mar 2014
    Posts
    181

    Yes, I did

    It is right here.

    https://www.dropbox.com/s/3vqrpw6nab...41123.zip?dl=0

    Thanks,

    Ron

  14. #14
    Join Date
    Mar 2014
    Posts
    181

    Please let me know once you have the file. I need to remove it.

  15. #15
    Join Date
    Apr 2008
    Posts
    4,696

    Originally posted by Ron256:
    Please let me know once you have the file. I need to remove it.

    Got it.

  16. #16
    Join Date
    Mar 2014
    Posts
    181

    Thanks!

  17. #17
    Join Date
    Jun 2012
    Posts
    5,534

    I must say there is something about that zip file that makes Kettle choke.
    While fileroller and unzip both don't detect any anomalies, I can't make Kettle unzip the file.
    It's definitely not the size of the file.
    My bed is waiting for me now.
    If no one comes up with an explanation for this I will try tomorrow.
    Meanwhile, unzipping via shell job entry might help us to stay in business.
    So long, and thanks for all the fish.

  18. #18
    Join Date
    Apr 2008
    Posts
    4,696

    Originally posted by marabu:
    I must say there is something about that zip file that makes Kettle choke.

    I'm getting the same, but not just on the Zip, the BZ2 as well.

    When I manually uncompressed them, and rebuilt it as described (bzip2 in zip file), I was able to do a GetFileNames on the zip:file uri, add in the bz2: on the beginning, and read the file with Text File Input.

    The final URI ends up being:
    bz2:zip:file:///Path/New.zip!/Text.txt.bz2!
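
    For comparison outside Kettle, the same layering (zip on the outside, bzip2 on the inside) can be read in Python without unpacking anything to disk. This is a minimal sketch using only the standard library; the function name iter_lines_from_bz2_in_zip and the UTF-8 default are assumptions, and it does not reproduce Kettle VFS behaviour.

```python
import bz2
import zipfile


def iter_lines_from_bz2_in_zip(zip_path, member, encoding="utf-8"):
    """Yield text lines from a bzip2-compressed member stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Outer layer: open the member as a binary stream (the zip: part)
        with archive.open(member) as raw:
            # Inner layer: decompress that stream on the fly (the bz2: part)
            with bz2.open(raw, "rt", encoding=encoding) as text:
                for line in text:
                    yield line.rstrip("\n")
```

    For the archive described above, the call would look like iter_lines_from_bz2_in_zip("New.zip", "Text.txt.bz2").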


    So... The next question becomes: Do all these files have the same layout?
    Last edited by gutlez; 11-25-2014 at 05:30 PM.

  19. #19
    Join Date
    Mar 2014
    Posts
    181

    For now, I will go ahead and use the shell job entry.

    Thanks for all your support.

    Thanks,

    Ron


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.