Hitachi Vantara Pentaho Community Forums

Thread: Is XLSX Input for S3 available?

  1. #1
    Join Date
    Sep 2016
    Posts
    3

    AMAZON S3 XLSX Input for Spoon

    Hi,

    I need to process XLSX files stored on S3. There seems to be no step for this out of the box in Spoon; I can only see the S3 CSV input/output steps. Is there any workaround?

    Thanks
    Last edited by sriniv_rk; 12-27-2016 at 05:36 AM.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    While S3-CSV-Input retrieves the content of a CSV file from an S3 bucket via the Java API, you should be able to retrieve any file via the REST API. Since Kettle uses Apache Commons VFS, you should even be able to supply the URL directly to the Excel-Input step. If that doesn't work, you will have to download the file before extracting rows and fields.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Sep 2016
    Posts
    3

    Posting the solution that worked, in case anyone is looking for something similar.

    After 2 hours of searching the forums and going through the source code for pentaho-s3-vfs on GitHub, I'm finally able to read XLSX files stored on S3.
    1. Enter s3://{AWSAccessKeyId}:{AWSSecretKey}@s3/{bucket}/{folder}/.../FileName.xlsx in the "File or directory" field of the Excel input step.
    2. Select the spreadsheet type (engine) that matches the file.

    Note: Don't forget to replace "+" with "%2B" and "/" with "%2F" in your AWSSecretKey, as those characters would otherwise produce malformed URLs.
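
For anyone who would rather not hand-edit the secret key, here is a rough Python sketch of the same encoding. The function name build_s3_vfs_url and the sample credentials are made up for illustration; only the URL layout comes from the steps above. It percent-encodes the access key and secret with the standard library, so "+" becomes "%2B" and "/" becomes "%2F", and leaves the bucket and object path as typed (an assumption on my part, as I have only confirmed that the secret key needs encoding).

```python
from urllib.parse import quote


def build_s3_vfs_url(access_key: str, secret_key: str, bucket: str, object_path: str) -> str:
    """Build an s3:// URL in the layout described above (hypothetical helper)."""
    # safe="" tells quote() to encode every reserved character,
    # including "/" (-> %2F) and "+" (-> %2B).
    encoded_key = quote(access_key, safe="")
    encoded_secret = quote(secret_key, safe="")
    # The bucket and object path are passed through unchanged (assumption).
    return f"s3://{encoded_key}:{encoded_secret}@s3/{bucket}/{object_path.lstrip('/')}"


# Placeholder credentials, not real ones.
url = build_s3_vfs_url("AKIAEXAMPLE", "abc+def/ghi", "my-bucket", "reports/2016/FileName.xlsx")
print(url)
```

The printed string is what would be pasted into the file field of the Excel input step. Keep in mind that this puts the credentials in plain text inside the transformation.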

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.