Hitachi Vantara Pentaho Community Forums

Thread: ETL for multiple csv files

  1. #1
    Join Date
    Nov 2008
    Posts
    107

    ETL for multiple csv files

    Hello everyone:

    I need some tips: I have roughly 10,000 CSV files that can be downloaded through a URL (xyz.html). Each downloaded CSV file is zipped. I need to load all of them into a single Table Output.

    Any suggestions/tips would be highly appreciated.
    Regards:

    Kaberi

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Kaberi,
    If those files all have the same structure, you can create a Transformation with a Text File Input step that uses a wildcard, followed by a Text File Output step.
    That should work.

    To unzip the files, use the Unzip File entry in a Job.
    Mick
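For readers who want to see the unzip-then-wildcard idea outside of Kettle, here is a minimal Python sketch of the same flow. It is not Pentaho code: the function names `unzip_all` and `merge_csv`, the directory layout, and the assumption that every file starts with an identical header row are all hypothetical choices made for illustration.

```python
import csv
import glob
import os
import zipfile


def unzip_all(zip_dir, out_dir):
    """Extract every .zip archive found in zip_dir into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for zip_path in sorted(glob.glob(os.path.join(zip_dir, "*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(out_dir)


def merge_csv(pattern, merged_path):
    """Append all CSV files matching the wildcard into one output file.

    Assumes each file begins with the same header row; the header is
    written once and a mismatch raises ValueError.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    merged_abs = os.path.abspath(merged_path)
    # Resolve the wildcard first and skip the output file itself,
    # in case it lives in the same directory as the inputs.
    inputs = [p for p in sorted(glob.glob(pattern))
              if os.path.abspath(p) != merged_abs]
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out)
        for csv_path in inputs:
            with open(csv_path, newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{csv_path} has a different structure")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as `unzip_all("downloads", "extracted")` followed by `merge_csv("extracted/*.csv", "merged.csv")` mirrors the Job (unzip) plus Transformation (wildcard input, single output) split described above; the paths are placeholders.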

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Just some friendly advice:

    1. There is no need to unzip. You can use our VFS driver to read transparently from inside the zip archive.
    2. The "CSV Input" step is also capable of reading multiple files. Simply provide input to the step in the form of filenames.

    HTH,

    Matt
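To illustrate the "no need to unzip" point, the sketch below reads CSV members straight out of each archive and appends them to one table. It is plain Python rather than Kettle, and several things are assumptions made for the example: the function name `load_zipped_csvs`, the table name `staging`, SQLite as the target database, UTF-8 encoding, and a shared header row whose names become the column names. In Kettle itself the same effect comes from handing the step a VFS-style file name; Apache Commons VFS documents the zip form as `zip:` followed by the archive URI, a `!`, and the path inside the archive.

```python
import csv
import io
import sqlite3
import zipfile


def load_zipped_csvs(zip_paths, conn, table="staging"):
    """Stream CSV members of each zip archive into one table.

    Nothing is extracted to disk. The table is created from the first
    header row encountered (all columns as TEXT); later files are
    assumed to share that structure. Returns the number of rows inserted.
    """
    total = 0
    created = False
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if not member.lower().endswith(".csv"):
                    continue  # ignore non-CSV members
                with archive.open(member) as raw:
                    # Decode the binary member stream as text for csv.reader.
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    reader = csv.reader(text)
                    header = next(reader, None)
                    if header is None:
                        continue  # empty member
                    if not created:
                        cols = ", ".join(f'"{c}" TEXT' for c in header)
                        conn.execute(
                            f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})'
                        )
                        created = True
                    marks = ", ".join("?" for _ in header)
                    rows = list(reader)
                    conn.executemany(
                        f'INSERT INTO "{table}" VALUES ({marks})', rows
                    )
                    total += len(rows)
    conn.commit()
    return total
```

With around 10,000 archives, passing a generator or a sorted `glob` result as `zip_paths` keeps the driver code short; each member is still buffered in memory one file at a time, which is a simplification of this sketch.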

  4. #4
    Join Date
    Nov 2008
    Posts
    107

    Thanks, Matt and Mick, for the quick response.
    Regards:

    Kaberi


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.