Hitachi Vantara Pentaho Community Forums

Thread: Import from huge CSV Archive

  1. #1
    Join Date
    Mar 2017
    Posts
    2

    Import from huge CSV Archive

    Hello,

    I'm about to create a Spoon job to import text from CSV files into a MySQL database.
    My problem is that I have one directory per day, each containing several CSV files.
    They are all named like this:

    20160101074852_1.CSV
    20170125074852_1.CSV
    .
    .
    .
    ([Date][Time]_[Nr].CSV)

    They sit in directories named after the date on which they were created, for example:
    20170125


    The first step is to import them all into my database, but that takes a lot of time, so afterwards I would like to
    import only the latest CSV files each day.

    At the moment I can get the latest date from my MySQL database.

    All I need is an idea for how to find all the new CSV files and import them as well.

    The problem is that if I use the Get File Names step with a regex on the date, there are about 25,000 CSV files to compare against, and this already took half an hour.
    I also haven't found a way to use a regex to select only the CSV files that are newer than the date I got from my database.

    I hope someone out there understands what I am trying to explain and can give me a hint, advice, or a solution.

    Thanks anyway and best regards

  2. #2
    Join Date
    Aug 2011
    Posts
    360

    Since you know the last date you've got, you only need to scan the directories from that date onward.
    So get the filenames in two steps:
    1. Get only the directory names and filter where dirname >= last date.
    2. Then search for CSV files in these directories only.

    However, filtering 25,000 filenames with a regexp should not take half an hour!
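
The two steps above can be sketched outside of Spoon to show the logic. This is only an illustration in Python, not PDI code: the function name `new_csv_files`, the `root` argument, and the assumption that every day directory has an 8-digit YYYYMMDD name directly under `root` are hypothetical. Because the names are zero-padded dates, a plain string comparison orders them chronologically, so no regex comparison against the date is needed.

```python
import re
from pathlib import Path

# Assumption: day directories are named with exactly 8 digits (YYYYMMDD), e.g. 20170125.
DATE_DIR = re.compile(r"^\d{8}$")


def new_csv_files(root, last_date):
    """Return CSV files found in day directories named on or after last_date.

    last_date is a YYYYMMDD string, e.g. the latest date read from the database.
    """
    root = Path(root)

    # Step 1: list only the top-level directories and keep dirname >= last_date.
    # YYYYMMDD strings sort chronologically, so string comparison is enough.
    day_dirs = sorted(
        d for d in root.iterdir()
        if d.is_dir() and DATE_DIR.match(d.name) and d.name >= last_date
    )

    # Step 2: search for CSV files in those directories only.
    files = []
    for d in day_dirs:
        files.extend(sorted(
            f for f in d.iterdir()
            if f.is_file() and f.suffix.lower() == ".csv"
        ))
    return files
```

Calling `new_csv_files("/path/to/archive", "20170125")` would return only the files under `20170125` and later directories, so the older directories are never opened. Note that `>=` re-scans the last imported day; whether that is wanted depends on whether that day's import was complete.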
