Hitachi Vantara Pentaho Community Forums

Thread: Transferring a huge csv into a table

  1. #1
    Join Date
    Jan 2017
    Posts
    2

    Transferring a huge csv into a table

    I am a low level user of data and have not used MySQL in a while. I am new to PDI.

    My father bought some data for political use and it has over a million rows and 267 columns.

    I created a table so he can actually use his data, but I am having trouble getting the data into the table.

    I want to know if there is a way to use Pentaho to get the data from the csv into a fresh table generated by Pentaho. I realize that's not ideal, but I'm only 90% sure of the table I created.

    A couple of things I have tried so far:

    Using Workbench's auto import takes too long. Windows 10 auto-updated (argh) and wrecked that experiment.

    Using LOAD DATA INFILE has gone wonky on me. It hangs up on fields, claiming that it is expecting an integer and getting a quotation mark. I have properly identified the field enclosure, so I'm not sure why it would do that.

    Trying to use Pentaho (CSV file input into Table output) throws an error almost immediately. I have no clue how to read the error.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    First make sure Text-File-Input is configured correctly.

    So, start with a single step in your transformation: Text-File-Input.
    Use Get-Fields to retrieve the field list including types.
    Configure it with the most forgiving settings, such as the mixed format.
    If you don't know the file encoding, try UTF-8 before windows-1252.
    If your Spoon test run succeeds, save your transformation.
    Use "pan -file=your.ktr -log=your.log -level=Minimal" to run your transformation.
    Have some hot java.
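
If the Spoon test run fails, it can help to check the file outside of PDI first. The following is only a minimal Python sketch, not anything PDI provides: it tries UTF-8 before falling back to windows-1252 (cp1252 in Python), then lists rows whose field count differs from the header. The function names are hypothetical, and a comma delimiter with a double-quote enclosure is assumed.

```python
import codecs
import csv


def detect_encoding(chunks):
    """Return 'utf-8' if all byte chunks decode cleanly, else fall back to 'cp1252'.

    `chunks` is any iterable of bytes objects, so a large file can be
    streamed instead of being read into memory at once.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        for chunk in chunks:
            decoder.decode(chunk)
        decoder.decode(b"", final=True)  # fails if a character was cut off
        return "utf-8"
    except UnicodeDecodeError:
        # cp1252 is only a fallback guess, not a positive identification.
        return "cp1252"


def find_bad_rows(lines, delimiter=",", quotechar='"'):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(lines, delimiter=delimiter, quotechar=quotechar)
    expected = len(next(reader))  # the header defines the expected width
    bad = []
    for row in reader:
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


def check_file(path):
    """Run both checks on a file on disk, reading it in 1 MiB chunks."""
    with open(path, "rb") as f:
        encoding = detect_encoding(iter(lambda: f.read(1 << 20), b""))
    with open(path, newline="", encoding=encoding) as f:
        return encoding, find_bad_rows(f)
```

Calling `check_file("your.csv")` returns the guessed encoding and the list of suspicious rows. A row that shows up there (for example because of an unescaped quote inside a field) is a likely cause of an "expected integer, got quotation mark" style of error.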

    Now it's time to add Table-Output - you can't use the MySQL-Bulk-Loader under Windows because it relies on a named pipe (FIFO), which Windows does not support.
    Prepare your database connection using the Wizard.
    Does your MySQL table already exist? If not, use the SQL button in Table-Output to ease creation.
    Don't create indexes yet.
    Configure Table-Output.
    Set Commit size to something like 50,000 to speed things up.
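
To illustrate what the commit size does, here is a sketch under stated assumptions rather than what PDI runs internally. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, creates an all-text table from the CSV header, and commits once per batch instead of once per row. The function name `load_csv` and the table name `staging` are hypothetical, and every row is assumed to have the same number of fields as the header.

```python
import csv
import sqlite3
from itertools import islice


def quote_ident(name):
    """Quote an SQL identifier so headers containing spaces still work."""
    return '"{}"'.format(name.replace('"', '""'))


def load_csv(conn, lines, table="staging", commit_size=50_000):
    """Create `table` from the CSV header and insert all rows in batches.

    Returns the number of data rows inserted.
    """
    reader = csv.reader(lines)
    header = next(reader)

    # Every column is TEXT for now; types and indexes can be added later.
    columns = ", ".join(quote_ident(h) + " TEXT" for h in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))

    placeholders = ", ".join("?" for _ in header)
    insert = "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders)

    total = 0
    while True:
        batch = list(islice(reader, commit_size))  # next commit_size rows
        if not batch:
            break
        conn.executemany(insert, batch)
        conn.commit()  # one commit per batch, not one per row
        total += len(batch)
    return total
```

With a real file this would be called as `load_csv(conn, open("your.csv", newline="", encoding="utf-8"))`. Fewer commits generally mean less transaction overhead, which is why a larger commit size tends to speed up a load of this size.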

    Let's see what happens.
    So long, and thanks for all the fish.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.