Hitachi Vantara Pentaho Community Forums

Thread: Let me start by saying I'm sorry... (processing a multitable CSV file)

  1. #1
    Join Date
    Nov 2013
    Posts
    2

    I know everyone hates these questions, in any forum, particularly on first post. A newb signs up and expects a complicated problem resolved without putting in the hours to master the app...

    So please accept my apologies.

    I am under extreme pressure at work to deliver a result and, what's worse, I'm not a dev or analyst but a rather mediocre sysadmin. I've been tasked with automatically picking up two largish CSV files (7K rows and 50K rows), breaking them apart into individual CSVs (I'll explain in a minute), and then loading them into a DB, one file per table. I have spent many days trying many tools to accomplish this. I think that any dev with basic VB, Perl, Python, or Java skills would probably bang this out in a day or less. However, I'm doing this in a larger context where there may be many similar projects, and I want to build an ecosystem of tools for this kind of work. The Pentaho suite is at the top of my list for that ecosystem, which is why I'd rather accomplish this in PDI.

    On with the question... Here is an example of the CSV file:

    "NAME","AGE","SEX","WEIGHT","CITY"
    "Bob",20,"M",120,"New York"
    "Bob",33,"M",220,"Toronto"
    "Bob",43,"M",130,"Miami"
    "NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"
    "Larry","USA","Football",14,"Baseball",22
    "Larry","UK","Rugby",5,"Field Hockey",11
    "Larry","Canada","Hockey",19,"Volleyball",4
    "NAME","DRINK","QTY","DRINK","QTY"
    "Jesse","Beer",6,"Juice",2
    "Jesse","Juice",1,"Water",1
    "Jesse","Milk",3,"Coffee",5
    "NAME","AGE","SEX","WEIGHT","CITY"
    "Marry",20,"F",120,"New York"
    "Marry",33,"F",220,"Toronto"
    "Marry",43,"F",130,"Miami"

    As you can see, the file contains several header rows, and they differ from one another. The columns vary from header to header, but every header row starts with "NAME", and all rows that follow a header share the same value in the first field until the next header. I would like to use PDI to pick up this CSV and split it into individual CSVs, each named after the value in the "NAME" column, like this:

    Bob.csv
    "NAME","AGE","SEX","WEIGHT","CITY"
    "Bob",20,"M",120,"New York"
    "Bob",33,"M",220,"Toronto"
    "Bob",43,"M",130,"Miami"

    Larry.csv
    "NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"
    "Larry","USA","Football",14,"Baseball",22
    "Larry","UK","Rugby",5,"Field Hockey",11
    "Larry","Canada","Hockey",19,"Volleyball",4

    And so on.
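
    For comparison, the split described above can be sketched outside PDI in a few lines of Python. This is only an illustration under stated assumptions, not the PDI approach asked about: the function name split_multitable_csv and its output_folder argument are made up for this example, every header row is assumed to have "NAME" as its first field, no field contains an embedded newline, and a name that appears in two sections (a second "Bob" block, say) would overwrite the earlier file.

```python
import csv
import io
from pathlib import Path


def split_multitable_csv(text, output_folder):
    """Split a multi-header CSV into one file per section.

    Hypothetical helper for illustration. Returns the list of paths written.
    """
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    pending_header = None  # header line waiting for its first data row
    handle = None          # output file for the current section
    try:
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            # Parse only to read the first field; the raw line is written
            # unchanged so the original quoting is preserved.
            first = next(csv.reader(io.StringIO(line)))[0]
            if first == "NAME":
                # A new header closes the current section.
                if handle is not None:
                    handle.close()
                    handle = None
                pending_header = line
                continue
            if pending_header is not None:
                # First data row of a section: its first field names the file.
                path = out_dir / f"{first}.csv"
                handle = path.open("w", newline="")
                handle.write(pending_header + "\n")
                written.append(path)
                pending_header = None
            if handle is None:
                raise ValueError("data row found before any header row")
            handle.write(line + "\n")
    finally:
        if handle is not None:
            handle.close()
    return written
```

    Calling split_multitable_csv(Path("input.csv").read_text(), "out") on the sample above (input.csv and out are placeholder names) would write Bob.csv, Larry.csv, Jesse.csv and Marry.csv into the out folder, each starting with its own header row.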

    I would really appreciate some help. In the meantime, I'm working through the user guide and Googling feverishly for examples that can help. I thank in advance anyone who participates, and again I apologize for this being my first post.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Here's a demo.
    Try to find out how it's done.
    And don't forget to adjust the OUTPUT_FOLDER parameter.
    Attached Files
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Nov 2013
    Posts
    2

    It works! You are too kind! I will be studying this closely this weekend! Thank you very much!
