Hitachi Vantara Pentaho Community Forums

Thread: Text input: number of header lines

  1. #1
    Join Date
    Nov 2010
    Posts
    11

    Text input: number of header lines

    Hi,
    I'm trying to automatically extract several CSV files (compressed in zip format; this is a sample file) from a website using the Text file input step. Unfortunately, these files start with a two-line header that prevents the step from identifying the correct fields. I tried the "Header & number of header lines" option, increasing the number of lines, but it doesn't work!
    Can anybody help me?

  2. #2
    Join Date
    Apr 2009
    Posts
    337

    Could there be a problem with the newline format? Is it DOS or Unix?
    Regards,
    Madhu
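One way to answer the DOS/Unix question outside of Kettle is to count the line endings in the raw bytes of a file. The following is only a rough Python sketch; the function names `count_line_endings` and `guess_format` are made up for illustration and are not part of Kettle.

```python
def count_line_endings(data: bytes) -> dict:
    """Count DOS (CRLF), Unix (bare LF) and old Mac (bare CR) line endings."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"dos": crlf, "unix": lf, "mac": cr}


def guess_format(data: bytes) -> str:
    """Return 'dos', 'unix', 'mac', 'mixed', or 'none' if there are no line endings."""
    counts = count_line_endings(data)
    used = [name for name, n in counts.items() if n > 0]
    if not used:
        return "none"
    return used[0] if len(used) == 1 else "mixed"
```

If the result is "mixed", a single fixed format setting in the input step would probably not match every line.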

  3. #3
    Join Date
    Apr 2008
    Posts
    1,771

    Ciao Andrea,
    I would import the file as one long text field, then filter out the first 2 rows (Add sequence - Filter Rows) and then use the Split Rows into fields.

    Mick.

  4. #4
    Join Date
    Nov 2010
    Posts
    11

    Originally posted by Mick_data:
    Ciao Andrea,
    I would import the file as one long text field, then filter out the first 2 rows (Add sequence - Filter Rows) and then use the Split Rows into fields.

    Mick.

    Hi Mick,
    it's a good idea, but I must import many files at the same time, not only one.

  5. #5
    Join Date
    Nov 2010
    Posts
    11

    Originally posted by madhupenta:
    Could there be a problem with the newline format? Is it DOS or Unix?

    I use the mixed option, so I don't have format problems.

  6. #6
    Join Date
    Apr 2009
    Posts
    337

    bingo, buy me a beer
    Regards,
    Madhu

  7. #7
    Join Date
    Nov 2010
    Posts
    11

    Originally posted by madhupenta:
    bingo, buy me a beer

    WHAT?? If you tell me the solution, I'll bring you 2 beers!

  8. #8
    Join Date
    Apr 2008
    Posts
    1,771

    Quote: "I must import many files at the same time, not only one."

    I had to do the same thing.

    Look at:
    http://www.cloud2land.com/2011/06/pe...ultiple-files/

    Mick
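The linked blog post is not reproduced here, so the following is only a rough Python sketch of the general "one file at a time" idea for many zip archives, not the blog's method and not a Kettle transformation. The names `make_zip` and `iter_zipped_texts`, the UTF-8 encoding, and the in-memory archives are all assumptions; downloading from the website is left out.

```python
import io
import zipfile


def make_zip(files):
    """Build a zip archive in memory from {member_name: text} (demo helper only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buffer.getvalue()


def iter_zipped_texts(archives, encoding="utf-8"):
    """Yield (archive_name, member_name, text) for each file in each archive.

    `archives` is an iterable of (archive_name, zip_bytes) pairs, so every
    archive is opened, read and closed before the next one is touched.
    """
    for archive_name, zip_bytes in archives:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # skip directory entries inside the archive
                yield archive_name, info.filename, zf.read(info).decode(encoding)
```

Each yielded text can then be cleaned up on its own (for example, by dropping its first two lines) before the rows of all files are combined.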

  9. #9
    Join Date
    Apr 2009
    Posts
    337

    Oh ahem.. sorry, I thought your issue was solved by setting mixed! Anyway, what error do you get when you set the number of header lines to 2? What happens when you preview?
    Regards,
    Madhu

  10. #10
    Join Date
    Nov 2010
    Posts
    11

    The problem is that I can't skip the first two rows. When I increase the header lines, Kettle reads the first lines anyway...
    I must get many files from the web... I should be able to discard the first two lines of each file before retrieving the fields.
    Is that possible?

  11. #11
    Join Date
    Apr 2009
    Posts
    337

    You could enable "Rownum in output" with a rownum fieldname, and then filter out rows 1 and 2!
    Regards,
    Madhu

  12. #12
    Join Date
    Nov 2010
    Posts
    11

    But the Text file input step automatically takes the first row as the field names! This is the problem... I want to delete the first two lines first and then set the fields (which are actually in row 3). Oh... what a mess...
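To make the goal concrete outside of Kettle: the idea is to throw away the first two physical lines and only then let the parser take the field names from what was row 3. A minimal Python sketch, assuming a ";" delimiter (the real delimiter of the sample file is not stated in this thread) and a hypothetical function name `read_with_late_header`:

```python
import csv
from itertools import islice


def read_with_late_header(text, skip=2, delimiter=";"):
    """Drop the first `skip` lines, then read the next line as the header.

    Assumes no quoted field contains an embedded line break, because the
    text is split into physical lines before it is parsed.
    """
    lines = text.splitlines()                # handles DOS, Unix and mixed endings
    remaining = islice(lines, skip, None)    # everything after the skipped lines
    return list(csv.DictReader(remaining, delimiter=delimiter))
```

For a file whose third line is `id;name`, each returned row is a dict keyed by `id` and `name`.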

  13. #13
    Join Date
    Nov 1999
    Posts
    9,729

    Does that mean there's something wrong with the separator or enclosure symbols?

  14. #14
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Andrea.
    Here's what I have done in a very similar situation.
    First, check my link to the blog, where it explains how to process one file at a time.
    Once you have sorted out that problem, to read your text file I would do as follows:
    1. Text file input: NO HEADER ROWS, import the file as 1 field, so use a delimiter that does NOT exist in your file.
    2. Filter rows: select rows where rownumber > 3
    3. Split rows into fields: set the proper delimiter and fields.

    That should solve your task.
    Mick
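The three steps above can be mimicked in plain Python to see what each one contributes. This is only an illustration under assumptions, not the Kettle transformation itself: the ";" delimiter, the field names `id` and `name`, and the function name `parse_skipping_rows` are all hypothetical.

```python
def parse_skipping_rows(text, delimiter=";", field_names=("id", "name"), min_rownumber=3):
    """Mimic: read whole lines, keep rownumber > min_rownumber, then split."""
    # Step 1: read every line as a single field and attach a row number
    # (like importing with a delimiter that never occurs in the file).
    numbered = list(enumerate(text.splitlines(), start=1))

    # Step 2: keep only rows whose row number is greater than min_rownumber,
    # which drops the two junk lines and the header line on row 3.
    kept = [line for rownumber, line in numbered if rownumber > min_rownumber]

    # Step 3: split each remaining line on the real delimiter and name the
    # fields by hand. This naive split ignores enclosure (quote) characters.
    return [dict(zip(field_names, line.split(delimiter))) for line in kept]
```

Because the field names are supplied by hand in step 3, the header line on row 3 is filtered out together with the two junk lines, which is why the condition is rownumber > 3 and not > 2.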

  15. #15
    Join Date
    Nov 2010
    Posts
    11

    ahuch... it's so complicated..

  16. #16
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Andrea.
    It's not complicated.

    See attachment.

    If it works you owe me a beer ;-)

    Mick
    Attached Files

  17. #17
    Join Date
    Nov 2010
    Posts
    11

    Oh, thanks!
    I'll try it now!

  18. #18
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Andrea,
    if it does not work you can send me a PM (Private Message).

    Mick

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.