Hitachi Vantara Pentaho Community Forums

Thread: Read multiple lines from flat file into single record

  1. #1
    Join Date
    Jun 2013
    Posts
    7

    Default Read multiple lines from flat file into single record

    My input file contains records that span multiple lines; each record begins with a line break, then eight digits, then a '1'. I am a newbie, but I believe what I want is a "Text file input" step with a file type of CSV, where the separator is a regular expression matching this pattern. I can't get it to work. Can anyone point me in the right direction? TIA.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Type sample data into a Data Grid step or a text file and attach it.
    If you must comment on the intricacies of your data, go ahead.
    Don't forget to describe the output you expect.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Jun 2013
    Posts
    7

    Default

    My input might look like this:

    Code:
    000000011This is part of the first record
    000000015This is still the second record
    000000015Again the first record
    000000021A second row
    000000025Still the second row
    000000025Still second row
    000000025Second row
    000000031The third row
    000000032Still third row
    This is the output I would like:

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <Rows>
     <Row>000000011This is part of the first record
    000000015This is still the second record
    000000015Again the first record</Row>
    
    
     <Row>000000021A second row
    000000025Still the second row
    000000025Still second row
    000000025Second row</Row>
    
    
     <Row>000000031The third row
    000000032Still third row</Row>
    
    
    </Rows>
    (files with these contents also attached).

    I can achieve the same result by using "Load file content in memory" to read the file into a single field, then hopping to "Split field to rows" with the regular expression "\r\n[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]1". However, loading the entire file into memory might not be an option for very large files.
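
    For readers outside Pentaho, the in-memory approach described above can be sketched roughly as follows in Python. This is only an illustration, not the transformation itself: the function name `split_records` is hypothetical, and the pattern uses a lookahead so that the eight digits and the '1' stay at the start of each record, as in the desired output. It has the same drawback of holding the whole file in memory.

```python
import re
from typing import List

# A line break followed by eight digits and a '1' marks the start of a new record.
# The lookahead keeps those nine characters in the record instead of consuming them.
RECORD_BREAK = re.compile(r"\r?\n(?=[0-9]{8}1)")


def split_records(text: str) -> List[str]:
    """Split the full file content into multi-line records (whole text in memory)."""
    # Drop the trailing line break so the last record does not end with one.
    return [rec for rec in RECORD_BREAK.split(text.rstrip("\r\n")) if rec]


if __name__ == "__main__":
    sample = (
        "000000011This is part of the first record\r\n"
        "000000015Again the first record\r\n"
        "000000021A second row\r\n"
        "000000025Still the second row\r\n"
    )
    for record in split_records(sample):
        print(repr(record))
```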
    Attached Files

  4. #4
    Join Date
    Jun 2013
    Posts
    7

    Default

    All I want is a single field which contains all characters up until the occurrence of a string, then to process the next set of characters until the next occurrence of that string, and so on. Basically, a CSV where a record can span an arbitrary number of lines.
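
    The requirement above can also be met without loading the whole file, by reading one line at a time and starting a new record whenever a line begins with eight digits followed by a '1'. The following Python sketch shows that idea under stated assumptions: `iter_records` is a hypothetical name, and this is not the attached solution from later in the thread, whose contents are not reproduced here.

```python
import re
from typing import Iterable, Iterator, List

# A line that starts with eight digits followed by '1' opens a new record.
RECORD_START = re.compile(r"[0-9]{8}1")


def iter_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one multi-line record at a time; only the current record is buffered."""
    buffer: List[str] = []
    for line in lines:
        # A new record starts here, so emit what has been collected so far.
        if RECORD_START.match(line) and buffer:
            yield "".join(buffer).rstrip("\r\n")
            buffer = []
        buffer.append(line)
    # Emit the final record, which has no following start line to trigger it.
    if buffer:
        yield "".join(buffer).rstrip("\r\n")


if __name__ == "__main__":
    sample_lines = [
        "000000011This is part of the first record\n",
        "000000015Again the first record\n",
        "000000021A second row\n",
        "000000031The third row\n",
        "000000032Still third row\n",
    ]
    for record in iter_records(sample_lines):
        print(repr(record))
```

    Because a file object opened with open(path, newline="") is itself an iterable of lines, it can be passed directly as `lines`, so memory use depends on the longest record rather than on the file size.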

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote: Originally posted by Dan Newman
    All I want is a single field which contains all characters up until the occurrence of a string
    Something like this?
    Attached Files

  6. #6
    Join Date
    Jun 2013
    Posts
    7

    Default

    Thanks, that worked a treat. It also passed the required load test of handling a >2 GB input file. Much appreciated.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.