Hitachi Vantara Pentaho Community Forums

Thread: reading a not so fixed text file

  1. #1

    reading a not so fixed text file

    I'm having a hard time reading a sequential file that is only partly fixed. I've tried several steps but couldn't figure out how to make this work.

    My file has a listing of people. Each record starts with a line containing P, followed by a variable number of detail lines. I couldn't figure out how to read the P, then read the next x lines (the number of lines varies) and store them as one row, then read the next P and its following lines as a new record.

    My file is something like:

    P <-- marks the beginning of a record
    1JOHN DOE <-- 1 identifies what kind of info comes next; in this case it's the name
    212309029309 <-- 2 is the identifier for what could be a social security number, for example
    319800918 <-- same for 3; this time it's the birth date
    319800601 <-- this record doesn't have the type 2 info
    and it goes on...

    Is there a way to do this with PDI? I'm desperate..hahaha

  2. #2

    If you can't group the rows belonging to a single record by an existing key present in every row, you will have to introduce an artificial key.
    I would use a scripting step for this.
    Then you can split your input stream using the record type as a discriminator.
    Cut the fields from each record type and join the streams on the key you introduced.
    From there on it's straightforward, because the data is now in tabular format.
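
    The steps above can be sketched outside PDI in plain Python, just to show the data flow. This is an illustration, not a PDI transformation: the field names, the helper functions, and the second sample person are all hypothetical, and the sketch assumes the record type is always the first character of a detail line.

```python
from collections import defaultdict

# Hypothetical field names for the record types shown in the question.
FIELD_NAMES = {"1": "name", "2": "id_number", "3": "birth_date"}


def add_record_key(lines):
    """Yield (key, rec_type, value) for each detail line.

    The artificial key is a counter that goes up at every 'P' line.
    """
    key = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line == "P":
            key += 1  # a new record starts here
            continue
        yield key, line[0], line[1:].strip()


def to_table(lines):
    """Split the keyed rows by record type, then join them on the key."""
    streams = defaultdict(dict)  # rec_type -> {key: value}
    keys = {}                    # insertion-ordered set of keys seen
    for key, rec_type, value in add_record_key(lines):
        streams[rec_type][key] = value
        keys[key] = True
    # Join: one output row per key; a missing record type becomes None.
    return [
        {"key": k, **{name: streams[t].get(k) for t, name in FIELD_NAMES.items()}}
        for k in keys
    ]


sample = [
    "P", "1JOHN DOE", "212309029309", "319800918",
    "P", "1JANE ROE", "319800601",  # hypothetical record without type 2
]
for row in to_table(sample):
    print(row)
```

    A record that lacks a type simply ends up with an empty value in that column, which is the behaviour an outer join on the artificial key would give.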
    So long, and thanks for all the fish.

  3. #3


    thanks marabu!
    Is there a way I can add a sequence, but increment it based on a field change?
    There's a step that increments while the field doesn't change and then resets.
    This wouldn't have me changing much of what I have now.

  4. #4

    How to introduce a group id

    While we have dedicated steps for ongoing and restarting sequences of the form a(n+1) = a(n) + c, we must resort to scripting for any other type of sequence.
    In your case a primitive 0/1 sequence is most helpful, the terms being defined as recType == 'P' ? 1 : 0 (pseudo code).
    When you calculate the series of this sequence (its running total), the terms of the series magically evolve into the much appreciated group id: the total goes up by one at every P line, so all lines of one record share the same value.
    For the sequence I prefer a User Defined Java Expression.
    The series can be found in the Statistics / Group By step, disguised as Cumulative Sum.
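
    To make the arithmetic concrete, here is a minimal Python sketch of the same idea: a 0/1 term per row, followed by a cumulative sum. It only mirrors the logic; in PDI the two parts would be the User Defined Java Expression and the Cumulative Sum mentioned above. The function name and the sample list of record types are made up for illustration.

```python
from itertools import accumulate


def group_ids(rec_types):
    """Return one group id per row: the running count of 'P' rows so far."""
    # The sequence: 1 on a 'P' row, 0 on every other row.
    terms = [1 if rec_type == "P" else 0 for rec_type in rec_types]
    # The series: cumulative sum of the terms.
    return list(accumulate(terms))


# Record types of seven consecutive rows (two records).
rec_types = ["P", "1", "2", "3", "P", "1", "3"]
print(group_ids(rec_types))  # [1, 1, 1, 1, 2, 2, 2]
```

    Every row between one P and the next carries the same id, so the id can serve as the join or grouping key later on.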
    So long, and thanks for all the fish.

  5. #5


    This solved my issue. Thank you so much!


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.