Hitachi Vantara Pentaho Community Forums

Thread: Mixed layout

  1. #1
    Join Date
    Mar 2010

    Mixed layout

    What's the workaround for processing a file with a varying number of fields?

  2. #2
    Join Date
    Sep 2009


    There are a few possibilities, and I suppose the best choice depends on the nature of the file.

    If you have different types of records in a single file, you'd usually also have a type field somewhere that tells you what kind of record each line represents. It may be possible to preprocess the file using grep/sed/awk/perl to separate the record types into distinct files, each with a fixed number of fields. You could then read and process each file separately.

    Another possibility is to read the file with an abstract record structure that has the maximum expected number of fields (all strings). You could use the Text file input step for that. Further down the row stream you'd detect the record type, then filter and transform the records based on their type.
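
    To make the "maximum fields, all strings" idea concrete, here is a minimal standalone Python sketch of the logic (it is not a Kettle transformation). The semicolon delimiter, the limit of five fields, the type code sitting in the first field, and the sample data are all assumptions invented for the illustration.

```python
import csv
import io
from collections import defaultdict

MAX_FIELDS = 5  # assumed upper bound on the number of fields per line


def read_padded(text, delimiter=";", max_fields=MAX_FIELDS):
    """Read every line as strings and pad short rows with None."""
    rows = []
    for raw in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not raw:
            continue  # skip blank lines
        padding = [None] * (max_fields - len(raw))
        rows.append(raw[:max_fields] + padding)
    return rows


def group_by_type(rows, type_index=0):
    """Route each row to a bucket keyed by its record type field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[type_index]].append(row)
    return dict(groups)


# Hypothetical mixed-layout input: header (H), detail (D), trailer (T).
sample = "H;2010-03-01\nD;1001;widget;3;9.99\nD;1002;gadget;1\nT;2\n"
rows = read_padded(sample)
groups = group_by_type(rows)
```

    Each bucket in `groups` then has a uniform width, so the records of one type can be converted to proper data types and processed together. Writing each bucket to its own file would give you the same result as the grep/sed/awk/perl preprocessing.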

    If the file has some really weird non-standard format, you can always generate the rows by extracting the record fields manually. To do that, I'd use a Text file input step that reads each line into a single string field. Further down the stream I'd place a JavaScript step that extracts the relevant information from each line into a sensible field structure.
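
    The per-line extraction could look roughly like the following, written here as a standalone Python sketch rather than as actual JavaScript step code. The `key=value|key=value` line format, the target field names, and the `parse_line` helper are hypothetical; a real file would need its own parsing rules.

```python
import re

# Hypothetical target structure: every output record gets these fields.
FIELDS = ("id", "name", "qty")

# Matches key=value pairs separated by "|" (assumed line format).
PAIR = re.compile(r"(\w+)=([^|]*)")


def parse_line(line):
    """Extract the known fields from one raw line; missing ones become None."""
    found = {key: value.strip() for key, value in PAIR.findall(line)}
    return {name: found.get(name) for name in FIELDS}


# Each line arrives as a single string, in any field order and count.
lines = [
    "id=1|name=widget|qty=3",
    "name=gadget|id=2",
    "qty=7|id=3|note=rush",
]
records = [parse_line(line) for line in lines]
```

    Whatever the input looks like, every record comes out with the same three fields, which is the "sensible field structure" the rest of the stream can rely on. Unknown keys such as `note` are simply dropped in this sketch.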

    If you post a minimal example of your concrete situation, I can provide a sample.



