Hitachi Vantara Pentaho Community Forums

Thread: Parsing Text file with different data sets with different number of columns

  1. #1
    Join Date
    Nov 2012
    Posts
    7

    Parsing Text file with different data sets with different number of columns

    Hi,

    I need to load a file that contains several different datasets, and each dataset has to be loaded into a different table.

    Each dataset starts with its data set type and column names, spread over multiple rows, followed by the data, as shown below.

    The second line contains # followed by the number of columns, the data set type, the file number, and the date. The next few lines contain the column names and data types. The actual data is between the SSL and EOD lines.

    The same file contains several such datasets with different numbers of columns, and I want to load the data into multiple tables based on the data set type. A sample file is attached.

    *
    # 4 SECURITY CODE MAP FILE.02 20160415
    # 1 Calculation Date calc_date D 8 0
    # 2 As Of Date as_of_date D 8 0
    # 3 Security Name security_name S 25 0
    # 4 Test Security Code Test_security_code N 7 0
    *
    # 1 2 3 4
    SSL>>>>>>>SSL>>>>>>>SSL>>>>>>>>>>>
    |20160415| 20160415| ACCOR
    #EOD
    *

    Please let me know how I can handle this.

    Thanks,

  2. #2
    Join Date
    Aug 2013
    Posts
    22

    Try this.

    Use the Regex Evaluation step to identify where a new dataset/file begins. I assumed that each dataset starts with a row containing 'FILExx'.
    That gives you the starting row number (StartRowNum) of each file. Use the Analytic Query step to look up NextStartRowNum, the starting row of the next file. Then join the original rows with these start rows using a cartesian join with the condition rowNum >= StartRowNum AND rowNum < NextStartRowNum.
    That puts the dataset/filename on each row: FILE01, FILE02, FILE03.
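
    Outside of PDI, the same range-join logic can be sketched in a few lines of Python. This is only an illustration of the idea, not the attached transformation: the marker pattern (FILE.02 in the sample header, normalized to FILE02) and the function name tag_rows are my own assumptions.

```python
import re

# Assumed marker: a header line starting with "#" that contains "FILE.02"
# or "FILE02". Adjust the pattern to the real files.
FILE_MARKER = re.compile(r"\bFILE\.?(\d+)\b")

def tag_rows(lines):
    """Return (row_num, dataset_name, line) for every row from the first
    dataset header onward."""
    # Step 1: find the starting row number of each dataset.
    starts = []
    for row_num, line in enumerate(lines, start=1):
        match = FILE_MARKER.search(line)
        if line.startswith("#") and match:
            starts.append((row_num, "FILE" + match.group(1)))

    tagged = []
    for i, (start_row, name) in enumerate(starts):
        # Step 2: look up the start of the next dataset (a LEAD lookup);
        # the last dataset runs to the end of the file.
        if i + 1 < len(starts):
            next_start_row = starts[i + 1][0]
        else:
            next_start_row = len(lines) + 1
        # Step 3: keep rows with start_row <= row_num < next_start_row.
        for row_num in range(start_row, next_start_row):
            tagged.append((row_num, name, lines[row_num - 1]))
    return tagged
```

    Rows before the first header (such as the leading * line) get no dataset name in this sketch.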

    You can take it from there. For example, use Filter Rows steps to split up the rows by filename.
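
    As a rough Python equivalent of that split, the sketch below groups rows by dataset name and then keeps only the data rows between the SSL and #EOD lines described in the first post. The function names and the (dataset_name, line) input shape are assumptions made for the example.

```python
from collections import defaultdict

def split_by_dataset(tagged_rows):
    """Group (dataset_name, line) pairs into one list of lines per dataset."""
    groups = defaultdict(list)
    for name, line in tagged_rows:
        groups[name].append(line)
    return dict(groups)

def data_rows(lines):
    """Yield the fields of the rows between the SSL line and the #EOD line."""
    in_data = False
    for line in lines:
        if line.startswith("SSL"):
            in_data = True
        elif line.startswith("#EOD"):
            in_data = False
        elif in_data:
            # Rows look like "|20160415| 20160415| ACCOR": drop the empty
            # field before the first "|" and trim whitespace.
            yield [field.strip() for field in line.split("|")[1:]]
```

    Each group can then be loaded into its own table, using the column definitions from that dataset's header lines.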

    I am not very familiar with Partitioning, but perhaps you can use this method as well to split up the data by partition key = filename and then process each one separately.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.