Hitachi Vantara Pentaho Community Forums

Thread: Manipulating Unstructured Data File

  1. #1
    Join Date
    Sep 2010
    Posts
    7

    Manipulating Unstructured Data File

    Good morning. I have a file that is used to load our database. The file is more of an EDI structure: records have a varying number of fields and are distinguished by the first field, which is the record type. Based on that value, the rest of the fields get their meaning. So it's not a table structure and there are no field names. The data is tilde (~) delimited. There is a problem with the data that is sent to us, and I need to remove records that start with ENCHDR (maybe a Left([??],6)="ENCHDR") and whose last value on that record (row) is a 1 (Right([??],1)="1"), and then write out the file maintaining the same sequence of records. I am assuming that it would read from top (beginning) to bottom and write back out in that same order, but I'm not sure if that needs to be done explicitly. I was able to read in the data, but it seems like PDI needs to be told what fields to write out. I believe the record with the most fields has 138 fields. I'm not sure if and how to create a predefined structure like field1, field2, etc. to pull in the file, or if there is a much easier way to do it. I would have to write the file out as ~ delimited if I pulled it in as fields.

    Here is a sample, with masked names and ID numbers. The word wrap might make it difficult to visualize, but most of the records start with ENC followed by a few additional letters.

    I'd appreciate any help, thoughts, or guidance.

    ENTHDR~1~2.0
    ENCHDR~H1C~20131028 11:17:00~HD~123-45-6789~19421028~F~PATSY~HQ~000000012~001542896~SPOON, PAT~- -~~~ZA,JUNE~HQ~19378~~~~~~~~~~~~1~000439767~123301~20131028 23:59:00~OTH~~1~~1
    ENCHDR~H1C~20131104 13:27:00~HD~231-12-4332~19590814~F~ADA~HQ~000000406~001220121~DOE, JANE~- -~~~VAL, JIMBOB A~HQ~2223~~~~~~~~~~~~1~000000406~4063308~20131104 23:59:00~OTH~~1~~0
    MBRPER~~~~~~~~W~B~~~U~OTHER~~~~0~0~~C~BERRY~~~~~~~~~~~0~0~0
    ENCNTR~~~~~~~~~~~175~C~~1~- -~~~VAL,JIMBOB A~HQ~2223~~~0~~~~~~~X~~0~- -~~~VAL, JIMBOB~HQ~2223~~~~~~~~~~~~~123-45-6789~19590814~~BERRY,STEVE C~HQ~000000406~001220121~~~~~~~~~~~~~~~~~~~~~~~0~0~U~OTHER~W~~~~~~~0~0~0~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~00000
    ENCPRAC~- -~~~VALERIO,HERNANI A~HQ~2223~~~ADMT
    ENCREAS~DIAGNOSIS~ICD9CM~111.46~ADMITTING~0000~~~~~~~~
    ENCREAS~DIAGNOSIS~ICD9CM~111.46~PRINCIPAL~0001~~~~~~~~
    ENCPAYOR~Q~1~HMC~HMC~0~NA~231-12-4332~19190814~F~ADA~HQ~000000406~001220121~H54458881~P~100.00~~-36.16~0.00~533.51~0.00~000~000~0.00~20131108~~~20131115~533.51~~~~~~~~~~HQ~Q~
    ENCPAYA~Q~1~20131115~ADJ~~00000708~-447.35~
    ENCUDPYA~PMT_APPL_TO_DATE~20131104
    ENCPAYA~Q~1~20131115~PAY~~00000421~-36.16~
    ENCUDPYA~PMT_APPL_TO_DATE~20131104
    ENCPAYOR~PAT~8~~~1~~~~~~~~001220121~~~-50.00~~0.00~~~~~~~20131108~~~~0.00~~~~~~~~~~HQ~Q~
    ENCUD~SERVCODE~175
    ENCUD~REFERRING HOSP PHYS IND~N
    ENCUD~FIRSTBILLDT~20131108
    ENCUD~MSP~N
    ENCSIHDR~1302412~CHARGE~05173510~20131104 00:00:00~HQ
    ENCSI~~~~~~~~~~~~~~HDC~~297.41~~~~~~~~~~~~~~~~1~1~~~~~~20131104 23:59:00~~~~~~~RT~~~~~~~~~~~~~~73510~0320~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
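
For readers who want to check the layout described above (record type in the first field, a varying number of ~-delimited fields), here is a minimal sketch that counts records per type and reports the largest field count seen for each. The file names `sample_types.txt` and `types_summary.txt` and the shortened sample rows are assumptions made for illustration, not part of the original feed.

```shell
# Hypothetical file names; a few shortened rows stand in for the real feed.
in=sample_types.txt
summary=types_summary.txt

cat > "$in" <<'EOF'
ENTHDR~1~2.0
ENCHDR~H1C~HD~1
ENCHDR~H1C~HD~OTH~~0
ENCUD~MSP~N
ENCUD~SERVCODE~175
EOF

# Split each line on "~". For every record type (field 1), keep a running
# count of records and the largest number of fields (NF) seen so far.
awk -F'~' '{ n[$1]++; if (NF > max[$1]) max[$1] = NF }
           END { for (t in n) print t, n[t], max[t] }' "$in" |
  LC_ALL=C sort > "$summary"

cat "$summary"
```

Run against the real file, the third column would show whether any record type actually reaches the 138 fields mentioned above.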

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534


    Originally posted by dieffen:
    I need to remove records that start with ENCHDR (maybe a Left([??],6)="ENCHDR") and the last value on that record (row) is a 1 (Right([??],1)="1") and then write out the file maintaining the same sequence of records
    Kettle shines with tabular data, but it can certainly handle the odd file, too.
    But if you don't need that file in tabular format, why use Kettle at all?

    Code:
    grep -v -e '^ENCHDR~.*~1$'  your-filename-here
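
The grep command prints the surviving lines to standard output in their original order, so redirecting it to a new file gives the cleaned feed. As a possible variant, the sketch below does the same filtering with awk by comparing the first and last ~-delimited fields, and writes the result to a file. The file names `sample_feed.txt` and `filtered_feed.txt` and the shortened sample rows are assumptions for illustration; stripping a trailing carriage return is also an assumption, useful only if the feed arrives with Windows line endings.

```shell
# Hypothetical file names; a small shortened sample stands in for the real feed.
in=sample_feed.txt
out=filtered_feed.txt

cat > "$in" <<'EOF'
ENTHDR~1~2.0
ENCHDR~H1C~20131028 11:17:00~HD~OTH~~1~~1
ENCHDR~H1C~20131104 13:27:00~HD~OTH~~1~~0
MBRPER~~~~W~B~~~0~0~0
ENCUD~MSP~N
EOF

# Split on "~". Copy the last field and strip a trailing carriage return from
# the copy (assumption: the file may have CRLF line endings). Print every line
# unchanged unless field 1 is exactly ENCHDR and the last field is exactly 1.
awk -F'~' '{ last = $NF; sub(/\r$/, "", last) }
           !($1 == "ENCHDR" && last == "1")' "$in" > "$out"

cat "$out"
```

Both approaches read the file top to bottom and write lines back unmodified, so the record sequence and the ~ delimiters are preserved without defining any field names.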
    So long, and thanks for all the fish.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.