Hitachi Vantara Pentaho Community Forums

Thread: Parse complex data field

  1. #1
    Join Date
    Feb 2014
    Posts
    3

    Default Parse complex data field

    Hi everybody,

    I am working with a database table in which one column holds a complex text field that I need to parse.

    NB DATA
    01 some text content
    02 some text content


    Here is a sample of the text content stored in the DATA column:
    Code:
    TYPE1    ID        = 'abc',
             ATT       = 'def'
    
    
    TYPE2    ID        = 'ghi',
             ATT       = 'jkl'
    
    
    COMMON   PARAM1    = 'mno',
             PARAM2    = 'pqr'
    I want to get something like:

    NB ID TYPE ATT PARAM1 PARAM2
    01 abc TYPE1 def mno pqr
    01 ghi TYPE2 jkl mno pqr
    02 [...]

    Do you have any ideas on how to achieve this using Pentaho?

    Best Regards

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    You really should provide a sample input text file for download.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Feb 2014
    Posts
    3

    Default

    Quote Originally Posted by marabu View Post
    You really should provide a sample input text file for download.
    Here is a sample file to parse. The text is actually stored in a database field, but parsing it directly from a file should work the same way.
    Attached Files

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    And here is something to get you going.
    Attached Files

  5. #5
    Join Date
    Feb 2014
    Posts
    3

    Default

    Thanks a lot! I have just one issue: the data is stored in a database field. Is it possible to do the same parsing on a field (instead of a file)?

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Let's see.

    Which key features of the "Text File Input" step did I use?

    • Text is parsed into rows. You can easily achieve this with a "Split Field to Rows" step.
    • There are two line filters. Have a look at the "Java Filter" step.
    • The rectype field is repeated on every row. This can be mimicked by combining the "Group By" step from the Statistics category (function "First non-null value") with the group generating technique shown in my demo.
    • There are three fields carved from each line. The "Strings cut" step can do the same.
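
For readers without the attachment, here is a rough Python sketch of the same four stages, only to illustrate the logic outside Kettle. It is not the demo transformation. The function name `parse_field`, the `SAMPLE` text and the regular expression are assumptions based on the layout shown in post #1, and the record type is filled down with a plain running variable rather than the Group By technique.

```python
import re

# Hypothetical sample mirroring the layout from post #1.
SAMPLE = """\
TYPE1    ID        = 'abc',
         ATT       = 'def'


TYPE2    ID        = 'ghi',
         ATT       = 'jkl'


COMMON   PARAM1    = 'mno',
         PARAM2    = 'pqr'
"""

# Assumed line shape: optional record type, then KEY = 'value'.
LINE = re.compile(r"^(\S+)?\s+(\w+)\s*=\s*'([^']*)'")


def parse_field(nb, text):
    """Turn one DATA value into one output row per non-COMMON record type."""
    records = {}    # record type -> {key: value}, in order of appearance
    rectype = None  # last record type seen, carried down to continuation lines
    for line in text.splitlines():   # stage 1: split the field into rows
        if not line.strip():         # stage 2a: drop blank lines
            continue
        match = LINE.match(line)
        if match is None:            # stage 2b: drop lines that do not fit
            continue
        if match.group(1):           # stage 3: fill down the record type
            rectype = match.group(1)
        if rectype is None:          # continuation line before any record type
            continue
        # stage 4: carve key and value out of the line
        records.setdefault(rectype, {})[match.group(2)] = match.group(3)
    common = records.pop("COMMON", {})   # shared parameters go on every row
    return [
        {"NB": nb, "TYPE": rectype, **fields, **common}
        for rectype, fields in records.items()
    ]


if __name__ == "__main__":
    for row in parse_field("01", SAMPLE):
        print(row)
```

Because the function takes the text as a string, it does not matter whether the value came from a file or from a database column, which is the point of swapping "Text File Input" for "Split Field to Rows".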

    It's your turn now.
    So long, and thanks for all the fish.
