Hitachi Vantara Pentaho Community Forums

Thread: Basic hierarchical data: ability of Pentaho to handle

  1. #1
    Join Date
    Mar 2012
    Posts
    2

    Basic hierarchical data: ability of Pentaho to handle

    This question has come up in earlier threads, but some of those are quite old, and I'd like to know whether Pentaho has since been updated to make hierarchical input data easier to handle. Specifically, my need is as follows:

    Data files that we need to process contain many different kinds of record rows, each row type identified
    by a prefix at the beginning of the record. Some of the records contain header
    info that needs to be carried into reports generated by other rows, and
    sometimes a header record defines how a subsequent record row needs to be
    parsed.

    Here's a mocked-up sample data stream:

    10 1000012345 001 01312012 WA
    20 1123 Chrysler Town & Country
    25 01012012 27150
    25 01022012 27186
    26 01032012 27209 12.125 42.44
    25 01032012 27212
    25 01042012 27255
    25 01062012 27292
    25 01072012 27326
    25 01092012 27361
    25 01102012 27389
    25 01112012 27408
    26 01132012 27430 11.053 39.35
    20 1256 Ford F150
    25 01012012 38150
    25 01022012 38186
    26 01032012 38209 12.125 42.44
    25 01032012 38212
    25 01042012 38255
    25 01062012 38292
    25 01072012 38326
    25 01092012 38361
    25 01102012 38389
    25 01112012 38408
    26 01132012 38430 11.053 39.35

    Any thoughts on how to handle this?

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    I usually write a few lines of JavaScript to handle this data.
    That way it's easy to carry information from types 10 and 20 forward to the 25 rows and enrich the data.

    Code:
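    // The record type is the two-character prefix of each line.
    var code = line.substring(0, 2);
    
    // Working fields; the idea is to carry header values forward to the detail rows.
    var field1, field2, field3;
    
    if (code == "10") {
      // Offsets match the sample row "10 1000012345 001 01312012 WA".
      field1 = line.substring(3, 13);   // "1000012345"
      field2 = line.substring(14, 17);  // "001"
      // reset the other fields too.
    }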
    
    ...
    That sort of thing. You may need to filter out a few rows afterwards.
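
    For anyone who wants to see the carry-forward idea end to end, here is a minimal sketch in plain JavaScript, run outside Pentaho. It is an illustration, not the exact script from the post above: the function name enrichLines and the output property names (headerFields, vehicleId, vehicleName, values) are made up for this example, and the meaning of the columns is a guess based on the sample data. Header rows (10 and 20) are remembered and attached to each 25/26 row; the header rows themselves are not emitted, which stands in for the "filter out a few rows afterwards" step.

```javascript
// Hypothetical helper: carries values from header rows (types 10 and 20)
// onto the detail rows (types 25 and 26) that follow them.
function enrichLines(lines) {
  var header = null;   // fields of the last type-10 row seen
  var vehicle = null;  // id and name from the last type-20 row seen
  var out = [];

  lines.forEach(function (raw) {
    var line = raw.trim();
    if (line === "") return;          // skip blank lines

    var code = line.substring(0, 2);  // two-character record type
    var rest = line.substring(3);     // everything after the prefix

    if (code === "10") {
      header = rest.split(/\s+/);
      vehicle = null;                 // a new file header resets the group context
    } else if (code === "20") {
      // The name may contain spaces, so split only at the first one.
      var sp = rest.indexOf(" ");
      vehicle = sp === -1
        ? { id: rest, name: "" }
        : { id: rest.substring(0, sp), name: rest.substring(sp + 1) };
    } else if (code === "25" || code === "26") {
      out.push({
        type: code,
        headerFields: header,
        vehicleId: vehicle ? vehicle.id : null,
        vehicleName: vehicle ? vehicle.name : null,
        values: rest.split(/\s+/)     // column meanings are not assumed here
      });
    }
    // Any other record type is ignored in this sketch.
  });

  return out;
}

var sampleLines = [
  "10 1000012345 001 01312012 WA",
  "20 1123 Chrysler Town & Country",
  "25 01012012 27150",
  "26 01032012 27209 12.125 42.44",
  "20 1256 Ford F150",
  "25 01012012 38150"
];

var enriched = enrichLines(sampleLines);
console.log(enriched);
```

    Each output row then carries its own copy of the header values, so later steps no longer depend on row order.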

  3. #3
    Join Date
    Mar 2012
    Posts
    2


    Quote Originally Posted by MattCasters
    I usually write a few lines of JavaScript to handle this data.
    That way it's easy to carry information from types 10, 20 forward to the 25 rows and enrich the data.

    Thanks, Matt. That's helpful, but my question should have included the important proviso "without having to write custom code". The company that is looking at Pentaho is interested in moving away from its current custom-coding approach to data integration and towards a tool like Pentaho. They already do (in their current custom code for processing such files) the kinds of substringing etc. that you describe here.

    In short: if custom coding is still required in order to handle this kind of file, it makes the move to a tool unlikely.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729


    Well, it's a custom file format.
    I guess you could also filter out the record types with multiple copies of a "Text File Input" step, but the visual programming you'd be doing then isn't going to be much better.
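
    To make the trade-off concrete, here is a rough sketch in plain JavaScript of what splitting by record type amounts to. It does not use any Pentaho API; the function name splitByRecordType and the property names lineNumber and text are made up for this example. Each record type ends up in its own stream, much like one input per type would give you, and in this sketch the original line number is the only thing left that ties a 25 row back to the 20 row before it.

```javascript
// Hypothetical helper: groups raw lines by their two-character record type.
function splitByRecordType(lines) {
  var byType = {};
  lines.forEach(function (line, index) {
    var code = line.substring(0, 2);
    if (!byType[code]) byType[code] = [];
    // Keep the 1-based line number so rows can still be related to the
    // header row that preceded them in the file.
    byType[code].push({ lineNumber: index + 1, text: line.substring(3) });
  });
  return byType;
}

var streams = splitByRecordType([
  "10 1000012345 001 01312012 WA",
  "20 1123 Chrysler Town & Country",
  "25 01012012 27150",
  "26 01032012 27209 12.125 42.44",
  "20 1256 Ford F150",
  "25 01012012 38150"
]);

console.log(Object.keys(streams));
console.log(streams["25"]);
```

    Re-joining the streams on line-number ranges is possible, but it is extra work compared with carrying the header values forward in a single pass.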

    The main difference, however, between having a bit of glue code in JavaScript and having some script that someone wrote on a disk somewhere is that the former is manageable, transparent, testable and easy to find, whereas the latter isn't.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.