Hitachi Vantara Pentaho Community Forums

Thread: Flat File Extraction Advice

  1. #1

    Flat File Extraction Advice

    I'd like some advice on the best way to split up a flat file that has a "record type" at the beginning of each row. The record type determines what the row means: for example, rec_type 101 is a header record, 107 is a sales ticket header, and 108 is the line item detail for that sales ticket. The problem is that each record type has its fields in different positions, so I can't see how to use the "Text file input" step to do what I want.

    What needs to happen is:

    1) Get files from a directory
    2) Read each file and split each record type into its appropriate fields
    3) Rejoin the related data in a database.

    How can I accomplish the above?

    Thanks,
    David

  2. #2


    I suppose I would first read it as just two columns: one for the record type and one for everything else on the line.

    Then I would run it through a JavaScript step in which I would create keys. In JavaScript you can check if a variable already exists and only initialize it if it doesn't. That way you can remember values over several rows. So I would have a counter variable for each "parent" record type, like "count101", "count105", "count107"...

    If you write these counters into every row then you have keys with which you can always join the rows again, later on.
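
    To make that concrete, here is a minimal sketch of what the script in the JavaScript step could look like. The field name rec_type and the count/key names are only placeholders, so adjust them to your actual record types:

    Code:
        // Initialise each parent counter only if it doesn't exist yet;
        // the values survive from row to row because the script scope
        // is kept for the whole transformation run.
        if (typeof count101 == 'undefined') count101 = 0;
        if (typeof count105 == 'undefined') count105 = 0;
        if (typeof count107 == 'undefined') count107 = 0;

        // Bump the counter that belongs to this row's parent record type.
        if (rec_type == '101') count101 = count101 + 1;
        if (rec_type == '105') count105 = count105 + 1;
        if (rec_type == '107') count107 = count107 + 1;

        // Copy the current counter values into output fields (declare
        // key101, key105 and key107 in the step's fields grid); every
        // child row then carries the key of the parent it belongs to.
        var key101 = count101;
        var key105 = count105;
        var key107 = count107;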

    Now you could use a filter to send all cases into different streams and parse the fields of each stream, probably using a regular expression step.

    Since you have keys you could now either do stream lookups to join it all together again or you could just write each stream directly to a different database table...

    Last thought: if you write to the DB then you might need to make your keys more unique... maybe by combining them with a timestamp or by getting the last used key at the beginning of your transformation.

    Cheers,

    Axel

  3. #3


    I agree with Axel, but I'd stop after splitting the data and just write it to different files.
    That way you can have nice, clean transformations for each of the types and you won't have to use the RegExp step to get your columns.
    Fabian,
    doing ETL with his hands tied behind his back

  4. #4


    Miland,

    Did you happen to find a solution to your problem of processing the structured text file? I have a similar file format, although not as complex, in that each logical batch is kept together. But each batch has a header that needs to be processed as well, and the Text File Input step doesn't seem to handle this elegantly.

    Any help would be greatly appreciated.

    Thanks.

    Sanjay

  5. #5


    Yes, I did as colorfool suggested and read in each record as two fields. Then I used the "Switch/Case" step to route rows based on the first field. Each of the splits goes to a "String Cut" step, and a "Replace in String" step then refines things further by dicing the second field into many smaller fields. Works great! This product is great and the community is always willing to help!
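
    For anyone who finds this later, the cutting those steps do boils down to plain substring logic. A rough illustration (the offsets below are made up; the real positions live in the "String Cut" steps):

    Code:
        // Illustration only -- the field offsets are invented; use the
        // real column positions for each record type from your layout.
        var ticket_no = '';
        var item_qty  = '';

        if (rec_type == '107') {
            // sales ticket header: cut the ticket number out of the raw line
            ticket_no = rest.substring(0, 10);
        }
        if (rec_type == '108') {
            // line item detail: cut the quantity from a different position
            item_qty = rest.substring(10, 15);
        }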

    Thanks,
    David
