Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: kettle - split variable length list field

  1. #1
    Join Date
    May 2007
    Posts
    2

    Default kettle - split variable length list field

    Hello,

    I have the following problem in importing our data into a Mondrian cube. I want to process log files which contain lines with set fields. For example:

    'id=1&date=2006-08-20&items=itemOne,itemTwo, itemThree&view=search'

    , where 'items' is a comma-separated set field. The number of elements in the field can vary.

    All fields from such a line go to a DB table with the same structure, i.e., each log line field corresponds to one column in the DB table. Exception is made for these set fields such as "items" (from the example above) which should go in a separate DB table with 2 columns: "item" and "logLineId". A new row should be added in this table for each item from the "items" set.

    The question is how to express in the KETTLE modeling environment a varying number of INSERTs on the second table for each log line that I read from the log file? I cannot fix the number of inserts in advance as the set size varies by definition.

    So far I have figured a (semi)solution around the following idea:
    I have the following Kettle tranformation. TextFileInput spits values to a JavaScript which cuts set fields and these go to the second TableOutput. All the rest of the fields of the TextFileInput go to the first TableOutput. But again this does not work because in the JavaScript step I have to specify fixed number of output fields for the next step to use but I don't know how many output fields I will have for each log line coming from TextFileInput.

    --Anton

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Dear Anton,

    The layout of rows passing through a transformation always need to be the same.
    In general, any step needs to generate a deterministic (predictable) output.
    Your requirement of having a random and unpredictable amount of output fields for the the "Split field" step is breaking that rule.
    It would prevent you also from doing anything meaningful with the fields as you would not be able to put them into a database, do any calculations on them, etc.

    That being said, you can do anything you want with it in a Javascript step and we have an example on how to split a field into several parts in the samples/transformations directory of your Kettle distribution.

    All the best,

    Matt

  3. #3
    Join Date
    Jun 2007
    Posts
    3

    Default

    Hi

    I have a similar question, but one that is based on an output with a fixed number of fields. Here's the situation:

    1. Data INPUT from a single Access database table

    2. Certain INPUT fields then need splitting into a fixed number of other fields (the INPUT field has several comma separated values)
    example:
    INPUT FIELD A -> OUTPUT FIELDS A,B,C,D
    INPUT FIELD B -> OUTPUT FIELDS E,F,G

    3. Certain fields require transformation logic
    example:
    INPUT FIELD C -> CONDITION -> OUTPUT FIELD H

    4. Certain fields require merging
    example:
    INPUT FIELD D + E -> OUTPUT FIELD I

    4. The OUTPUT needs to be a fixed column width file
    example:
    OUTPUT FIELD A width 3 chars
    OUTPUT FIELD B width 6 chars
    OUTPUT FIELD C width 3 chars


    This basically means I need to take an input with say 20 fields, perform the necessary mappings/transformations and write to an output file with say 30 fields (will always be the same number and is therefore deterministic - just can't figure out how to do it).

    I have been battling with this for most of the day (without much success) so your help would be extremely useful! Is what I am trying to do even achievable in Pentaho Kettle? If not, could you point me in the right direction?

    Thank you

    Joff
    Attached Images Attached Images  
    Last edited by joff; 09-03-2007 at 08:56 AM.

  4. #4
    Join Date
    Jun 2007
    Posts
    3

    Default kettle - split variable length list field

    Shameless bump...

    Quote Originally Posted by joff View Post
    Hi

    I have a similar question, but one that is based on an output with a fixed number of fields. Here's the situation:

    1. Data INPUT from a single Access database table

    2. Certain INPUT fields then need splitting into a fixed number of other fields (the INPUT field has several comma separated values)
    example:
    INPUT FIELD A -> OUTPUT FIELDS A,B,C,D
    INPUT FIELD B -> OUTPUT FIELDS E,F,G

    3. Certain fields require transformation logic
    example:
    INPUT FIELD C -> CONDITION -> OUTPUT FIELD H

    4. Certain fields require merging
    example:
    INPUT FIELD D + E -> OUTPUT FIELD I

    4. The OUTPUT needs to be a fixed column width file
    example:
    OUTPUT FIELD A width 3 chars
    OUTPUT FIELD B width 6 chars
    OUTPUT FIELD C width 3 chars


    This basically means I need to take an input with say 20 fields, perform the necessary mappings/transformations and write to an output file with say 30 fields (will always be the same number and is therefore deterministic - just can't figure out how to do it).

    I have been battling with this for most of the day (without much success) so your help would be extremely useful! Is what I am trying to do even achievable in Pentaho Kettle? If not, could you point me in the right direction?

    Thank you

    Joff
    Last edited by joff; 09-03-2007 at 08:58 AM. Reason: Need an answer

  5. #5
    Join Date
    May 2006
    Posts
    4,882

    Default

    Shame on you... so you're probably best of with a javascript piece. In the samples directory and the FAQ there are several examples related to your question: like how to split fields, how to make new fields, ...

    Regards,
    Sven

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.