Hitachi Vantara Pentaho Community Forums

Thread: reading and parsing weblogs

  1. #1
    Join Date: May 2009
    Posts: 16

    reading and parsing weblogs

    I need to know whether Kettle has a built-in input step for reading and parsing web logs.

    I've read about and tried Kettle's Regex Evaluation step, and checked the following links:

    http://wiki.pentaho.com/display/EAI/Regex+Evaluation
    http://pentaho-en.phi-integration.co...log-with-regex

    I'm trying to find out whether Kettle has any built-in input step that reads web logs and parses them into fields, the way a log parser product does.

    I haven't checked the 3.2 release yet, but I will.

    Matt and others, please let me know whether Kettle can read and parse web logs.

    Have a wonderful day !

  2. #2
    Join Date: Nov 1999
    Posts: 9,729

    Open this sample transformation (File / Open):

    Code:
    samples/transformations/Regex Eval - parse NCSA access log records.ktr
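
    For readers who want to see what the Regex Evaluation step is doing in that sample, here is a minimal Python sketch of the same idea: one regular expression with named capture groups is applied to each NCSA Common Log Format line, producing a row that holds the original line, an is_match flag, and one field per capture group. The field names line and is_match come from this thread; the pattern, the group names, and the function parse_ncsa_line are assumptions for illustration and are not copied from the sample .ktr or from Kettle's own Java code.

```python
import re

# Assumed pattern for one NCSA Common Log Format record:
#   host ident authuser [timestamp] "request" status bytes
# This is an illustrative regex, not the one stored in the Kettle sample.
NCSA_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_ncsa_line(line: str) -> dict:
    """Return the raw line, a match flag, and one field per named capture group."""
    match = NCSA_PATTERN.match(line)
    row = {"line": line, "is_match": match is not None}
    # Non-matching lines keep their row; the capture-group fields are left as None.
    for name in NCSA_PATTERN.groupindex:
        row[name] = match.group(name) if match else None
    return row


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    for field, value in parse_ncsa_line(sample).items():
        print(f"{field}: {value}")
```

    In this sketch, lines that don't fit the pattern still produce a row with is_match set to False, so a later filter can decide whether to drop or log them instead of losing them silently.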

  3. #3
    Join Date: May 2009
    Posts: 16

    Thanks Matt, I did see that transformation before posting.

    I do have one more question.

    In my Table Output, the line and is_match fields also appear alongside the capture group fields. I've set up the capture group fields exactly as you did.

    Is there a way to keep "line" and "is_match" out of the output, so that only the capture group fields remain? I'm storing the results in a MySQL database, and I don't need the full line or the match flag; I only need the capture group fields in Table Output.

    Appreciate your help.

  4. #4

    Hi,

    Two solutions:
    1- Use a Select Values step to remove the fields you don't want from the stream.
    2- For PDI 3.2 and later, use the field mapping in Table Output.
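
    The two options can be pictured in plain Python. This is an analogy, not Kettle code: remove_fields plays the role of listing line and is_match on the Select Values step's Remove tab, and mapped_insert plays the role of a Table Output field mapping, where only the mapped stream fields become table columns. The function names, the table name weblog, the column names, and the %s placeholder style (used by common Python MySQL drivers) are illustrative assumptions.

```python
# Fields the poster wants kept out of MySQL (names taken from the thread).
REMOVED_FIELDS = ("line", "is_match")


def remove_fields(row: dict, removed=REMOVED_FIELDS) -> dict:
    """Option 1 analogue: drop unwanted fields from the row before it is written."""
    return {name: value for name, value in row.items() if name not in removed}


def mapped_insert(table: str, field_mapping: dict, row: dict) -> tuple:
    """Option 2 analogue: write only the mapped stream fields to their target columns.

    field_mapping maps stream field name -> table column name; any stream field
    not listed (e.g. line, is_match) is simply never written.
    """
    columns = ", ".join(field_mapping.values())
    placeholders = ", ".join(["%s"] * len(field_mapping))
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    params = tuple(row[field] for field in field_mapping)
    return sql, params


if __name__ == "__main__":
    # Hypothetical row shaped like the Regex Evaluation output discussed in this thread.
    row = {"line": "...", "is_match": True, "host": "127.0.0.1", "status": "200"}
    print(remove_fields(row))
    print(mapped_insert("weblog", {"host": "client_host", "status": "http_status"}, row))
```

    The first approach changes the stream itself, so every later step also stops seeing the dropped fields; the second leaves the stream intact and only controls which fields reach the table.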

    http://www.ibridge.be/

    Samatar

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.