Hitachi Vantara Pentaho Community Forums

Thread: How to use Kettle to process a log file

  1. #1

    How to use Kettle to process a log file

    Hi

    I am using Kettle to process a log file. It is neither a CSV file nor a file with fixed, predefined fields.

    The content describes email traffic. The set of fields is known, but not every mail flow looks the same: a flow sometimes contains a given field and sometimes does not, and some flows span 4 lines while others span 3 or 5.
    The goal is to parse all the fields correctly and group them by mail.

    I have tried a lot, including regular expressions, but I cannot find a way to get this done.
    Any suggestion would be appreciated!

    The file looks like this: an email arrives, is processed, and leaves.


    -----------------------------------------
    Jun 10 23:59:00 msevf0101 postfix/smtpd[28672]: connect from msevf0112[172.23.17.42]
    Jun 10 23:59:00 msevf0101 postfix/smtpd[28672]: C4E531C0000B3: client=msevf0112[172.23.17.42]
    Jun 10 23:59:00 msevf0101 postfix/cleanup[28824]: C4E531C0000B3: warning: header X-GIESVAWMSGID: 20090610215900_800757011230497 from msevf0112[172.23.17.42]; from=<01931@931.01.rss.fr> to=<teletrans@edi.cegedim.net> proto=ESMTP helo=<msevf0112.atos-internet.rss.fr>
    Jun 10 23:59:00 msevf0101 postfix/cleanup[28824]: C4E531C0000B3: message-id=<SV/00000951104108/2009061100003598045@vip_smtp>
    Jun 10 23:59:00 msevf0101 postfix/smtpd[28672]: disconnect from msevf0112[172.23.17.42]
    Jun 10 23:59:00 msevf0101 postfix/qmgr[25855]: C4E531C0000B3: from=<01931@931.01.rss.fr>, size=1977, nrcpt=1 (queue active)
    Jun 10 23:59:00 msevf0101 postfix/smtp[26333]: C4E531C0000B3: to=<teletrans@edi.cegedim.net>, relay=pmx02z.cegedim.fr[194.126.236.172]:25, delay=0.08, delays=0/0/0.03/0.05, dsn=2.0.0, status=sent (250 ok: Message 39328466 accepted)
    Jun 10 23:59:00 msevf0101 postfix/qmgr[25855]: C4E531C0000B3: removed
    Jun 10 23:59:01 msevf0101 postfix/smtpd[27523]: 07F871C0000B3: client=unknown[121.135.151.172]
    Jun 10 23:59:01 msevf0101 postfix/smtpd[28091]: warning: 203.81.228.21: hostname host21-228.worldcall.net.pk verification failed: Name or service not known
    Jun 10 23:59:01 msevf0101 postfix/smtpd[28091]: connect from unknown[203.81.228.21]
    Jun 10 23:59:01 msevf0101 postfix/smtpd[28091]: NOQUEUE: reject: CONNECT from unknown[203.81.228.21]: 554 5.7.1 Service unavailable; Client host [203.81.228.21] blocked using xbl.spamhaus.priv; http://www.spamhaus.org/query/bl?ip=203.81.228.21; proto=SMTP
    Jun 10 23:59:01 msevf0101 postfix/smtpd[28091]: disconnect from unknown[203.81.228.21]
    Jun 10 23:59:01 msevf0101 postfix/smtpd[28672]: connect from msevf0104[172.23.17.26]
    Jun 10 23:59:01 msevf0101 postfix/smtpd[28672]: 311DD1C0000B4: client=msevf0104[172.23.17.26]
    Jun 10 23:59:01 msevf0101 postfix/cleanup[28824]: 311DD1C0000B4: warning: header X-GIESVAWMSGID: 20090610215901_184186010417846 from msevf0104[172.23.17.26]; from=<01931@931.01.rss.fr> to=<teletrans@edi.cegedim.net> proto=ESMTP helo=<msevf0104.atos-internet.rss.fr>
    Jun 10 23:59:01 msevf0101 postfix/cleanup[28824]: 311DD1C0000B4: message-id=<SV/00000951117308/2009061100003598046@vip_smtp>
    Jun 10 23:59:01 msevf0101 postfix/smtpd[28672]: disconnect from msevf0104[172.23.17.26]
    Jun 10 23:59:01 msevf0101 postfix/qmgr[25855]: 311DD1C0000B4: from=<01931@931.01.rss.fr>, size=1979, nrcpt=1 (queue active)
    Jun 10 23:59:01 msevf0101 postfix/smtp[28061]: 311DD1C0000B4: to=<teletrans@edi.cegedim.net>, relay=pmx01z.cegedim.fr[194.126.236.168]:25, delay=0.12, delays=0/0/0.05/0.07, dsn=2.0.0, status=sent (250 ok: Message 49771799 accepted)
    Jun 10 23:59:01 msevf0101 postfix/qmgr[25855]: 311DD1C0000B4: removed
    Jun 10 23:59:01 msevf0101 postfix/cleanup[28815]: 07F871C0000B3: message-id=<000d01c9ea16$a68fecc0$6400a8c0@riemannv3>
    Jun 10 23:59:01 msevf0101 postfix/qmgr[25855]: 07F871C0000B3: from=<riemannv3@smedbo.de>, size=1045, nrcpt=1 (queue active)
    Jun 10 23:59:01 msevf0101 filter_in: Rejet de l id 07F871C0000B3 Cause :Bad Subject <Get a degree with no problems.> Bad Content-Type : <text/plain; format=flowed; charset=iso-8859-1; reply-type=original> Bad Content-Description : <>
    Jun 10 23:59:01 msevf0101 postfix/pipe[26970]: 07F871C0000B3: to=<92831@831.92.rss.fr>, relay=filter, delay=0.79, delays=0.65/0/0/0.14, dsn=2.0.0, status=sent (delivered via filter service)
    Jun 10 23:59:01 msevf0101 postfix/qmgr[25855]: 07F871C0000B3: removed
    Jun 10 23:59:01 msevf0101 postfix/smtpd[27523]: disconnect from unknown[121.135.151.172]
    ----------------------------------------------------

  2. #2


    Hi,

    Maybe you need to use regular expressions :-)

    Take a look at the PDI step:

    http://wiki.pentaho.com/display/EAI/Regex+Evaluation

    Someone has already parsed Tomcat log files:

    http://pentaho-en.phi-integration.co...log-with-regex


    Take care,
    Samatar
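
To make the regular-expression suggestion concrete, here is a minimal sketch in Python (not a Kettle transformation) of one way the sample above could be handled. It assumes that the hexadecimal token after the process name (for example C4E531C0000B3) is the Postfix queue ID and that it can serve as the grouping key for one mail. The names LINE_RE, QID_RE, KV_RE and group_by_queue_id are made up for this illustration. Lines without a queue ID (connect, disconnect, NOQUEUE, filter_in) are simply skipped.

```python
import re
from collections import defaultdict

# Syslog prefix: "Jun 10 23:59:00 host process[pid]: rest"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<proc>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<rest>.*)$"
)
# Assumed queue ID at the start of the message, e.g. "C4E531C0000B3: ..."
QID_RE = re.compile(r"^(?P<qid>[0-9A-F]{6,}): (?P<msg>.*)$")
# key=value pairs such as from=<...>, size=1977, status=sent
KV_RE = re.compile(r"(?P<key>[\w-]+)=(?P<value><[^>]*>|[^\s,;]+)")


def group_by_queue_id(lines):
    """Collect the key=value fields of every line that shares a queue ID."""
    mails = defaultdict(dict)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog-style line
        q = QID_RE.match(m.group("rest"))
        if not q:
            continue  # connect/disconnect/NOQUEUE lines carry no queue ID
        record = mails[q.group("qid")]
        record.setdefault("first_seen", m.group("ts"))
        for kv in KV_RE.finditer(q.group("msg")):
            # Keep the first value seen for each key; strip <...> around addresses.
            record.setdefault(kv.group("key"), kv.group("value").strip("<>"))
    return dict(mails)


# A few lines copied from the sample in post #1.
SAMPLE = [
    "Jun 10 23:59:00 msevf0101 postfix/smtpd[28672]: connect from msevf0112[172.23.17.42]",
    "Jun 10 23:59:00 msevf0101 postfix/qmgr[25855]: C4E531C0000B3: from=<01931@931.01.rss.fr>, size=1977, nrcpt=1 (queue active)",
    "Jun 10 23:59:00 msevf0101 postfix/smtp[26333]: C4E531C0000B3: to=<teletrans@edi.cegedim.net>, relay=pmx02z.cegedim.fr[194.126.236.172]:25, delay=0.08, delays=0/0/0.03/0.05, dsn=2.0.0, status=sent (250 ok: Message 39328466 accepted)",
    "Jun 10 23:59:00 msevf0101 postfix/qmgr[25855]: C4E531C0000B3: removed",
]

if __name__ == "__main__":
    for qid, fields in group_by_queue_id(SAMPLE).items():
        print(qid, fields)
```

Because fields are collected per queue ID, it does not matter whether a mail spans 3, 4 or 5 lines, or whether a given field is missing. Two caveats for this sketch: it keeps only the first value per key, so a mail with several recipients would need a list for "to", and it assumes a queue ID is not reused within the same file. Inside PDI, similar patterns could probably be used in the Regex Evaluation step, but Java writes named groups as (?<name>...) rather than (?P<name>...), and the grouping would have to be done in a later step.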

  3. #3


    Thanks.
    I have already checked this, but it does not solve my problem.
    I think the problem is the log file itself: it is too complex.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.