Hitachi Vantara Pentaho Community Forums
Thread: String operations

  1. #1
    Join Date
    Feb 2017
    Posts
    8

    String operations

    Hi! I started working with Kettle a few days ago and I'm trying to complete a university project, so I'm hoping someone can help me.

    So the problem is:

    I have a file with some records. The file has this format:

    87.196.80.130 - - [29/Nov/2016:08:40:34 +0000] "GET /css/normalize.css HTTP/1.1" 200 9559
    87.196.80.130 - - [29/Nov/2016:08:40:34 +0000] "GET /css/normalize.css HTTP/1.1" 200 9559 "https://www.google.pt/" "Mozilla/5.0 (Linux; Android 5.0; Aquaris E5 HD Build/LRX21M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.85 Mobile Safari/537.36"
    87.196.80.130 - - [29/Nov/2016:08:40:34 +0000] "GET /css/tables.css HTTP/1.1" 200 2127
    87.196.80.130 - - [29/Nov/2016:08:40:34 +0000] "GET /css/tables.css HTTP/1.1" 200 2127 "https://www.google.pt/" "Mozilla/5.0 (Linux; Android 5.0; Aquaris E5 HD Build/LRX21M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.85 Mobile Safari/537.36"

    As you can see, every line is a record. What I need is:
    - First, select only the records that have a website (in the example above, records 2 and 4);

    That leaves me with just these two:

    87.196.80.130 - - [29/Nov/2016:08:40:34 +0000] "GET /css/normalize.css HTTP/1.1" 200 9559 "https://www.google.pt/" "Mozilla/5.0 (Linux; Android 5.0; Aquaris E5 HD Build/LRX21M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.85 Mobile Safari/537.36"
    87.196.80.130 - - [29/Nov/2016:08:40:34 +0000] "GET /css/tables.css HTTP/1.1" 200 2127 "https://www.google.pt/" "Mozilla/5.0 (Linux; Android 5.0; Aquaris E5 HD Build/LRX21M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.85 Mobile Safari/537.36"

    - Second, transform those records into a specific format.

    The format that I need is:

    1,"87.196.80.130","2016-11-29 08:40:34.00","https://www.google.pt/","(Linux; Android 5.0; Aquaris E5 HD Build/LRX21M)"
    2,"87.196.80.130","2016-11-29 08:40:34.00","https://www.google.pt/","(Linux; Android 5.0; Aquaris E5 HD Build/LRX21M)"


    I really want to do this and I've been trying for several weeks. Thank you for helping and I'm sorry for my "broken" English!

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Add your Filter-Rows step at the end of the demo.
    Also, don't worry, your English is fine.
    Attached Files
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Jan 2015
    Posts
    107

    Here is a sample transformation that does what you want: parse_weblog.zip

    You'll probably have to tweak the regular expressions and date conversion, but you can see clearly what's happening in each step by previewing.
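
    For readers without the attachment, the same filter-then-transform logic can be sketched outside Kettle. The Python below is only an illustration, not the contents of parse_weblog.zip: the regular expressions, the `convert` function and the `SAMPLE` list are assumptions made for this sketch. It also assumes the log is in the usual combined format, that a referrer of "-" counts as "no website", and that month names are parsed in an English locale.

```python
import re
from datetime import datetime

# Assumed layout: ip, two ignored fields, [timestamp], "request", status, size,
# then the optional "referrer" and "user agent" pair. Lines without that pair
# do not match at all, which is what filters them out.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "[^"]*" \d{3} \S+'
    r' "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)
# First parenthesised group in the user agent, e.g. "(Linux; Android 5.0; ...)".
PLATFORM_RE = re.compile(r"\([^)]*\)")


def convert(lines):
    """Keep only records with a referrer and format them as numbered CSV rows."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("referrer") in ("", "-"):
            continue  # no website on this record, so skip it
        # %b expects English month abbreviations such as "Nov".
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        platform = PLATFORM_RE.search(m.group("agent"))
        fields = [
            m.group("ip"),
            # The log only has whole seconds, so ".00" is appended literally.
            ts.strftime("%Y-%m-%d %H:%M:%S") + ".00",
            m.group("referrer"),
            platform.group(0) if platform else "",
        ]
        quoted = ",".join('"' + f + '"' for f in fields)
        rows.append(f"{len(rows) + 1},{quoted}")
    return rows


SAMPLE = [
    '87.196.80.130 - - [29/Nov/2016:08:40:34 +0000] "GET /css/tables.css HTTP/1.1" 200 2127',
    '87.196.80.130 - - [29/Nov/2016:08:40:34 +0000] "GET /css/tables.css HTTP/1.1" 200 2127 '
    '"https://www.google.pt/" "Mozilla/5.0 (Linux; Android 5.0; Aquaris E5 HD Build/LRX21M) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.85 Mobile Safari/537.36"',
]

for row in convert(SAMPLE):
    print(row)
```

    In Kettle terms, the first regular expression roughly corresponds to a regex-based parsing step, the `continue` to the Filter Rows step, and the `strptime`/`strftime` pair to the date conversion mentioned above.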

  4. #4
    Join Date
    Feb 2017
    Posts
    8

    Thank you so, so much, marabu! I can't thank you enough!
    I try to do my best with my English!

  5. #5
    Join Date
    Feb 2017
    Posts
    8

    Thanks, Isha Lamboo! That was a very big help too!

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.