Hitachi Vantara Pentaho Community Forums

Thread: How to implement a Real-Time ETL with Kettle 7.0

  1. #1
    Join Date
    Dec 2016
    Posts
    3

    Question How to implement a Real-Time ETL with Kettle 7.0

    Hi All,

    I am starting to use Kettle v7.0 and, as a requirement, I'll have to create a real-time ETL process. Does anyone have any URLs or documentation that could help me implement this?


    Thanks in advance

    Regards,

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Kettle is a batch tool by nature.
    IMO real-time ETL is an oxymoron.
    The best you can get is a low update latency (of a couple of seconds) by continually restarting your process.
    Of course, you can try to keep Kettle running as shown in the Twitter demo in your samples subfolder.
    The Enterprise Edition comes with message processing capabilities.
    When embedding the Kettle engine in a Java program you're not confined to batch processing anymore.
    Finally, you may find a scheduler that responds to certain events by starting Kettle.
    So take a close look at your real-time ETL requirements before anything else.
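    The "continually restarting your process" approach mentioned above could be sketched roughly like this. This is only an illustration, not anything from the PDI docs: the pan.sh path and .ktr file name are made up, and the wrapper itself is generic process-supervision code.

```python
import subprocess
import sys
import time

def run_with_restarts(cmd, delay_seconds=5, max_runs=None):
    """Run cmd in a loop, relaunching it each time it exits.

    Downtime between runs is bounded by delay_seconds. max_runs is only
    here so the sketch can terminate; in production you would leave it
    as None and let the loop run forever.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        result = subprocess.run(cmd)  # blocks until the process exits
        runs += 1
        print(f"run {runs} exited with code {result.returncode}; restarting")
        time.sleep(delay_seconds)
    return runs

if __name__ == "__main__":
    # In production, something like (paths are hypothetical):
    # run_with_restarts(["/opt/pdi/pan.sh", "-file=/path/to/stream.ktr"])
    run_with_restarts([sys.executable, "-c", "pass"], delay_seconds=0, max_runs=2)
```

    The same loop could just as well live in a shell script or be handled by a service manager; the point is only that the latency floor becomes the restart delay plus Kettle's startup time.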

    PS: I don't know anything about Kettle v7, and not very much about the Enterprise Edition, so don't believe a word I say...
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Dec 2016
    Posts
    3

  4. #4
    Join Date
    Mar 2012
    Posts
    10

    Default

    Quote Originally Posted by asoares View Post
    Hi All,

    I am starting to use Kettle v7.0 and, as a requirement, I'll have to create a real-time ETL process. Does anyone have any URLs or documentation that could help me implement this?


    Thanks in advance

    Regards,
    It will depend a lot on what, exactly, you need, but here's an (old) example from The Man himself on how to read Twitter feeds indefinitely:


    Real-time streaming data aggregation


    As another example, there is also the Apache Kafka Consumer (available via the Marketplace), which looks like it may read data indefinitely, depending on how it's configured. I haven't tried it, though, so you'd have to check for yourself. It seems it may need a workaround on PDI 6.1, at least.

    Any such transformation should probably be wrapped in a job that restarts it if it terminates due to an error. Or just schedule it very frequently, say every minute, meaning at most 60 seconds of downtime.

  5. #5
    Join Date
    Dec 2016
    Posts
    3

    Default

    Hi Johan,

    Indeed, I'll have to "watch" some tables at the source and bring each new record into my staging area. The Twitter example shed some light on the process.
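    For anyone landing here later, the table-watching idea above is essentially incremental extraction with a high-watermark column, which is easy to sketch outside of Kettle. All table and column names below are hypothetical, and sqlite3 merely stands in for the real source and staging databases; in PDI the equivalent would be a Table Input step whose query is parameterized on the last loaded key.

```python
import sqlite3

# Hypothetical source table with an ever-increasing integer key.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
src.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,)])

# Hypothetical staging area.
stg = sqlite3.connect(":memory:")
stg.execute("CREATE TABLE stg_orders (id INTEGER PRIMARY KEY, amount REAL)")

def poll_once(source, staging, watermark):
    """Copy rows newer than the watermark into staging; return the new watermark."""
    rows = source.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id", (watermark,)
    ).fetchall()
    staging.executemany("INSERT INTO stg_orders (id, amount) VALUES (?, ?)", rows)
    return rows[-1][0] if rows else watermark

wm = poll_once(src, stg, 0)            # first pass copies both existing rows
src.execute("INSERT INTO orders (amount) VALUES (?)", (7.25,))
wm = poll_once(src, stg, wm)           # next pass picks up only the new row
print(stg.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0])  # prints 3
```

    Run in a loop (or on a tight schedule), this gives the same low-latency micro-batch behavior discussed earlier in the thread; the watermark is what keeps each pass from re-copying old rows.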

    Thanks for your post.

    Regards

    Last edited by asoares; 12-29-2016 at 12:03 PM.
