Hitachi Vantara Pentaho Community Forums

Thread: JDBCKettle - contribute or standalone?

  1. #1
    Nick Goodman Guest

    JDBCKettle - contribute or standalone?

    Greetings Kettle Devs

    A few months ago my company began building an EII interface to
    Kettle. In particular, we were looking to allow Kettle
    transformations to be used in an EII way from more than just the
    Pentaho runtime "Get Data From" component. This is useful in Pentaho
    Report Designer along with other tools (Jasper/BIRT/etc).

    The interface we arrived at that would have far-reaching
    compatibility was, no surprise, JDBC. Tom Qin and I have worked on
    this driver and have reached an initial version that is "baked enough"
    to use. You can see an example here:
    http://demo.bayontechnologies.com/pentaho/ViewAction?outputType=html&run_as_background=No&solution=samples&action=examplecsv.xaction&path=&userid=guest&password=guest

    The basic idea is to point the JDBC driver at a directory of KTRs,
    execute SQL like "select * from my_transformation.my_step" and get
    the results. The JDBC driver translates Kettle metadata
    (transformations, steps, fields, datatypes) into JDBC metadata (schemas,
    tables, columns, column types), parses the SQL, starts the
    transformation, and returns the in-memory rows that are passed out of
    the named step.
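
As a rough illustration of the mapping described above (this is not code from the JDBCKettle project), the sketch below splits a simple query into the transformation (JDBC schema) and step (JDBC table) it names, and maps Kettle type names to `java.sql.Types` constants. The class `KettleSqlSketch`, the regular expression, the `.ktr` file-naming rule, and the type mapping are all assumptions made for the example; a real driver would need a proper SQL parser.

```java
import java.sql.Types;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; this is NOT the JDBCKettle parser.
// Class, method, and mapping choices below are hypothetical.
class KettleSqlSketch {

    // Accepts only "select <columns> from <transformation>.<step>".
    private static final Pattern SIMPLE_SELECT = Pattern.compile(
            "^\\s*select\\s+(.+?)\\s+from\\s+(\\w+)\\.(\\w+)\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Assumed mapping from Kettle data type names to java.sql.Types constants.
    private static final Map<String, Integer> KETTLE_TO_JDBC = Map.of(
            "String", Types.VARCHAR,
            "Integer", Types.BIGINT,
            "Number", Types.DOUBLE,
            "BigNumber", Types.DECIMAL,
            "Date", Types.TIMESTAMP,
            "Boolean", Types.BOOLEAN,
            "Binary", Types.VARBINARY);

    /** Returns {column list, transformation (JDBC schema), step (JDBC table)}. */
    static String[] parse(String sql) {
        Matcher m = SIMPLE_SELECT.matcher(sql);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported SQL: " + sql);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    /** Assumption: a transformation named X is stored in the file X.ktr. */
    static String ktrFileFor(String transformation) {
        return transformation + ".ktr";
    }

    /** Unknown Kettle type names fall back to Types.OTHER. */
    static int jdbcTypeFor(String kettleType) {
        return KETTLE_TO_JDBC.getOrDefault(kettleType, Types.OTHER);
    }

    public static void main(String[] args) {
        String[] parts = parse("select * from my_transformation.my_step");
        System.out.println("columns = " + parts[0]);
        System.out.println("file    = " + ktrFileFor(parts[1]));
        System.out.println("step    = " + parts[2]);
    }
}
```

Treating the transformation as the schema and the step as the table keeps the query syntax familiar to reporting tools, which is the point of choosing JDBC as the interface.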

    The project website is: http://code.google.com/p/jdbckettle/

    My question for the list is: is this useful enough to include in the
    Kettle project? If others would find it useful, are willing to help
    maintain it, etc., we'd be happy to commit it to Kettle. If I hear
    crickets from this email I'll assume that we are the only ones that
    find this "interesting" and will continue to maintain it as a
    standalone project.

    Thoughts/Ideas?

    Nick

    --~--~---------~--~----~------------~-------~--~----~
    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com
    To unsubscribe from this group, send email to kettle-developers+unsubscribe (AT) googlegroups (DOT) com
    For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en
    -~----------~----~----~----~------~----~------~--~---

  2. #2
    alan bonnemaison Guest

    Re: JDBCKettle - contribute or standalone?

    A few questions from a newbie here.....

    1) How robust is it? Have you tested it against large volumes of data?
    2) Compared to Hibernate or iBatis, where does it stand?
    3) The transformation has to be executed first before the metadata can be
    read. Though I understand why it's done in such a manner, isn't it a major
    point of failure? What if I get a Java heap error on my server and can't
    start my transformation...
    4) Say, I design a *slow* transformation (with Select Values, Switch...
    steps) with Kettle. Given the nature of JDBC, isn't my code with JDBCKettle
    going to be slow too?

    Going back to my corner before you start throwing stuff at me..........

    Al.
    kg6ypx_at_gmail_dot_com
    a.k.a. acbonnemaison on the Pentaho Forums.


    On Fri, Jan 30, 2009 at 9:11 PM, Nick Goodman <
    ngoodman (AT) bayontechnologies (DOT) com> wrote:

    > [...]

  3. #3
    Matt Casters Guest

    Re: JDBCKettle - contribute or standalone?

    They *should* throw stuff at you since you're just asking questions without a point or need.
    JDBC is just a standard data interface. Obviously it doesn't just expose the strengths of Kettle but also the weaknesses.

    >1) How robust is it? Have you tested it against large volumes of data?


    It works just fine or it doesn't. Large volumes have nothing to do with it.

    >2) Compared to Hibernate or iBatis, where does it stand?


    It has no relationship with either. Hibernate is an Object Relational Mapping (ORM) API. JDBCKettle is a JDBC driver.

    >3) The transformation has to be executed first before the metadata can be
    > read. Though I understand why it's done in such a manner, isn't it a major
    > point of failure? What if I get a Java heap error on my server and can't
    > start my transformation...


    So what? Increase memory?

    > 4) Say, I design a *slow* transformation (with Select Values, Switch...
    > steps) with Kettle. Given the nature of JDBC, isn't my code with JDBCKettle
    > going to be slow too?


    Yes, a slow transformation is going to be slow. A fast transformation is going to be fast. (Duh!)
    Since the overhead of JDBC is low it doesn't make a difference in performance.

    Matt
    ____________________________________________
    Matt Casters
    Chief Data Integration - Kettle founder
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters (AT) pentaho (DOT) org
    Tel. +32 (0) 486 97 29 37

    On Sunday 01 February 2009 22:32:30 alan bonnemaison wrote:
    > [...]

  4. #4
    alan bonnemaison Guest

    Re: JDBCKettle - contribute or standalone?

    Matt, If I offended you with my ignorance, I apologize.

    As I wrote in my post, I am a newbie. I thought I could learn a thing or two
    here. Obviously, I was wrong.

    On Mon, Feb 2, 2009 at 8:24 AM, Matt Casters <mattcasters (AT) gmail (DOT) com> wrote:

    > [...]

  5. #5
    DEinspanjer Guest

    Re: JDBCKettle - contribute or standalone?

    This looks very interesting. We've been fighting with some poor
    integration of Kettle into the Pentaho platform xactions, and I can
    see how this would be much more straightforward to implement (plus,
    being able to set variables at runtime! Woot!)

  6. #6
    DEinspanjer Guest

    Re: JDBCKettle - contribute or standalone?

    Alan,

    Don't worry about it. You learn stuff about Kettle when Matt replies,
    regardless of whether it is a yell or not. My first year with Kettle
    had Matt yelling at me quite a lot.

    One thing about point #3, I haven't looked at the actual code yet, but
    I suspect that it doesn't actually *execute* the transformation before
    it reads the metadata, it probably just loads and initializes the
    transformation.

    On Feb 2, 9:30

  7. #7
    Nicholas Goodman Guest

    Re: JDBCKettle - contribute or standalone?

    On Feb 2, 2009, at 7:49 AM, DEinspanjer wrote:

    > One thing about point #3, I haven't looked at the actual code yet, but
    > I suspect that it doesn't actually *execute* the transformation before
    > it reads the metadata, it probably just loads and initializes the
    > transformation.


    This is accurate. It loads the metadata and calls ".getFields()",
    which doesn't actually run the transformation. However, some JDBC
    drivers are still stupid enough to actually run the query when asked
    for its metadata. I recently worked with a customer whose
    latest/greatest Oracle driver was executing big, long 30-minute
    queries just to return metadata. There is nothing we (Kettle) can do
    about that, except cache those results, which is what Kettle
    already does.

    Nick
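
The caching idea mentioned in this reply can be pictured with a minimal sketch: remember the column metadata per query so that an expensive lookup runs at most once. This is not Kettle's actual cache; `MetadataCacheSketch`, its loader function, and the use of a plain list of column names as "metadata" are assumptions for illustration. The column names come from the JDBCKettle wiki example quoted later in this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of per-query metadata caching; not Kettle code.
class MetadataCacheSketch {

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> loader;

    // The loader stands in for whatever slow call produces the column metadata.
    MetadataCacheSketch(Function<String, List<String>> loader) {
        this.loader = loader;
    }

    /** Invokes the loader only the first time a given query text is seen. */
    List<String> columnsFor(String query) {
        return cache.computeIfAbsent(query, loader);
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        MetadataCacheSketch cache = new MetadataCacheSketch(query -> {
            lookups.incrementAndGet(); // pretend this is the 30-minute query
            return List.of("Year", "PresentsNickReceived");
        });

        cache.columnsFor("select * from output");
        cache.columnsFor("select * from output"); // served from the cache

        System.out.println("expensive lookups: " + lookups.get()); // prints 1
    }
}
```

Keying on the raw query text is the simplest choice; it misses queries that differ only in whitespace or case, which a real implementation might normalize first.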

  8. #8
    Matt Casters Guest

    Re: JDBCKettle - contribute or standalone?

    On Monday 02 February 2009 16:49:46 DEinspanjer wrote:
    > My first year with Kettle
    > had Matt yelling at me quite a lot.


    !!! SORRY ABOUT THAT !!!!

    Matt
    ____________________________________________
    Matt Casters
    Chief Data Integration - Kettle founder
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters (AT) pentaho (DOT) org
    Tel. +32 (0) 486 97 29 37




  9. #9
    Nicholas Goodman Guest

    Re: JDBCKettle - contribute or standalone?

    On Feb 1, 2009, at 1:32 PM, alan bonnemaison wrote:

    > 1) How robust is it? Have you tested it against large volumes of data?

    I'd think pretty robust, since we're only using the RowListener
    interface of Kettle; i.e., it should handle the volumes that Kettle
    does, and that's quite big. I have *not* tested it against big sets
    of data. I personally don't see a use case for using the EII JDBC
    driver with big data sets, but like anything open source, there are
    always use cases the person writing it didn't think of. Do you (or
    others) have a use case for integrating large volumes of data from
    multiple sources and receiving the output via a big JDBC result set?
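
To make the listener-based approach a little more concrete, here is a self-contained sketch of rows being pushed by a producer thread (standing in for a running transformation) and pulled by a consumer (standing in for the JDBC result set) through a bounded queue. It does not use the Kettle API: `RowSink` is a hypothetical stand-in, and Kettle's actual `RowListener` has a different, richer signature. The bounded queue is an assumption about how one might keep memory flat, not a description of what JDBCKettle does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of a listener-to-consumer handoff; not Kettle code.
class RowHandoffSketch {

    // Stand-in for a row callback invoked by the running transformation.
    interface RowSink {
        void rowWritten(Object[] row) throws InterruptedException;
    }

    // Marker object signalling that the producer has finished.
    private static final Object[] END = new Object[0];

    /** Produces rowCount rows on another thread and collects them here. */
    static List<Object[]> drain(int rowCount, int capacity) {
        BlockingQueue<Object[]> queue = new ArrayBlockingQueue<>(capacity);
        RowSink sink = queue::put; // blocks the producer while the queue is full

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < rowCount; i++) {
                    sink.rowWritten(new Object[] { 2000 + i, i });
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // A real result set would hand out one row per next() call;
        // collecting into a list here just keeps the demo short.
        List<Object[]> rows = new ArrayList<>();
        try {
            for (Object[] row = queue.take(); row != END; row = queue.take()) {
                rows.add(row);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading rows", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Object[]> rows = drain(5, 2);
        System.out.println("rows received: " + rows.size());
    }
}
```

With a handoff like this, throughput is bounded by the slower of the two sides, which matches the point made elsewhere in the thread that a slow transformation yields a slow query.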
    >
    > 2) Compared to Hibernate or iBatis, where does it stand?

    Like Matt I'll say these two things are pretty much completely
    unrelated. Hibernate/iBatis is used by an OLTP application to
    store / retrieve individual data. EII JDBCKettle is used typically
    for reporting systems to assemble data from a few different places
    (Web Services, 3 databases and a flat file).
    >
    > 3) The transformation has to be executed first before the metadata
    > can be read. Though I understand why it's done in such a manner,
    > isn't it a major point of failure? What if I get a Java heap error
    > on my server and can't start my transformation...

    You are somewhat correct here. Kettle / JDBCKettle are "in process",
    meaning that you'd add it to your application. It's not a separate
    server. If this were an issue for you, you could separate it with
    vJDBC so that the "server" would be separated from the client.
    That's not a bad thing; Matt identified this as the real end goal
    when I first spoke to him about this idea a few months back.
    >
    > 4) Say, I design a *slow* transformation (with Select Values,
    > Switch... steps) with Kettle. Given the nature of JDBC, isn't my
    > code with JDBCKettle going to be slow too?

    If your transformation is slow, the JDBC statement will be slow as well.
    >
    > Going back to my corner before you start throwing stuff at
    > me..........


    I think you're thinking about JDBC drivers as 95% of the planet has
    come to use them: in regular old applications (CRM/ERP/custom).
    However, EII is a special use case for multi-data-source data
    assembly, typically used for reports. Startup latency and the like
    matter less in these circumstances.

    I'll throw a sheep at you. I don't facebook so this is the first
    time I've ever thrown a sheep at someone digitally.

    Nick




  10. #10
    alan bonnemaison Guest

    Re: JDBCKettle - contribute or standalone?

    What got me confused is

    "When the JDBCDriver <http://code.google.com/p/jdbckettle/wiki/JDBCDriver>
    executes select Year, PresentsNickReceived from output the driver starts
    the transformation, executes it, and slurps the data on the "output" of
    the step used as the table name
    <http://code.google.com/p/jdbckettle/wiki/Schema>."

    I took it literally. It made sense to me given what I know of JDBC (Oracle &
    DB2). Some drivers seem to do a better job than others when dealing
    with metadata, but SQL queries seemingly have to be executed beforehand.


    On Mon, Feb 2, 2009 at 8:09 AM, Nicholas Goodman <
    ngoodman (AT) bayontechnologies (DOT) com> wrote:

    > [...]


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.