Hitachi Vantara Pentaho Community Forums

Thread: Invoke Kettle Transformations embedded in other Java programs

  1. #1

    Invoke Kettle Transformations embedded in other Java programs

    Hi,

    I am evaluating different ETL tools to replace our home-grown data integration suite. First, let me say: great tool, thanks guys!


    1. I want to invoke Kettle transformations through an API and embed them in my application. How do I do it?
    2. How do I feed the output of transformations to another program? I would like to avoid writing to disk/DB and reading back the results.
    3. Are there any guidelines/samples to look at if I want to create custom transformations?

    Thanks in advance

    rusang

  2. #2

    One answer and the same questions

    I'm working through a similar situation. I am invoking the job with a Pentaho xaction, but I also have questions about returning the results (to an ESB tool): how is the return value (a single record) handled most efficiently? Efficiency is important, as this must handle multiple records per second.

    Thanks

  3. #3
    Join Date: May 2006
    Posts: 4,882


    For the first post, I would suggest looking at Pan to see how to call transformations from Java.

    As for passing information through: you can also get at the result of a transformation if you use "Copy rows to result" as the last step in the transformation.
    For unit test cases we use RowCollectors, but for that to work you need to know the name of the step you want to get the rows from (and it's all in memory).

    Regards,
    Sven
    Last edited by sboden; 09-20-2007 at 02:42 AM.
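    The two options above (reading the rows that a final "Copy rows to result" step leaves behind once the transformation has finished, and a RowCollector attached to a step by name) can be illustrated with a small plain-Java model. This is a sketch, not the Kettle API: the class and method names (ResultRowsModel, collectRowsOf, getResultRows) are hypothetical, and the steps here run one after another on a single thread, whereas Kettle runs each step in its own thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

/**
 * Plain-Java model of two ways to get rows out of a transformation. This is NOT
 * the Kettle API: all class and method names are hypothetical, and steps run
 * one after another on a single thread.
 */
public class ResultRowsModel {
    /** A named processing step; a row is modeled as a field-name -> value map. */
    private static final class Step {
        final String name;
        final UnaryOperator<Map<String, Object>> logic;

        Step(String name, UnaryOperator<Map<String, Object>> logic) {
            this.name = name;
            this.logic = logic;
        }
    }

    private final List<Step> steps = new ArrayList<>();
    private final Map<String, List<Consumer<Map<String, Object>>>> collectors = new LinkedHashMap<>();
    private final List<Map<String, Object>> resultRows = new ArrayList<>();

    public ResultRowsModel addStep(String name, UnaryOperator<Map<String, Object>> logic) {
        steps.add(new Step(name, logic));
        return this;
    }

    /** Collector-style hook: you must know the step's name, and rows pile up in memory. */
    public List<Map<String, Object>> collectRowsOf(String stepName) {
        List<Map<String, Object>> collected = new ArrayList<>();
        collectors.computeIfAbsent(stepName, k -> new ArrayList<>()).add(collected::add);
        return collected;
    }

    /** Pushes every input row through all steps; the last step's output is "copied to result". */
    public void run(List<Map<String, Object>> input) {
        for (Map<String, Object> row : input) {
            Map<String, Object> current = new LinkedHashMap<>(row); // work on a mutable copy
            for (Step step : steps) {
                current = step.logic.apply(current);
                for (Consumer<Map<String, Object>> c : collectors.getOrDefault(step.name, List.of())) {
                    c.accept(new LinkedHashMap<>(current)); // snapshot, so later steps don't alter it
                }
            }
            resultRows.add(current);
        }
    }

    /** Meaningful only after run() returns: every result row at once, all in memory. */
    public List<Map<String, Object>> getResultRows() {
        return resultRows;
    }

    public static void main(String[] args) {
        ResultRowsModel trans = new ResultRowsModel()
                .addStep("Double amount", r -> { r.put("amount", (Integer) r.get("amount") * 2); return r; })
                .addStep("Flag big", r -> { r.put("big", (Integer) r.get("amount") > 10); return r; });
        List<Map<String, Object>> doubled = trans.collectRowsOf("Double amount");
        trans.run(List.of(Map.of("amount", 3), Map.of("amount", 8)));
        System.out.println("collected at 'Double amount': " + doubled);
        System.out.println("result rows: " + trans.getResultRows());
    }
}
```

    The collector only sees rows from the step it is registered on, which is why you need to know the step name; both routes hold all rows in memory until you read them.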

  4. #4
    Join Date: Nov 1999
    Posts: 459

    Pentaho Data Integration - Java API Examples

    The Pentaho BI-Server Kettle Component shows a good example of how to call a Transformation or a Job from another Java program.

    See this brand new page: http://wiki.pentaho.org/display/EAI/...a+API+Examples

    Cheers,
    Jens

  5. #5

    Returning results

    RE:
    "About passing information through, you can also get at the result of a transformation, if you use the "copy rows to result" as last step in the transformation."

    How would I then access the return data from a different app?

    Thanks!

  6. #6
    Join Date
    May 2006
    Posts
    4,882


    Quote (originally posted by dogfuel): "How would I then access the return data from a different app?"

    Not from a different app. It assumes you run the transformation in your own Java app; then you can get at the result.

    Regards,
    Sven

  7. #7


    So, I was able to embed and run transformations from my Java program. Thanks, everyone.


    2. About the second question, being able to get the transformed rows back in the Java program:

    I actually want to do some more business-specific processing on each row before persisting it somewhere. I don't want all transformed rows collected at the end. So it seems I might need to convert my business-specific transformations/validations to Steps. Any thoughts, anyone? I would like to defer converting all the business-specific validations to Steps until a later stage/release of my product.

  8. #8
    Join Date: May 2006
    Posts: 4,882


    Steps would of course be the easiest from a GUI perspective, but as before, if you want to (and you build your transformations in a specific way), you can get at the result rows.

    Regards,
    Sven

  9. #9


    I would like to get the result rows, but I am concerned about performance. How does getting the result rows work? Will it finish transforming all the rows and then give me all the rows in one shot?

    I will have large volumes of data (500K to 1 million rows) to process, so speed is a major concern for me.
    I really like the Kettle architecture, where each step runs in its own thread. However, I need to figure out two key issues before I present a migration to Kettle to my management: 1) adding our own custom processing per row without losing much performance, and 2) running Kettle transformations in an app server (a multi-threading issue; I have created another thread for that).

  10. #10


    Let me clarify a bit more.
    The GUI is the least of my concerns at the moment.
    After the transformation is done with a row, I want to do additional business validation/transformation on that row before I persist it.
    So when I get the result rows, do I get all the rows after the transformation is complete, or is there some way my custom code can run on one row at a time? If I only get the result rows after all the steps in the Kettle transformation are done for all the input rows, it will be a performance hit.

  11. #11
    Join Date: May 2006
    Posts: 4,882


    On Trans there's a method, getResult(), that returns a Result which, among other things, gives you all of the result rows... all rows at once, but of course you can iterate over them. Try it; it's the only way you're going to find out for sure.

    Regards,
    Sven
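    The distinction in this last answer (a Result that hands over all rows in one go after the transformation has finished, versus custom code that runs on each row as it is produced) can be modeled in plain Java. The sketch below is not the Kettle API: StreamingRowsModel, Row, runBatch and runStreaming are hypothetical names, and the bounded queue between the producer thread and the caller only loosely mirrors the row buffers that connect Kettle's step threads. Per-row hooks on a running Kettle step go through its own row-listener mechanism, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/**
 * Plain-Java model (not the Kettle API) contrasting two ways of consuming the
 * rows of a transformation. Class names, the Row type and the buffer size are
 * hypothetical.
 */
public class StreamingRowsModel {
    /** A transformed row: an id plus a computed value. */
    public static final class Row {
        public final int id;
        public final long value;

        public Row(int id, long value) {
            this.id = id;
            this.value = value;
        }
    }

    // Sentinel object (compared by identity) marking the end of the stream.
    private static final Row END = new Row(-1, 0);

    /** Batch: every row is produced first, then handed over as one in-memory list. */
    public static List<Row> runBatch(int rowCount) {
        List<Row> result = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            result.add(new Row(i, (long) i * i));
        }
        return result;
    }

    /**
     * Streaming: the producer runs on its own thread and passes each row through a
     * bounded buffer; the handler runs on the calling thread as soon as a row arrives.
     * Returns the number of rows handled.
     */
    public static int runStreaming(int rowCount, Consumer<Row> handler) {
        BlockingQueue<Row> buffer = new ArrayBlockingQueue<>(100); // bounded: producer blocks when full
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < rowCount; i++) {
                    buffer.put(new Row(i, (long) i * i));
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // this sketch never interrupts the producer
            }
        });
        producer.setDaemon(true); // don't keep the JVM alive if the consumer stops early
        producer.start();
        int handled = 0;
        try {
            for (Row row = buffer.take(); row != END; row = buffer.take()) {
                handler.accept(row); // custom per-row validation/processing goes here
                handled++;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming rows", e);
        }
        return handled;
    }

    public static void main(String[] args) {
        int n = runStreaming(5, row -> System.out.println("validated row " + row.id));
        System.out.println(n + " rows handled one at a time");
        System.out.println(runBatch(5).size() + " rows returned in one list");
    }
}
```

    With the streaming variant, validation of early rows overlaps with production of later ones and only a bounded number of rows wait in the buffer; with the batch variant nothing can be processed until every row exists. Whether that difference matters for a given transformation is, as Sven says, something to measure.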


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.