Hitachi Vantara Pentaho Community Forums

Thread: Developer documentation

  1. #1
    Cesar Martinez Guest

    Developer documentation

    Hello,

    I'm evaluating the possibility of writing some custom transformations or
    jobs for Kettle.

    I've been able to download the source code from SVN and I've
    successfully compiled it.

    Now, I'd like to get some developer documentation, but I found very
    little information ([1], [2] and [3]). It would be very helpful if you
    could provide some links about the following topics:

    - General Kettle architecture
    - How to write your own jobs and transformations (relevant interfaces
    and methods), and how data is exchanged between them, at the API
    level.

    If such documentation does not exist, it would be enough to know the
    relevant interfaces and the name of some simple classes implementing
    them.

    Thanks in advance,

    C

  2. #2
    Vijay A Guest

    Re: Developer documentation

    Hi Cesar,

    Looking at the code and debugging the existing transformation steps is
    a good way to gain an understanding of the control flow. You can start
    by exploring a few steps in <Kettle
    Workspace>\src\org\pentaho\di\trans\steps

    You need to implement 4 classes, as follows
    (taking the example of the JoinRows (Cartesian Product) step):

    public class JoinRows extends BaseStep implements StepInterface
    public class JoinRowsData extends BaseStepData implements StepDataInterface
    public class JoinRowsMeta extends BaseStepMeta implements StepMetaInterface
    public class JoinRowsDialog extends BaseStepDialog implements StepDialogInterface
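
The role split behind those four classes can be illustrated with a minimal, self-contained sketch. Everything in it (`StepPatternSketch`, `UpperCaseMeta`, `UpperCaseData`, `UpperCaseStep`, `processRow`) is a hypothetical stand-in written for this illustration, not the Kettle API. It only assumes the usual reading of the naming: the Meta class holds the step's configuration, the Data class holds its runtime state, the step class holds the row logic, and the Dialog class (omitted here) is the UI for editing the configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins, NOT the Kettle API. They only mirror the role split
// of the Meta / Data / Step classes; the Dialog (UI) class is omitted.
public class StepPatternSketch {

    /** Meta role: the step's configuration, as a user would enter it. */
    static class UpperCaseMeta {
        final int fieldIndex; // index of the row field to upper-case

        UpperCaseMeta(int fieldIndex) {
            this.fieldIndex = fieldIndex;
        }
    }

    /** Data role: mutable state that exists only while the step runs. */
    static class UpperCaseData {
        long rowsProcessed = 0;
    }

    /** Step role: the row logic; reads settings from Meta, keeps state in Data. */
    static class UpperCaseStep {
        private final UpperCaseMeta meta;
        private final UpperCaseData data;

        UpperCaseStep(UpperCaseMeta meta, UpperCaseData data) {
            this.meta = meta;
            this.data = data;
        }

        /** Returns a modified copy of the row; the input row is left untouched. */
        Object[] processRow(Object[] row) {
            Object[] out = row.clone();
            out[meta.fieldIndex] =
                    String.valueOf(row[meta.fieldIndex]).toUpperCase(Locale.ROOT);
            data.rowsProcessed++;
            return out;
        }
    }

    public static void main(String[] args) {
        UpperCaseData data = new UpperCaseData();
        UpperCaseStep step = new UpperCaseStep(new UpperCaseMeta(1), data);

        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "kettle"});
        rows.add(new Object[] {2, "spoon"});

        for (Object[] row : rows) {
            Object[] out = step.processRow(row);
            System.out.println(out[0] + " " + out[1]);
        }
        System.out.println("rows processed: " + data.rowsProcessed);
    }
}
```

Keeping configuration and runtime state in separate objects means the same settings can drive several running copies of a step, each with its own state. For the real method names and signatures, the JoinRows sources named above remain the reference.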

    You can start a wiki page for this, and community members will add to
    and refine it.

    Thanks,
    Vijay

    On Wed, Jun 10, 2009 at 7:17 PM, Cesar Martinez <cesar.izq (AT) gmail (DOT) com> wrote:
    > [...]

  3. #3
    C Guest

    Re: Developer documentation

    Thanks for the info Vijay.

    I have been reading the existing development documentation, and I also
    had a look at the implementation of some Steps.

    I wrote down some conclusions, and I also have some concrete questions,
    which I paste below. I'd be happy if someone could confirm (or
    correct) my conclusions and answer my questions:

    "Kettle offers two different ways of processing information:
    Transformations and Jobs.
    Transformations are formed by one or more Transformation Steps and
    only apply to tabular data, which can be processed row by row. In
    Transformations, several Steps can be processed at the same time
    (concurrent execution), because each row is processed independently of
    the others, and the result of each row flows from one step to the
    following one.
    Therefore, if we examine the state of a transformation at a given
    moment, the first step may be processing the 3rd row, the second
    step may be processing the 2nd row, the third step may be processing
    the 1st row, and the fourth step may be waiting for a result to arrive
    from the third step.
    As each row is processed atomically, rows may be processed on
    different computers, which is known as clustered execution.
    By contrast, Jobs are formed by one or more Job Entries, which can
    perform any kind of task (they can apply to non-tabular data). Job
    Entries are executed in sequential order, so a Job Entry is only
    executed when the previous one has completely finished. Therefore, Jobs
    don
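
The contrast drawn above between concurrent, row-by-row steps and strictly sequential job entries can be sketched in plain Java. This is only an illustration of the idea under stated assumptions, not how Kettle implements it: the class `RowPipelineSketch`, the methods `startStep`, `runTransformation` and `runJob`, the `END` sentinel row, and the use of `BlockingQueue` to connect steps are all choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Illustrative only: names and the queue-based wiring are assumptions of this
// sketch, not taken from the Kettle code base.
public class RowPipelineSketch {
    // Sentinel row (compared by identity) that marks the end of the stream.
    private static final Object[] END = new Object[0];

    /** Starts a thread that takes rows from in, applies fn, and puts them on out. */
    static Thread startStep(String name, BlockingQueue<Object[]> in,
                            BlockingQueue<Object[]> out, UnaryOperator<Object[]> fn) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Object[] row = in.take();   // blocks until the previous step delivers
                    if (row == END) {
                        out.put(END);           // pass the end marker downstream
                        return;
                    }
                    out.put(fn.apply(row));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.start();
        return t;
    }

    /** Two steps run concurrently: while step 2 handles row n, step 1 handles row n+1. */
    static List<Object[]> runTransformation(List<Object[]> rows) {
        BlockingQueue<Object[]> source = new LinkedBlockingQueue<>();
        BlockingQueue<Object[]> middle = new LinkedBlockingQueue<>();
        BlockingQueue<Object[]> sink = new LinkedBlockingQueue<>();
        Thread s1 = startStep("double", source, middle,
                r -> new Object[] {(Integer) r[0] * 2});
        Thread s2 = startStep("plus-one", middle, sink,
                r -> new Object[] {(Integer) r[0] + 1});
        List<Object[]> result = new ArrayList<>();
        try {
            for (Object[] r : rows) {
                source.put(r);
            }
            source.put(END);
            for (Object[] r = sink.take(); r != END; r = sink.take()) {
                result.add(r);
            }
            s1.join();
            s2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return result;
    }

    /** Job entries run one after another; each starts only when the previous returns. */
    static void runJob(List<Runnable> entries) {
        for (Runnable entry : entries) {
            entry.run();
        }
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            rows.add(new Object[] {i});
        }
        for (Object[] r : runTransformation(rows)) {
            System.out.println(r[0]);           // prints 3, 5, 7 (one per line)
        }
        List<Runnable> entries = new ArrayList<>();
        entries.add(() -> System.out.println("entry 1 finished"));
        entries.add(() -> System.out.println("entry 2 finished"));
        runJob(entries);
    }
}
```

Each step here is a single thread reading from a FIFO queue, so row order is preserved. A step with several parallel copies would not give that guarantee.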

  4. #4
    Jens Bleuel Guest

    Re: Developer documentation

    Hi C
