Thread: Spoon Jobs Plugins -- parallel paths

  1. #1
    Joe Chambers Guest

    Spoon Jobs Plugins -- parallel paths

    I am developing a set of plugins to interface with a new data
    platform. I've got it working in a linear fashion. However, I want to
    run some of the tasks in parallel, on multiple paths/threads. I see
    you can run multiple paths, but rejoining them and having data passed
    to the merge step seems to be an issue. I am using prevResult and
    returning the Result from the execute function to carry my data between
    steps. The problem is that the merge/join is just called by the thread
    that finishes first. Is there a way to have some type of wait loop so
    that I can merge the data from all the previous steps going into the
    merge step?

    I'm looking at using a static variable to enter a waiting loop that
    would block all other calls until all the data is available. Each
    additional call to this step would, based on this static variable, go
    into a merge function that merges its data into a static variable,
    and once the count has reached the number of paths, execution would
    continue. For this I need a way to write a split step that can somehow
    detect the number of exiting paths. Is this possible?
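
    A minimal sketch of the wait-and-merge idea described above, written in
    plain Java with no Kettle classes. MergePoint, arriveAndAwait and
    MergePointDemo are hypothetical names, and the number of paths is passed
    in by hand here rather than detected. It uses a CountDownLatch from
    java.util.concurrent instead of a hand-written wait loop on a static
    variable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical merge point: each incoming path adds its rows, then waits for the others. */
final class MergePoint {
    private final List<String> merged = new ArrayList<>();
    private final CountDownLatch pending;

    MergePoint(int expectedPaths) {
        this.pending = new CountDownLatch(expectedPaths);
    }

    /** Called once per path; returns the full merged data once every path has arrived. */
    List<String> arriveAndAwait(List<String> rows) {
        synchronized (merged) {
            merged.addAll(rows);
        }
        pending.countDown();
        try {
            pending.await(); // blocks until the count reaches zero
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for other paths", e);
        }
        synchronized (merged) {
            List<String> view = new ArrayList<>(merged);
            Collections.sort(view); // sorted only so the demo result is deterministic
            return view;
        }
    }
}

final class MergePointDemo {
    /** Starts one thread per path and returns what each path saw after the merge. */
    static List<List<String>> runPaths(List<List<String>> paths) {
        MergePoint point = new MergePoint(paths.size());
        List<List<String>> seen = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (List<String> rows : paths) {
            Thread t = new Thread(() -> seen.add(point.arriveAndAwait(rows)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }
}
```

    In this sketch every path blocks until the last one arrives, so all of
    them see the same merged data. Whether blocking a job thread like this
    is acceptable inside Kettle is not settled by the sketch.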

    There has to be a better way but I don't see a construct to do it.

    I know this doesn't quite fit in with Spoon's existing infrastructure
    but I've been tasked with doing this.

    Thanks,
    Joseph

    --
    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com.
    To unsubscribe from this group, send email to kettle-developers+unsubscribe (AT) g...oups (DOT) com.
    For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en.

  2. #2
    Roland Bouman Guest

    Re: Spoon Jobs Plugins -- parallel paths

    I think this is not really a question for the dev group.

    Anyway, you can always synchronize by wrapping the job with the
    parallel paths in one job, and calling that in another job. The main
    job will continue only when the sub job is finished. This does mean
    you need to store any result data in some place (file, database) so
    you can pick it up later.
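
    The shape of that suggestion can be sketched in plain Java, assuming
    files as the storage place. StagedSubJob and its methods are hypothetical
    names, not Kettle API: runParallelPaths stands in for the sub job (it
    returns only when every parallel path has finished), and collect stands
    in for the next entry of the main job, which picks up the stored results.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StagedSubJob {
    /** Stand-in for the sub job: runs all paths in parallel, returns when all are done. */
    static void runParallelPaths(Path stagingDir, Map<String, List<String>> rowsByPath) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, rowsByPath.size()));
        try {
            List<Callable<Path>> tasks = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : rowsByPath.entrySet()) {
                Path target = stagingDir.resolve(entry.getKey() + ".txt");
                List<String> rows = entry.getValue();
                tasks.add(() -> Files.write(target, rows)); // each path stores its own result
            }
            for (Future<Path> done : pool.invokeAll(tasks)) {
                done.get(); // surfaces a failure from any path
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Stand-in for the main job's next entry: reads back everything that was staged. */
    static List<String> collect(Path stagingDir) {
        try (Stream<Path> listing = Files.list(stagingDir)) {
            List<String> all = new ArrayList<>();
            for (Path file : listing.sorted().collect(Collectors.toList())) {
                all.addAll(Files.readAllLines(file));
            }
            return all;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs two made-up paths through a temporary staging directory. */
    static List<String> demo() {
        Map<String, List<String>> rowsByPath = new TreeMap<>();
        rowsByPath.put("pathA", Arrays.asList("a1", "a2"));
        rowsByPath.put("pathB", Arrays.asList("b1"));
        try {
            Path stagingDir = Files.createTempDirectory("staging");
            runParallelPaths(stagingDir, rowsByPath);
            return collect(stagingDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```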

    On Mon, Oct 3, 2011 at 6:08 PM, Joe Chambers <joseph.chambers (AT) gmail (DOT) com> wrote:

    --
    Roland Bouman
    blog: http://rpbouman.blogspot.com/
    twitter: @rolandbouman

    Author of "Pentaho Solutions: Business Intelligence and Data
    Warehousing with Pentaho and MySQL",
    http://tinyurl.com/lvxa88 (Wiley, ISBN: 978-0-470-48432-6)

    Author of "Pentaho Kettle Solutions: Building Open Source ETL
    Solutions with Pentaho Data Integration",
    http://tinyurl.com/33r7a8m (Wiley, ISBN: 978-0-470-63517-9)

  3. #3
    Matt Casters Guest

    Re: Spoon Jobs Plugins -- parallel paths

    Hi Joe,

    If you join different data streams, you can indeed use a step like Merge
    Join.
    However, if you want to simply merge the data from 2 or more copies of the
    same step you don't need to do anything as it's standard behavior of a step.
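
    As a rough illustration of several copies feeding one downstream
    consumer, here is a plain-Java sketch built on a shared blocking queue.
    It is not Kettle's actual row-set implementation; RowStreamMerge and
    mergeCopies are hypothetical names, and an empty Optional is used as an
    assumed "this copy is done" marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class RowStreamMerge {
    /** Each copy writes its rows to one shared queue; a single reader drains it. */
    static List<String> mergeCopies(List<List<String>> copies) {
        BlockingQueue<Optional<String>> rowSet = new LinkedBlockingQueue<>();
        for (List<String> rows : copies) {
            new Thread(() -> {
                for (String row : rows) {
                    rowSet.add(Optional.of(row));
                }
                rowSet.add(Optional.empty()); // marks the end of this copy's output
            }).start();
        }

        List<String> received = new ArrayList<>();
        int finishedCopies = 0;
        while (finishedCopies < copies.size()) {
            try {
                Optional<String> next = rowSet.take(); // waits for the next row or marker
                if (next.isPresent()) {
                    received.add(next.get());
                } else {
                    finishedCopies++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return received; // rows from different copies arrive interleaved, in no fixed order
    }
}
```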

    In the case of job entries (not clear what you are building) it's indeed
    hard to have parallel entries add to the result row list.
    However, perhaps it would be more efficient to add the rows to a database
    staging table or another similar temporary container.

    Matt



    --
    Matt Casters <mcasters (AT) pentaho (DOT) org>
    Chief Data Integration, Kettle founder, Author of Pentaho Kettle
    Solutions<http://www.amazon.com/Pentaho-Kettle-Solutions-Building-Integration/dp/0470635177>
    (Wiley <http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470635177.html>)
    Fonteinstraat 70, 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
    Pentaho : The Commercial Open Source Alternative for Business Intelligence

  4. #4
    Andy Grohe Guest

    Re: Spoon Jobs Plugins -- parallel paths

    Since we are asking questions, I would normally say use "Serialize to file", which keeps Kettle data structures intact, vs going out to files or a db.
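
    To illustrate what keeping the data structures intact buys you, here is
    a small plain-Java sketch that writes typed rows to a file and reads them
    back with their types preserved. It uses standard Java object
    serialization, not Kettle's own file format, and RowFile is a
    hypothetical name.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class RowFile {
    /** Writes the rows as one serialized list; values keep their Java types. */
    static void write(Path file, List<Object[]> rows) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(rows));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the rows back; a Long comes back as a Long, not as text to re-parse. */
    @SuppressWarnings("unchecked")
    static List<Object[]> read(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<Object[]>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes to a temporary file, reads it back, and removes the file again. */
    static List<Object[]> roundTrip(List<Object[]> rows) {
        try {
            Path file = Files.createTempFile("rows", ".ser");
            try {
                write(file, rows);
                return read(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```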

    @matt, curious why you suggest db vs the native kettle serialize inputs/outputs?

    Sent from my iPhone

    On Oct 3, 2011, at 11:33 AM, Matt Casters <mcasters (AT) pentaho (DOT) org> wrote:


  5. #5
    Matt Casters Guest

    Re: Spoon Jobs Plugins -- parallel paths

    No special reason Andy, just old habits of a Kettle guy formerly known as
    DBA.


  6. #6
    Joseph Chambers Guest

    Re: Spoon Jobs Plugins -- parallel paths

    Is there a group dedicated to developing plug-ins? I figured the
    Development board was for both the core and the development of plugins.

    Thanks for the suggestions. The plug-ins fall outside the typical use of
    Spoon as I understand it. What I'm doing in the multiple paths is splitting
    off and pre-processing (across a cluster of servers) multiple groups of data
    (this isn't a traditional database that I'm interfacing with). The
    pre-processing then returns proprietary code that I must have in later steps
    to utilize the preprocessed data.

    From a programming point of view, if I have 3 paths going into one step
    within the Job, I assume only one object of the class is created. So if I
    use a variable to switch my logic, I can merge the data together as it
    comes in until I've reached the number of paths and then continue.
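
    That count-and-merge logic might look roughly like the following
    plain-Java sketch. It relies on the same assumption as above, that all
    paths reach one shared object, which is not confirmed for Kettle job
    entries. CountingMerge and offer are hypothetical names. Unlike a wait
    loop, earlier arrivals do not block; only the last arrival gets the
    merged data back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class CountingMerge {
    private final int expectedPaths;
    private final List<String> merged = new ArrayList<>();
    private int arrived = 0;

    CountingMerge(int expectedPaths) {
        this.expectedPaths = expectedPaths;
    }

    /**
     * Merges one path's rows. Returns an empty Optional until the last
     * expected path arrives, then returns everything merged so far.
     */
    synchronized Optional<List<String>> offer(List<String> rows) {
        merged.addAll(rows);
        arrived++;
        if (arrived < expectedPaths) {
            return Optional.empty(); // not all paths are in yet
        }
        List<String> result = new ArrayList<>(merged);
        merged.clear(); // reset so the same object can serve another run
        arrived = 0;
        return Optional.of(result);
    }
}
```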

    Is there a programmatic way in a plugin to detect the number of outgoing or
    inbound paths attached? I think I can handle the other issues but I don't
    want this value to be a user input or hard coded.
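
    The counting itself is simple once the list of hops is available. The
    sketch below uses a hypothetical Hop class holding the names of the two
    entries a hop connects. How a running job entry would obtain that list
    from Kettle's job metadata is an assumption to check against the Kettle
    source, not something this thread confirms.

```java
import java.util.List;

/** Hypothetical stand-in for one connection between two job entries. */
final class Hop {
    final String from;
    final String to;

    Hop(String from, String to) {
        this.from = from;
        this.to = to;
    }
}

final class HopCounter {
    /** Number of hops that end at the given entry. */
    static int inbound(List<Hop> hops, String entryName) {
        int count = 0;
        for (Hop hop : hops) {
            if (hop.to.equals(entryName)) {
                count++;
            }
        }
        return count;
    }

    /** Number of hops that start at the given entry. */
    static int outbound(List<Hop> hops, String entryName) {
        int count = 0;
        for (Hop hop : hops) {
            if (hop.from.equals(entryName)) {
                count++;
            }
        }
        return count;
    }
}
```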


    On Mon, Oct 3, 2011 at 12:43 PM, Matt Casters <mcasters (AT) pentaho (DOT) org> wrote:


  7. #7
    Matt Casters Guest

    Re: Spoon Jobs Plugins -- parallel paths

    I actually don't mind the questions about plugin development.

    Anyway, most people would write a step plugin for parallel work. All the
    questions you ask then have easy answers.

    Matt






  8. #8
    Joseph Chambers Guest

    Default Re: Spoon Jobs Plugins -- parallel paths

    Yes, we initially started with steps, but needed a little more flow
    control. Forgive my newbie questions; I am new to Spoon. We may need to
    look at steps again (the lack of flow control might have been a knowledge
    issue on my part), but we need a way to do the majority of things in
    sequential order, each step waiting on the one before it, while also
    splitting off into multiple paths when needed.

    If I can detect the number of inbound and outbound paths within the plugin,
    I can handle what I need in the Jobs. Once we have the Jobs going, I will
    see if I can solve the flow issues we were having with the steps. My project
    manager had run into those and told me to write job plugins instead. I had
    suggested "Wait on steps" to solve it, but he wanted something with less
    user interaction.
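
    A minimal sketch of the idea in the paragraph above: count the hops that
    end at the merge entry, then have the merge wait for that many arrivals
    before it continues. This is plain Java, not Kettle code. `Hop`,
    `JoinPoint`, `countInbound`, and the entry names are hypothetical stand-ins
    for whatever the job metadata exposes in your Kettle version, and the rows
    are plain strings rather than Kettle result rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelJoinSketch {

    /** Hypothetical stand-in for a hop between two job entries (not a Kettle class). */
    static final class Hop {
        final String from;
        final String to;

        Hop(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Hypothetical join point: collects rows until every inbound path has arrived. */
    static final class JoinPoint {
        private final CountDownLatch arrivals;
        private final List<String> merged = new ArrayList<>();

        JoinPoint(int inboundPaths) {
            this.arrivals = new CountDownLatch(inboundPaths);
        }

        /** Called once by each inbound path when it finishes. */
        void arrive(List<String> rows) {
            synchronized (merged) {
                merged.addAll(rows);
            }
            arrivals.countDown();
        }

        /** Blocks until all paths have arrived, then returns a copy of the merged rows. */
        List<String> awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
            if (!arrivals.await(timeout, unit)) {
                throw new IllegalStateException("not all inbound paths arrived in time");
            }
            synchronized (merged) {
                return new ArrayList<>(merged);
            }
        }
    }

    /** Counts the hops that end at the named entry, so the count is not user input. */
    static int countInbound(List<Hop> hops, String entryName) {
        int count = 0;
        for (Hop hop : hops) {
            if (hop.to.equals(entryName)) {
                count++;
            }
        }
        return count;
    }

    /** Runs three parallel paths into one merge entry and returns the merged rows. */
    static List<String> run() {
        List<Hop> hops = Arrays.asList(
                new Hop("split", "a"), new Hop("split", "b"), new Hop("split", "c"),
                new Hop("a", "merge"), new Hop("b", "merge"), new Hop("c", "merge"));
        int inbound = countInbound(hops, "merge");
        JoinPoint join = new JoinPoint(inbound);
        ExecutorService pool = Executors.newFixedThreadPool(inbound);
        try {
            for (Hop hop : hops) {
                if (hop.to.equals("merge")) {
                    // Each inbound path delivers one row from its own thread.
                    pool.execute(() -> join.arrive(Arrays.asList(hop.from + "-row")));
                }
            }
            return join.awaitAll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run().size() + " rows merged");
    }
}
```

    A `CountDownLatch` stands in for the hand-rolled static counter and wait
    loop, and the timeout keeps one failed path from blocking the merge
    forever.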

    Also, just out of curiosity: is there a way to display data in a Job (open
    a window with the results in a table) when it finishes? Right now I am
    writing the data that I receive back from the server I'm calling to a CSV
    file. I know there is a way in Steps/Transformations, and I've thought
    about calling a Transformation from the Job to handle the display portion.







  9. #9
    Matt Casters Guest

    Default Re: Spoon Jobs Plugins -- parallel paths

    Actually, we just added a "Job Executor" step in 4.3.0-M1 so the
    possibilities have increased a bit.

    As a general piece of advice, non-specific to Kettle: don't try to do
    everything in one transformation or job. Make things modular to keep a nice
    overview.
    Think about the idea of staging the data into a buffer (file) or queue
    (database table). Then you can scale as far as you like, for example like
    Diethard documented a while back:
    http://diethardsteiner.blogspot.com/...designing.html

    Matt
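
    A minimal sketch of the staging idea, assuming a file-based buffer: each
    parallel worker writes its rows to its own file in a staging directory, and
    a later sequential stage reads everything back once the workers are done.
    This is plain Java rather than Kettle code; `StagingSketch`, `stage`,
    `collect`, the worker names, and the `.csv` file naming are all
    hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class StagingSketch {

    /** Each parallel worker writes its rows to its own file, so writers never collide. */
    static void stage(Path stagingDir, String workerName, List<String> rows) {
        try {
            Files.write(stagingDir.resolve(workerName + ".csv"), rows, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A later, sequential stage reads every staged file back in. */
    static List<String> collect(Path stagingDir) {
        List<String> all = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(stagingDir, "*.csv")) {
            for (Path file : files) {
                all.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Directory iteration order is unspecified, so sort for a stable result.
        Collections.sort(all);
        return all;
    }

    /** Runs three workers in parallel, waits for them, then collects the staged rows. */
    static List<String> run() {
        try {
            // The temporary directory is left in place; a real job would clean it up.
            Path dir = Files.createTempDirectory("staging");
            ExecutorService pool = Executors.newFixedThreadPool(3);
            for (String worker : Arrays.asList("a", "b", "c")) {
                pool.execute(() -> stage(dir, worker, Arrays.asList(worker + "-1", worker + "-2")));
            }
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            return collect(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

    Because the parallel part and the merge part only share the staging
    directory, they can live in separate jobs or transformations, which is the
    modular layout suggested above.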








  10. #10
    Joseph Chambers Guest

    Default Re: Spoon Jobs Plugins -- parallel paths

    Agreed, I will refactor once I get all the pieces I need working.

    Is there some place I can look to see the function definitions of the Result
    class?


    On Mon, Oct 3, 2011 at 2:44 PM, Matt Casters <mcasters (AT) pentaho (DOT) org> wrote:

    > Actually, we just added a "Job Executor" step in 4.3.0-M1 so the
    > possibilities have increased a bit.
    >
    > As a general piece of advice, non-specific to Kettle: don't try to do
    > everything in one transformation or job. Make things modular to keep a nice
    > overview.
    > Think about the idea of staging the data into a buffer (file) or queue
    > (database table). Then you can scale as far as you like, for example like
    > Diethard documented a while back:
    > http://diethardsteiner.blogspot.com/...designing.html
    >
    > Matt
    >
    >
    > 2011/10/3 Joseph Chambers <joseph.chambers (AT) gmail (DOT) com>
    >
    >> Yes we started initially using steps, but needed a little more flow
    >> control. Forgive me my newbe questions I am new to spoon, we may need to
    >> look back at steps (the lack of flow control might have been a knowledge
    >> issue on my part) but we need a way to do the majority of things in
    >> sequential order each step waiting for the next, but also split off into
    >> multiple paths when needed.
    >>
    >> If I can detect the number of inbound and outbound paths within the plugin
    >> I can handle what I need in the Jobs, once we have the Jobs going I will see
    >> if I can solve the flow issues we were having within the steps. My project
    >> manager had ran into those and told me to do the jobs plugins. I had
    >> suggested the "Wait on steps" to solve it but he wanted something with less
    >> user interaction.
    >>
    >> Also just curious on this is there a way to display data in a Job (open a
    >> window with the results in a table) when it finishes right now I am writing
    >> the data to a CSV file that I receive back from the server I'm calling. I
    >> know there is in Steps/Transformations, and I've thought about calling a
    >> Transformation from the Job to handle the display portion.
    >>
    >>
    >>
    >>
    >>
    >> On Mon, Oct 3, 2011 at 1:49 PM, Matt Casters <mcasters (AT) pentaho (DOT) org>wrote:
    >>
    >>> I actually don't mind the questions about plugin development.
    >>>
    >>> Anyway, most people would write a step plugin for parallel work. All the
    >>> questions you ask then have easy answers.
    >>>
    >>> Matt
    >>>
    >>>
    >>> 2011/10/3 Joseph Chambers <joseph.chambers (AT) gmail (DOT) com>
    >>>
    >>>> Is there a group dedicated to developing plug-ins? I figured the
    >>>> Development board was for both the core and the development of plugins.
    >>>>
    >>>> Thanks for the suggestions, the plug-ins get out side of the typical use
    >>>> of Spoon as I understand it. What I'm doing in the multiple paths is
    >>>> splitting off and pre-processing (across a cluster of servers) multiple
    >>>> groups of data (this isn't a traditional database that I'm interfacing
    >>>> with). The pre-processing then returns proprietary code that I must have in
    >>>> later steps to utilize the the preprocessed data.
    >>>>
    >>>> From a programming point of view, if I have 3 paths going into one step
    >>>> with in the Job I assume only one object of the class is created. So if I
    >>>> use a variable to switch my logic I can merge the data together as it comes
    >>>> in until I've reached the number of paths and then continue.
    >>>>
    >>>> Is there a programmatic way in a plugin to detect the number of outgoing
    >>>> or inbound paths attached? I think I can handle the other issues but I
    >>>> don't want this value to be a user input or hard coded.
    >>>>
    >>>>
    >>>>
    >>>> On Mon, Oct 3, 2011 at 12:43 PM, Matt Casters <mcasters (AT) pentaho (DOT) org>wrote:
    >>>>
    >>>>> No special reason Andy, just old habits of a Kettle guy formerly known
    >>>>> as DBA.
    >>>>>
    >>>>>
    >>>>> 2011/10/3 Andy Grohe <agrohe21 (AT) gmail (DOT) com>
    >>>>>
    >>>>>> Since we are asking the questions, I would normally say use "serialize
    >>>>>> to file" which keeps kettle data structures intact vs going out to files or
    >>>>>> db.
    >>>>>>
    >>>>>> @matt, curious why you suggest db vs the native kettle serialize
    >>>>>> inputs/outputs?
    >>>>>>
    >>>>>> Sent from my iPhone
    >>>>>>
    >>>>>> On Oct 3, 2011, at 11:33 AM, Matt Casters <mcasters (AT) pentaho (DOT) org>
    >>>>>> wrote:
    >>>>>>
    >>>>>> Hi Joe,
    >>>>>>
    >>>>>> If you join different data streams, you can indeed use a step like
    >>>>>> Merge Join.
    >>>>>> However, if you want to simply merge the data from 2 or more copies of
    >>>>>> the same step you don't need to do anything as it's standard behavior of a
    >>>>>> step.
    >>>>>>
    >>>>>> In the case of job entries (not clear what you are building) it's
    >>>>>> indeed hard to have parallel entries add to the result row list.
    >>>>>> However, perhaps it would be more efficient to add the rows to a
    >>>>>> database staging table or another similar temporary container.
    >>>>>>
    >>>>>> Matt
    >>>>>>
    >>>>>>
    >>>>>> 2011/10/3 Joe Chambers < <joseph.chambers (AT) gmail (DOT) com>
    >>>>>> joseph.chambers (AT) gmail (DOT) com>
    >>>>>>
    >>>>>>> I am developing a set of plugins to interface with a new data
    >>>>>>> platform. I've got it working in a linear fashion. However I want to
    >>>>>>> run some of the tasks in parallel or multiple paths/threads. I see
    >>>>>>> you can run multiple paths but rejoining them and having data passed
    >>>>>>> to the merge step seems to be an issue. I am using the prevResult and
    >>>>>>> returning the Result in the execute function to carry my data between
    >>>>>>> steps. The problem the merge/join is just called by the thread that
    >>>>>>> finishes first, is there a way to have some type of wait loop that I
    >>>>>>> can merge the data from all the previous steps going into the merge
    >>>>>>> step.
    >>>>>>>
    >>>>>>> I'm looking at using a static variable to enter a waiting loop that
    >>>>>>> would block all other calls until all the data is available, each
    >>>>>>> additional call to this step would, based on this static variable, go
    >>>>>>> into a merge function that would merge its data into a static variable
    >>>>>>> and then once the count has reached the number of paths continue.
    >>>>>>> With this I need to know a way to write a split step that can some how
    >>>>>>> detect the number of exiting paths, is this possible?
    >>>>>>>
    >>>>>>> There has to be a better way but I don't see a construct to do it.
    >>>>>>>
    >>>>>>> I know this doesn't quite fit in with Spoon's existing infrastructure
    >>>>>>> but I've been tasked with doing this.
    >>>>>>>
    >>>>>>> Thanks,
    >>>>>>> Joseph
    >>>>>>
    >>>>>>
    >>>>>> --
    >>>>>> Matt Casters <mcasters (AT) pentaho (DOT) org>
    >>>>>> Chief Data Integration, Kettle founder, Author of Pentaho Kettle
    >>>>>> Solutions <http://www.amazon.com/Pentaho-Kettle-Solutions-Building-Integration/dp/0470635177>
    >>>>>> (Wiley <http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470635177.html>)
    >>>>>> Fonteinstraat 70, 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
    >>>>>> Pentaho : The Commercial Open Source Alternative for Business
    >>>>>> Intelligence
    >>>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>
    >>>
    >>>
    >>>
    >>>

    >>

    >
    >
    >
    >
    >



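    For readers following the thread, the "count arrivals, merge, then
    continue" idea Joe describes can be sketched in plain Java. This is only
    an illustration under stated assumptions: MergeBarrier and arrive are
    hypothetical names, not part of the Kettle API, and the number of
    incoming paths is passed in by the caller because the thread does not
    establish how a plugin would detect it. Each path hands over its rows;
    only the call that completes the set gets the merged result back, so
    only one thread continues past the merge point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper (not a Kettle class): collects rows from a fixed
 * number of parallel paths and releases the merged rows exactly once.
 */
final class MergeBarrier<T> {
    private final int expectedPaths;            // assumption: supplied by the caller
    private final List<T> merged = new ArrayList<>();
    private int arrived = 0;

    MergeBarrier(int expectedPaths) {
        if (expectedPaths < 1) {
            throw new IllegalArgumentException("expectedPaths must be >= 1");
        }
        this.expectedPaths = expectedPaths;
    }

    /**
     * Called once by each path. Returns the merged rows only to the caller
     * that completes the set; every earlier caller gets an empty Optional
     * and should stop instead of continuing downstream.
     */
    synchronized Optional<List<T>> arrive(List<T> rows) {
        if (arrived >= expectedPaths) {
            throw new IllegalStateException("more arrivals than expected paths");
        }
        merged.addAll(rows);
        arrived++;
        if (arrived == expectedPaths) {
            // Hand back a copy so later mutation cannot leak between callers.
            return Optional.of(new ArrayList<>(merged));
        }
        return Optional.empty();
    }
}
```

    Because arrive is synchronized and never waits, no thread is parked in a
    loop; the trade-off is that the barrier instance must be shared by all
    paths of one run and created fresh for the next run, which a static
    variable does not give you for free.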