-
Spoon Jobs Plugins -- parallel paths
I am developing a set of plugins to interface with a new data
platform. I've got it working in a linear fashion. However, I want to
run some of the tasks in parallel, on multiple paths/threads. I see
you can run multiple paths, but rejoining them and having data passed
to the merge step seems to be an issue. I am using the prevResult and
returning the Result in the execute function to carry my data between
steps. The problem is that the merge/join is just called by the thread
that finishes first. Is there a way to have some type of wait loop so
that I can merge the data from all the previous steps going into the
merge step?
I'm looking at using a static variable to enter a waiting loop that
would block all other calls until all the data is available. Each
additional call to this step would, based on this static variable, go
into a merge function that merges its data into a static variable,
and once the count has reached the number of paths the job would
continue. For this I need a way to write a split step that can somehow
detect the number of exiting paths. Is this possible?
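A minimal sketch of the counting idea described above, in plain Java with no Kettle classes. MergeBarrier and its methods are hypothetical names, and the number of paths is passed in by hand because detecting it is the open question here. Instead of a wait loop, each arriving path merges its rows and returns; only the last arrival is told to continue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not a Kettle class: collects rows from N parallel paths. */
final class MergeBarrier {
    private final int expectedPaths;   // assumption: known up front
    private final List<String> merged = new ArrayList<>();
    private int arrived = 0;

    MergeBarrier(int expectedPaths) {
        this.expectedPaths = expectedPaths;
    }

    /**
     * Called once per incoming path. Merges that path's rows and returns
     * true only for the call that completes the set.
     */
    synchronized boolean arrive(List<String> rows) {
        merged.addAll(rows);
        arrived++;
        return arrived == expectedPaths;
    }

    /** Snapshot of everything merged so far. */
    synchronized List<String> mergedRows() {
        return new ArrayList<>(merged);
    }
}

class MergeBarrierDemo {
    public static void main(String[] args) throws InterruptedException {
        MergeBarrier barrier = new MergeBarrier(3);
        List<Thread> paths = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                boolean last = barrier.arrive(List.of("row-from-path-" + id));
                if (last) {
                    // Only the final arrival continues past the merge point.
                    System.out.println("merged " + barrier.mergedRows().size() + " rows");
                }
            });
            paths.add(t);
            t.start();
        }
        for (Thread t : paths) {
            t.join();
        }
    }
}
```

One design note: a static field is shared by everything loaded in the same JVM, so two jobs running at once would count into the same variable. Holding the barrier in an object created per job run is probably safer than a static.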
There has to be a better way but I don't see a construct to do it.
I know this doesn't quite fit in with Spoon's existing infrastructure
but I've been tasked with doing this.
Thanks,
Joseph
--
You received this message because you are subscribed to the Google Groups "kettle-developers" group.
To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com.
To unsubscribe from this group, send email to kettle-developers+unsubscribe (AT) g...oups (DOT) com.
For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en.
-
Re: Spoon Jobs Plugins -- parallel paths
I think this is not really a question for the dev group.
Anyway, you can always synchronize by putting the parallel paths in
one job and calling that job from another job. The main job will
continue only when the sub job has finished. This does mean you need
to store any result data somewhere (file, database) so you can pick
it up later.
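A rough sketch of the "store it somewhere and pick it up later" part, using plain java.nio and one file per parallel path. ResultStaging, the ".rows" suffix, and the directory layout are all assumptions made for illustration; nothing here is a Kettle API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of Kettle. */
final class ResultStaging {
    private ResultStaging() {}

    /** Called by each parallel path: one file per path, so writers never collide. */
    static void store(Path stagingDir, String pathName, List<String> rows) throws IOException {
        Files.createDirectories(stagingDir);
        Files.write(stagingDir.resolve(pathName + ".rows"), rows, StandardCharsets.UTF_8);
    }

    /** Called by the main job once the sub job has finished. */
    static List<String> collect(Path stagingDir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(stagingDir, "*.rows")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files); // read in a deterministic order
        List<String> all = new ArrayList<>();
        for (Path f : files) {
            all.addAll(Files.readAllLines(f, StandardCharsets.UTF_8));
        }
        return all;
    }
}

class ResultStagingDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging");
        ResultStaging.store(dir, "path-a", List.of("a1", "a2"));
        ResultStaging.store(dir, "path-b", List.of("b1"));
        System.out.println(ResultStaging.collect(dir));
    }
}
```

Because the main job only reads after the sub job has returned, no locking is needed between the writers and the reader.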
On Mon, Oct 3, 2011 at 6:08 PM, Joe Chambers <joseph.chambers (AT) gmail (DOT) com> wrote:
--
Roland Bouman
blog: http://rpbouman.blogspot.com/
twitter: @rolandbouman
Author of "Pentaho Solutions: Business Intelligence and Data
Warehousing with Pentaho and MySQL",
http://tinyurl.com/lvxa88 (Wiley, ISBN: 978-0-470-48432-6)
Author of "Pentaho Kettle Solutions: Building Open Source ETL
Solutions with Pentaho Data Integration",
http://tinyurl.com/33r7a8m (Wiley, ISBN: 978-0-470-63517-9)
-
Re: Spoon Jobs Plugins -- parallel paths
Hi Joe,
If you join different data streams, you can indeed use a step like Merge
Join.
However, if you want to simply merge the data from 2 or more copies of the
same step you don't need to do anything as it's standard behavior of a step.
In the case of job entries (not clear what you are building) it's indeed
hard to have parallel entries add to the result row list.
However, perhaps it would be more efficient to add the rows to a database
staging table or another similar temporary container.
Matt
2011/10/3 Joe Chambers <joseph.chambers (AT) gmail (DOT) com>
--
Matt Casters <mcasters (AT) pentaho (DOT) org>
Chief Data Integration, Kettle founder, Author of "Pentaho Kettle Solutions",
http://www.amazon.com/Pentaho-Kettle-Solutions-Building-Integration/dp/0470635177
(Wiley, http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470635177.html)
Fonteinstraat 70, 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
Pentaho : The Commercial Open Source Alternative for Business Intelligence
-
Re: Spoon Jobs Plugins -- parallel paths
Since we are asking questions: I would normally say use "Serialize to file", which keeps Kettle data structures intact, rather than going out to plain files or a db.
@matt, curious why you suggest a db over the native Kettle serialize inputs/outputs?
On Oct 3, 2011, at 11:33 AM, Matt Casters <mcasters (AT) pentaho (DOT) org> wrote:
-
Re: Spoon Jobs Plugins -- parallel paths
No special reason, Andy, just old habits of a Kettle guy formerly known as
a DBA.
2011/10/3 Andy Grohe <agrohe21 (AT) gmail (DOT) com>
--
Matt Casters <mcasters (AT) pentaho (DOT) org>
-
Re: Spoon Jobs Plugins -- parallel paths
Is there a group dedicated to developing plug-ins? I figured the
Development board was for both the core and the development of plugins.
Thanks for the suggestions. The plug-ins go outside the typical use of
Spoon as I understand it. What I'm doing in the multiple paths is splitting
off and pre-processing (across a cluster of servers) multiple groups of data
(this isn't a traditional database that I'm interfacing with). The
pre-processing then returns proprietary code that I must have in later steps
to utilize the preprocessed data.
From a programming point of view, if I have 3 paths going into one step
within the Job, I assume only one object of the class is created. So if I use
a variable to switch my logic, I can merge the data together as it comes in
until I've reached the number of paths and then continue.
Is there a programmatic way in a plugin to detect the number of outgoing or
inbound paths attached? I think I can handle the other issues but I don't
want this value to be a user input or hard coded.
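If the job layout can be read as a list of hops (from-entry, to-entry pairs), the inbound and outbound counts follow from a simple scan. The sketch below uses its own hypothetical Hop and HopCounter classes rather than anything from Kettle; how a plugin would get the hop list from the running job is not answered in this thread and is left as an assumption.

```java
import java.util.List;

/** Stand-in for a hop between two job entries; not the Kettle hop class. */
final class Hop {
    final String from;
    final String to;

    Hop(String from, String to) {
        this.from = from;
        this.to = to;
    }
}

/** Hypothetical helper that counts hops touching a named entry. */
final class HopCounter {
    private HopCounter() {}

    /** Number of hops that end at the given entry. */
    static int inbound(List<Hop> hops, String entry) {
        int n = 0;
        for (Hop h : hops) {
            if (h.to.equals(entry)) {
                n++;
            }
        }
        return n;
    }

    /** Number of hops that start at the given entry. */
    static int outbound(List<Hop> hops, String entry) {
        int n = 0;
        for (Hop h : hops) {
            if (h.from.equals(entry)) {
                n++;
            }
        }
        return n;
    }
}

class HopCounterDemo {
    public static void main(String[] args) {
        // A split entry fanning out to three paths that rejoin at a merge entry.
        List<Hop> hops = List.of(
            new Hop("split", "a"), new Hop("split", "b"), new Hop("split", "c"),
            new Hop("a", "merge"), new Hop("b", "merge"), new Hop("c", "merge"));
        System.out.println("outbound from split: " + HopCounter.outbound(hops, "split"));
        System.out.println("inbound to merge: " + HopCounter.inbound(hops, "merge"));
    }
}
```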
On Mon, Oct 3, 2011 at 12:43 PM, Matt Casters <mcasters (AT) pentaho (DOT) org> wrote:
-
Re: Spoon Jobs Plugins -- parallel paths
I actually don't mind the questions about plugin development.
Anyway, most people would write a step plugin for parallel work. All the
questions you ask then have easy answers.
Matt
2011/10/3 Joseph Chambers <joseph.chambers (AT) gmail (DOT) com>
--
Matt Casters <mcasters (AT) pentaho (DOT) org>
-
Re: Spoon Jobs Plugins -- parallel paths
Yes, we initially started with steps but needed a little more flow
control. Please forgive the newbie questions; I am new to Spoon. We may need
to look at steps again (the lack of flow control might have been a knowledge
gap on my part), but we need a way to do the majority of things in
sequential order, each step waiting for the one before it, and also split
off into multiple paths when needed.
If I can detect the number of inbound and outbound paths within the plugin,
I can handle what I need in the Jobs. Once we have the Jobs going, I will see
if I can solve the flow issues we were having within the steps. My project
manager had run into those and told me to write the job plugins. I had
suggested "Wait on steps" to solve it, but he wanted something with less
user interaction.
Also, just out of curiosity: is there a way to display data in a Job (open a
window with the results in a table) when it finishes? Right now I am writing
the data I receive back from the server I'm calling to a CSV file. I know
there is a way in Steps/Transformations, and I've thought about calling a
Transformation from the Job to handle the display portion.
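To make the path-counting idea concrete, here is a minimal sketch in plain
Java. It assumes the job graph can be read as a list of directed hops between
named entries. HopCounter, Hop and exampleHops are hypothetical names used
only for illustration; they are not Kettle classes, and in a real plugin the
hop list would have to come from the job's own metadata.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a job graph; not part of the Kettle API. */
public class HopCounter {

    /** One directed hop between two named job entries. */
    public static final class Hop {
        final String from;
        final String to;

        public Hop(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Counts the hops that end at the given entry (inbound paths). */
    public static int countInbound(List<Hop> hops, String entryName) {
        int count = 0;
        for (Hop hop : hops) {
            if (hop.to.equals(entryName)) {
                count++;
            }
        }
        return count;
    }

    /** Counts the hops that start at the given entry (outbound paths). */
    public static int countOutbound(List<Hop> hops, String entryName) {
        int count = 0;
        for (Hop hop : hops) {
            if (hop.from.equals(entryName)) {
                count++;
            }
        }
        return count;
    }

    /** Example graph: Split fans out to A, B and C, which all feed Merge. */
    public static List<Hop> exampleHops() {
        return Arrays.asList(
            new Hop("Split", "A"), new Hop("Split", "B"), new Hop("Split", "C"),
            new Hop("A", "Merge"), new Hop("B", "Merge"), new Hop("C", "Merge"));
    }
}
```

With the example graph, countOutbound(exampleHops(), "Split") and
countInbound(exampleHops(), "Merge") both return 3, which is the number the
merge entry would wait for instead of a user-supplied or hard-coded value.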
On Mon, Oct 3, 2011 at 1:49 PM, Matt Casters <mcasters (AT) pentaho (DOT) org> wrote:
> I actually don't mind the questions about plugin development.
>
> Anyway, most people would write a step plugin for parallel work. All the
> questions you ask then have easy answers.
>
> Matt
-
Re: Spoon Jobs Plugins -- parallel paths
Actually, we just added a "Job Executor" step in 4.3.0-M1 so the
possibilities have increased a bit.
As a general piece of advice, non-specific to Kettle: don't try to do
everything in one transformation or job. Make things modular to keep a nice
overview.
Think about the idea of staging the data into a buffer (file) or queue
(database table). Then you can scale as far as you like, for example like
Diethard documented a while back:
http://diethardsteiner.blogspot.com/...designing.html
Matt
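The staging idea above can be sketched in plain Java, outside of Kettle. This
is only an illustration under stated assumptions: each parallel branch writes
its rows to its own staging file, and the merge reads the files only after
every branch has finished. StagedMerge, runAndMerge and the branch-N.txt file
names are hypothetical and not part of any Kettle API.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: parallel branches stage to files, then one merge. */
public class StagedMerge {

    /**
     * Runs one task per branch in parallel. Each task writes its rows to its
     * own staging file. The merge starts only after all tasks are done.
     */
    public static List<String> runAndMerge(Path stagingDir, List<List<String>> branchRows)
            throws Exception {
        // At least one thread, so an empty branch list does not fail here.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, branchRows.size()));
        try {
            List<Callable<Path>> tasks = new ArrayList<>();
            for (int i = 0; i < branchRows.size(); i++) {
                final Path file = stagingDir.resolve("branch-" + i + ".txt");
                final List<String> rows = branchRows.get(i);
                // Files.write returns the path it wrote to.
                tasks.add(() -> Files.write(file, rows, StandardCharsets.UTF_8));
            }

            // invokeAll blocks until every branch has completed, and returns
            // the futures in the same order as the submitted tasks.
            List<Future<Path>> finished = pool.invokeAll(tasks);

            List<String> merged = new ArrayList<>();
            for (Future<Path> future : finished) {
                // get() rethrows any failure from the branch as an exception.
                merged.addAll(Files.readAllLines(future.get(), StandardCharsets.UTF_8));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the merge only reads files that the branches have finished writing,
no shared static variable or wait loop is needed; the staging directory is
the only thing the branches and the merge have in common.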
-
Re: Spoon Jobs Plugins -- parallel paths
Agreed, I will refactor once I have all the pieces I need working.
Is there somewhere I can look up the method definitions of the Result
class?
On Mon, Oct 3, 2011 at 2:44 PM, Matt Casters <mcasters (AT) pentaho (DOT) org> wrote:
> Actually, we just added a "Job Executor" step in 4.3.0-M1 so the
> possibilities have increased a bit.
>
> As a general piece of advice, non-specific to Kettle: don't try to do
> everything in one transformation or job. Make things modular to keep a nice
> overview.
> Think about the idea of staging the data into a buffer (file) or queue
> (database table). Then you can scale as far as you like, for example like
> Diethard documented a while back:
> http://diethardsteiner.blogspot.com/...designing.html
>
> Matt
>
>
> 2011/10/3 Joseph Chambers <joseph.chambers (AT) gmail (DOT) com>
>
>> Yes we started initially using steps, but needed a little more flow
>> control. Forgive me my newbe questions I am new to spoon, we may need to
>> look back at steps (the lack of flow control might have been a knowledge
>> issue on my part) but we need a way to do the majority of things in
>> sequential order each step waiting for the next, but also split off into
>> multiple paths when needed.
>>
>> If I can detect the number of inbound and outbound paths within the plugin
>> I can handle what I need in the Jobs, once we have the Jobs going I will see
>> if I can solve the flow issues we were having within the steps. My project
>> manager had ran into those and told me to do the jobs plugins. I had
>> suggested the "Wait on steps" to solve it but he wanted something with less
>> user interaction.
>>
>> Also just curious on this is there a way to display data in a Job (open a
>> window with the results in a table) when it finishes right now I am writing
>> the data to a CSV file that I receive back from the server I'm calling. I
>> know there is in Steps/Transformations, and I've thought about calling a
>> Transformation from the Job to handle the display portion.
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2011 at 1:49 PM, Matt Casters <mcasters (AT) pentaho (DOT) org>wrote:
>>
>>> I actually don't mind the questions about plugin development.
>>>
>>> Anyway, most people would write a step plugin for parallel work. All the
>>> questions you ask then have easy answers.
>>>
>>> Matt
>>>
>>>
>>> 2011/10/3 Joseph Chambers <joseph.chambers (AT) gmail (DOT) com>
>>>
>>>> Is there a group dedicated to developing plug-ins? I figured the
>>>> Development board was for both the core and the development of plugins.
>>>>
>>>> Thanks for the suggestions, the plug-ins get out side of the typical use
>>>> of Spoon as I understand it. What I'm doing in the multiple paths is
>>>> splitting off and pre-processing (across a cluster of servers) multiple
>>>> groups of data (this isn't a traditional database that I'm interfacing
>>>> with). The pre-processing then returns proprietary code that I must have in
>>>> later steps to utilize the the preprocessed data.
>>>>
>>>> From a programming point of view, if I have 3 paths going into one step
>>>> with in the Job I assume only one object of the class is created. So if I
>>>> use a variable to switch my logic I can merge the data together as it comes
>>>> in until I've reached the number of paths and then continue.
>>>>
>>>> Is there a programmatic way in a plugin to detect the number of outgoing
>>>> or inbound paths attached? I think I can handle the other issues but I
>>>> don't want this value to be a user input or hard coded.
>>>>
>>>>
>>>>
>>>> On Mon, Oct 3, 2011 at 12:43 PM, Matt Casters <mcasters (AT) pentaho (DOT) org>wrote:
>>>>
>>>>> No special reason Andy, just old habits of a Kettle guy formerly known
>>>>> as DBA.
>>>>>
>>>>>
>>>>> 2011/10/3 Andy Grohe <agrohe21 (AT) gmail (DOT) com>
>>>>>
>>>>>> Since we are asking the questions, I would normally say use "serialize
>>>>>> to file" which keeps kettle data structures intact vs going out to files or
>>>>>> db.
>>>>>>
>>>>>> @matt, curious why you suggest db vs the native kettle serialize
>>>>>> inputs/outputs?
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Oct 3, 2011, at 11:33 AM, Matt Casters <mcasters (AT) pentaho (DOT) org>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Joe,
>>>>>>
>>>>>> If you join different data streams, you can indeed use a step like
>>>>>> Merge Join.
>>>>>> However, if you want to simply merge the data from 2 or more copies of
>>>>>> the same step you don't need to do anything as it's standard behavior of a
>>>>>> step.
>>>>>>
>>>>>> In the case of job entries (not clear what you are building) it's
>>>>>> indeed hard to have parallel entries add to the result row list.
>>>>>> However, perhaps it would be more efficient to add the rows to a
>>>>>> database staging table or another similar temporary container.
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>> 2011/10/3 Joe Chambers < <joseph.chambers (AT) gmail (DOT) com>
>>>>>> joseph.chambers (AT) gmail (DOT) com>
>>>>>>
>>>>>>> I am developing a set of plugins to interface with a new data
>>>>>>> platform. I've got it working in a linear fashion. However I want
>>>>>>> to
>>>>>>> run some of the tasks in parallel or multiple paths/threads. I see
>>>>>>> you can run multiple paths but rejoining them and having data passed
>>>>>>> to the merge step seems to be an issue. I am using the prevResult
>>>>>>> and
>>>>>>> returning the Result in the execute function to carry my data between
>>>>>>> steps. The problem the merge/join is just called by the thread that
>>>>>>> finishes first, is there a way to have some type of wait loop that I
>>>>>>> can merge the data from all the previous steps going into the merge
>>>>>>> step.
>>>>>>>
>>>>>>> I'm looking at using a static variable to enter a waiting loop that
>>>>>>> would block all other calls until all the data is available, each
>>>>>>> additional call to this step would, based on this static variable, go
>>>>>>> into a merge function that would merge its data into a static
>>>>>>> variable
>>>>>>> and then once the count has reached the number of paths continue.
>>>>>>> With this I need to know a way to write a split step that can some
>>>>>>> how
>>>>>>> detect the number of exiting paths, is this possible?
>>>>>>>
>>>>>>> There has to be a better way but I don't see a construct to do it.
>>>>>>>
>>>>>>> I know this doesn't quite fit in with Spoon's existing infrastructure
>>>>>>> but I've been tasked with doing this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Joseph
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "kettle-developers" group.
>>>>>>> To post to this group, send email to
>>>>>>> <kettle-developers (AT) googlegroups (DOT) com>
>>>>>>> kettle-developers (AT) googlegroups (DOT) com.
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> <kettle-developers%2Bunsubscribe (AT) googlegroups (DOT) com>
>>>>>>> kettle-developers+unsubscribe (AT) g...oups (DOT) com.
>>>>>>> For more options, visit this group at
>>>>>>> <http://groups.google.com/group/kettle-developers?hl=en>
>>>>>>> http://groups.google.com/group/kettle-developers?hl=en.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Matt Casters < <mcasters (AT) pentaho (DOT) org>mcasters (AT) pentaho (DOT) org>
>>>>>> Chief Data Integration, Kettle founder, Author of Pentaho Kettle
>>>>>> Solutions<http://www.amazon.com/Pentaho-Kettle-Solutions-Building-Integration/dp/0470635177>
>>>>>> (Wiley<http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470635177.html>
>>>>>> )
>>>>>> Fonteinstraat 70, 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
>>>>>> Pentaho : The Commercial Open Source Alternative for Business
>>>>>> Intelligence
>>>>>>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "kettle-developers" group.
>>>>>> To post to this group, send email to
>>>>>> kettle-developers (AT) googlegroups (DOT) com.
>>>>>> To unsubscribe from this group, send email to
>>>>>> kettle-developers+unsubscribe (AT) g...oups (DOT) com.
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/kettle-developers?hl=en.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "kettle-developers" group.
>>>>>> To post to this group, send email to
>>>>>> kettle-developers (AT) googlegroups (DOT) com.
>>>>>> To unsubscribe from this group, send email to
>>>>>> kettle-developers+unsubscribe (AT) g...oups (DOT) com.
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/kettle-developers?hl=en.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --