Modifications to 'Merge Join' Transform



nmgoza
09-14-2007, 11:40 AM
Currently this transform has 2 inputs, as follows:

'First Step' and 'Second Step'. My assumption is that this transform only allows 2 input steps wherever it is used. However, in some scenarios one would like to feed in more than 2 Table Input steps, or any other inputs, from different database connections.

Understandably this would add a bit of overhead to the engine, because the join would be performed by the Pentaho server. I am not sure if I am making my point clearly.

Chat later. Cheers.

sboden
09-18-2007, 05:40 PM
Request denied :D ... if you want to use that, use multiple merge joins in sequence. You can also enter feature requests at http://jira.pentaho.org

For your request, I have no clue what the semantics would be for a merge join with more than 2 inputs. I think your request is not doable.

Regards,
Sven

nmgoza
09-19-2007, 01:13 AM
It's cool, sboden, I was not fighting :D. I understand. In any case, the aim of the request was to reduce the number of sequential merge joins that one has to use.

sboden
09-19-2007, 04:06 AM
It may sound good... but for merge join it wouldn't even work. The trick of merge join is that you have 2 sorted inputs and you advance through either one or the other until you reach the end. With e.g. 3 inputs, what's a changed record on output?
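To make the "advance through one or the other" idea concrete, here is a minimal sketch of a two-input inner merge join. It is an illustration only, not the Pentaho source: the function name `merge_join`, the tuple row layout, and the convention that the first field is the join key are all assumptions.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner merge join of two lists that are already sorted by key.

    Hypothetical helper: rows are tuples and the join key is assumed
    to be the first field of each row.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1              # left side is behind: advance left
        elif lk > rk:
            j += 1              # right side is behind: advance right
        else:
            # Find the run of rows on the right that share this key
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row carrying this key with that run
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r[1:])
                i += 1
            j = j_end
    return out


first_step = [(1, "a"), (2, "b"), (4, "d")]
second_step = [(2, "x"), (3, "y"), (4, "z")]
result = merge_join(first_step, second_step)
```

Each comparison has exactly three outcomes (left behind, right behind, match), which is what makes the two-input case simple. With a third input the number of cases per comparison grows, and it is no longer obvious which cursor to advance.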

Regards,
Sven

nmgoza
09-20-2007, 02:14 AM
My thinking was the following.
The main driver would be the following definition.

1. Define your main Input source to drive the Condition or Where Clause
2. Define all other inputs. This is where my theory comes into play: the 2nd input can be exploded into many inputs.
3. Define your join conditions

For example:
Main Input   : Source1
Other Inputs : Source2, Source3, Source4
Condition    : Source1.ID = Source2.ID
               and Source1.ID = Source3.ID
               and Source1.ID = Source4.ID

Resolution
Advance through 'Main Input' for all inputs in 'Other Inputs', using the joins specified in the condition statement. In this case the advance would apply only to 'Main Input'.

Granted, I have not yet looked at the Merge Join source code to make a better case for my point. I will download it today and have a look, and maybe I will come to see things your way.

Nevertheless, thanks for taking the time to evaluate the possibility.
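One possible reading of this resolution, sketched in Python. It is an interpretation of the proposal, not existing Pentaho behaviour: the name `driven_multi_join` is hypothetical, every input is assumed to be sorted by ID with unique IDs, and the ID is assumed to be the first field of each row.

```python
def driven_multi_join(main, others):
    """Join `main` against every list in `others` on the first field.

    Hypothetical sketch: all inputs are assumed sorted by key, with
    unique keys per input. Only rows matched by ALL inputs are emitted.
    """
    cursors = [0] * len(others)   # one read position per other input
    out = []
    for row in main:              # the main input drives the loop
        k = row[0]
        joined = row
        for n, other in enumerate(others):
            # Let this input catch up to the main key
            while cursors[n] < len(other) and other[cursors[n]][0] < k:
                cursors[n] += 1
            c = cursors[n]
            if c < len(other) and other[c][0] == k:
                joined = joined + other[c][1:]
            else:
                joined = None     # one condition failed: drop the row
                break
        if joined is not None:
            out.append(joined)
    return out


source1 = [(1, "a"), (2, "b"), (3, "c")]
source2 = [(1, "x"), (3, "y")]
source3 = [(1, "p"), (2, "q"), (3, "r")]
joined_rows = driven_multi_join(source1, [source2, source3])
```

Note that the other inputs still need their own cursors, so in practice the advance cannot apply to 'Main Input' alone. The sketch also only covers inner-join semantics with unique keys.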

sboden
09-20-2007, 07:04 AM
So for now, use merge joins in sequence. Personally I would also only allow 2 hops (as it is now) to keep it as simple as possible, both from a GUI perspective and source-code-wise... you will see when you look at the code ;-)
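As a rough illustration of what "merge joins in sequence" amounts to, the sketch below folds several sorted inputs through a plain two-input join. The names `merge_join` and `join_in_sequence` are hypothetical, and the code assumes unique keys in the first field of each row; it is not how the transform itself is implemented.

```python
from functools import reduce


def merge_join(left, right):
    """Two-input inner merge join on the first field.

    Simplified for illustration: assumes sorted inputs with unique keys.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            out.append(left[i] + right[j][1:])
            i += 1
            j += 1
    return out


def join_in_sequence(inputs):
    """Chain two-input joins: ((in1 JOIN in2) JOIN in3) JOIN ..."""
    return reduce(merge_join, inputs)


source1 = [(1, "a"), (2, "b"), (3, "c")]
source2 = [(1, "x"), (3, "y")]
source3 = [(1, "p"), (2, "q"), (3, "r")]
chained = join_in_sequence([source1, source2, source3])
```

Because an inner merge join emits rows in key order, the output of one join is already sorted and can feed the next join directly. That is why the two-input step composes without any change to the step itself.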

Regards,
Sven