Let’s have a quick look at some clustering examples in the new 3.0 engine:

This example runs all steps in the transformation in a clustered mode. That means that there are 4 slave transformations that run in parallel with each other.
The interesting part is that first of all the “Fixed Input” step is running in parallel, each copy reading a certain part of a file.
The second thing to mention about it is that we now allow you to run multiple copies of a single step on a cluster. In this example, we run 3 copies of a step per slave transformation. In total there are 12 copies of the sort step in action in parallel.
IMPORTANT: this is a test-transformation, a real world sorting exercise would also include a “Sorted Merge” step to keep the data sorted. I was too lazy to redo the screenshots though
The clustered transformations now also support logging to databases:

As you can see, we added the name of the transformation in the dialog to show the log of all the slave transformations executed.
This feature makes it easier to see the execution log on the remote server, after the run.
It is now also possible to use the execution stats in a JavaScript job entry (previously called Evaluation):

As such, you can validate that the number of input rows where the same as the output rows. The numbers returned to are the total amounts, the sum of the stats for all the slave transformations.
To finish up for today, here is a sample where we partition data in a certain fashion:

This sample reads a file on the master, then sends if off to 2 slave servers (over TCP/IP sockets).
The data then needs to be partitioned. That means that data from one slave server needs to go to all the others. For this Nx(N-1) sockets are opened for each partition that’s running on the slave server. In this case we have 2 partitions running meaning that we open 4 extra sockets to make sure the data can find a way to the correct partition.
In the picture above, the “Text File Output” and “Get the partition Nr” steps are partitioned using the same partition schema and as such, the data stays in the same “swimming lane” without any performance loss.
The Dummy Step (note that the icon has changed since I took the screenshot) is doing the re-partitioning here. If you open the generated slave transformations you will see the form that these transformations take:
A slave transformation sample using 20 extra sockets to re-partition the data.
We are also correctly setting a number of internal variables for your convenience now. This allows you to read/write the data from/to partitioned files, etc.
The Dx2 or Dx10 means that the partitioning is dynamic: 2 or 10 partitions per slave server in the cluster:

The advances in 3.0 shown above are making sure that the Kettle engine allows you to scale beyond a single box. In fact, it will allow you to get the maximum out of your favorite disk subsystem by attacking your data in parallel.
Until next time,
P.S. Don’t you just love these new icons