Step limitations for Pentaho Map Reduce

08-24-2012, 03:11 PM
I'm new to Pentaho's big data features and I've been testing some examples for the Pentaho MapReduce job.
I'm wondering whether there are any limitations on which steps can be used inside the mapper or reducer transformation of a Pentaho MapReduce job, or whether this is entirely up to the designer.
For example, it seems like Group By should not be used in the mapper.
As I understood from the documentation, mapper/reducer transformations are translated into map/reduce methods in Java code and sent to the cluster. Is that correct?


08-27-2012, 09:24 AM
There are no limitations on which Kettle steps you use in any of the transformations within Pentaho MapReduce (mapper, combiner, or reducer); it is up to the designer. As with most tools, though, some options and designs are better than others for solving a specific problem.

The transformations participating in a Pentaho MapReduce job are not translated into Java code; instead they are executed as-is within Hadoop. There is a thin layer that provides the execution environment and translates data types for the transformations within the context of the mapper, combiner, and/or reducer.
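Conceptually, that thin layer behaves like a generic Hadoop mapper that converts each incoming key/value record into a transformation row, runs the (unchanged) transformation on it, and collects whatever rows it emits. The sketch below is only an illustration of that idea; the class and method names (`RowTransformation`, `TransformationMapperShim`, etc.) are hypothetical stand-ins, not Pentaho's actual API, and Hadoop itself is deliberately left out so the example stays self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical stand-in for a Kettle transformation: rows in, rows out. */
interface RowTransformation {
    void processRow(String key, String value, BiConsumer<String, String> emit);
}

/** Illustrative "thin layer": adapts Hadoop-style key/value records
 *  to transformation rows and collects the transformation's output. */
class TransformationMapperShim {
    private final RowTransformation transformation;
    private final List<String[]> output = new ArrayList<>();

    TransformationMapperShim(RowTransformation t) { this.transformation = t; }

    /** Analogous to Hadoop's Mapper.map(): the record is handed to the
     *  transformation unchanged, and emitted rows are captured as output. */
    void map(String key, String value) {
        transformation.processRow(key, value,
                (k, v) -> output.add(new String[]{k, v}));
    }

    List<String[]> collected() { return output; }
}

public class Demo {
    public static void main(String[] args) {
        // A toy "mapper transformation": split each line into words
        // and emit (word, "1"), as in a word-count job.
        RowTransformation splitWords = (key, line, emit) -> {
            for (String word : line.split("\\s+")) emit.accept(word, "1");
        };

        TransformationMapperShim shim = new TransformationMapperShim(splitWords);
        shim.map("0", "hello hadoop hello");
        for (String[] kv : shim.collected())
            System.out.println(kv[0] + "\t" + kv[1]);
    }
}
```

The point of the sketch is that the transformation logic itself is untouched; only the surrounding adapter knows anything about the map/reduce calling convention.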

01-21-2013, 12:23 PM
Hi there,

One question: there are steps in PDI (e.g. Order, Group By) that use temporary files (and ask for a root directory to be specified for writing/reading them).

How does this work when using the transformation in a MapReduce job?