Hadoop with PDI CE 4.2.1

02-10-2012, 08:10 PM
Hi all

Is use of the Hadoop Job Executor step supported in the PDI 4.2.1 Community Edition without installing the enterprise components (PHD EE) on the Hadoop nodes? Or is PHD still needed for pure jar-based jobs?

If so, is there a Community Edition version that matches 4.2.1, and what is the link to it?

Ronan S.

02-10-2012, 09:33 PM

Beginning with the 4.3 pre-release CE, Kettle includes all big data components. The release is linked on the big data community home:

Please check out the child pages of that page; there is a growing number of updated docs and how-tos to get things up and running with the CE version.


02-10-2012, 10:54 PM
Thanks Slawo,

However, in the meantime, can the 4.2.1 Community Edition be used for executing jar-based jobs, or does it require PHD?


02-10-2012, 11:16 PM
For 4.2.1 you'd have to get PHD, as far as I know.

02-13-2012, 01:31 AM
PHD is required in 4.2.1 and the 4.3 pre-release. We have a solution that gets rid of PHD, which we will try to get in for 4.3 RC and definitely by 4.3 GA.

02-13-2012, 01:59 PM
Hi Ronan,

You can use 4.2.1 CE to submit jobs using a jar. That functionality is provided through the "Hadoop Job Executor" step. With it you can run a custom MapReduce job written in Java, or even Hadoop streaming. It's a thin UI wrapper around the JobClient interface for submitting Hadoop jobs.
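To make the jar-based route a bit more concrete, here is a minimal sketch of the kind of settings such a job submission carries. It is plain Java with no Hadoop dependency, so it only assembles and prints the settings rather than submitting anything. The property keys are the classic Hadoop 0.20/1.x names that JobConf uses; the class name `JarJobSettingsSketch`, the method `buildSettings`, and all paths and class names passed in are hypothetical, and whether the Hadoop Job Executor step sets exactly these keys is an assumption, not something confirmed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JarJobSettingsSketch {

    // Collects the settings of a jar-based MapReduce job under the classic
    // (Hadoop 0.20/1.x) property names. Nothing is submitted to a cluster.
    public static Map<String, String> buildSettings(String jarPath, String jobName,
                                                    String mapperClass, String reducerClass,
                                                    String inputDir, String outputDir) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapred.jar", jarPath);               // jar shipped to the cluster
        settings.put("mapred.job.name", jobName);          // name shown in the job tracker
        settings.put("mapred.mapper.class", mapperClass);  // mapper inside the jar
        settings.put("mapred.reducer.class", reducerClass); // reducer inside the jar
        settings.put("mapred.input.dir", inputDir);        // HDFS input directory
        settings.put("mapred.output.dir", outputDir);      // HDFS output directory
        return settings;
    }

    public static void main(String[] args) {
        // All values below are hypothetical placeholders.
        Map<String, String> settings = buildSettings(
                "/tmp/wordcount.jar", "wordcount",
                "example.WordCountMapper", "example.WordCountReducer",
                "/user/demo/input", "/user/demo/output");
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

In a real driver these values would normally be set through the JobConf setters and the job handed to JobClient, which is the part the step wraps.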

Hope this helps,

02-24-2012, 12:08 AM
To clear up any misunderstanding: this was about executing pure Java jar-based jobs (not Pentaho MapReduce transformations), and I was able to confirm that they can execute under 4.2.1 without installing PHD on the Hadoop nodes.

02-24-2012, 12:11 AM
Happy to hear that, ronans!