Apache Hadoop version 0.20.X for Pentaho on Windows

12-20-2012, 06:14 AM
I just wanted to know whether anyone has configured Apache Hadoop on Windows.

I believe Pentaho (version 4.8 in my case) comes configured with Apache Hadoop, but I am trying to understand how it works because I cannot get it to work so far.

I am trying to follow the Hadoop Tutorial.

I believe that in the past there was PHD (which is no longer in use). I also understand that if you are configuring a different distribution, e.g. Cloudera or MapR, some configuration needs changing. I am trying to find out which port my Hadoop instance uses. When I use the job entry to copy Hadoop files and try to browse to my destination directory, it doesn't work at all. Is the port 9000? Where do I find it?

I have found a core-site.xml that is empty apart from a single configuration tag, and I cannot find the other files, hdfs-site.xml and mapred-site.xml, at all. Should these files be present? Do I still need to install Hadoop following the PHD User Guide?

Thank You.



05-30-2013, 11:19 PM
I am also facing a similar problem. Please let me know how you resolved it, or whether you have any alternative solutions.

Appreciate your feedback.


06-19-2013, 08:18 AM
Sorry, I have no clue about the topic you have raised here. I have been looking around the web and scanning through several informative sites to get a better idea of how to get this right, but I have not found anything helpful yet. Does anyone know this well? Please help.

06-19-2013, 08:39 AM
Pentaho 4.8 doesn't come with Hadoop; it comes with client-side support for various vendors' Hadoop distributions, such as Cloudera (versions 3u4 and 4), MapR, and Apache Hadoop 0.20. You'll need a Hadoop distribution from one of these vendors installed somewhere, and then you configure your Pentaho Data Integration steps, job entries, etc. to "point at" that installation. If you're looking to install Apache Hadoop on Windows, there's a blog post here describing how to do this with Cygwin:


Hortonworks also has a beta platform for Windows:


The default port(s) for your Hadoop platform should be identified in the documentation for that platform.
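For Apache Hadoop 0.20, the NameNode address that the Hadoop job entries need usually comes from the fs.default.name property in the cluster's own core-site.xml. The Apache single-node setup guide uses hdfs://localhost:9000, and when the URI carries no port Hadoop falls back to 8020. Below is a minimal Python sketch, not part of Pentaho or Hadoop, that reads that property and checks whether the port accepts TCP connections. The function names are hypothetical, and it assumes the value is a full hdfs://host:port URI.

```python
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hadoop's built-in default NameNode RPC port, used when the
# filesystem URI does not specify one.
DEFAULT_NAMENODE_PORT = 8020


def namenode_address(core_site_path):
    """Return (host, port) from the default-filesystem URI in core-site.xml,
    or None if the property is absent (e.g. an empty <configuration>)."""
    root = ET.parse(core_site_path).getroot()
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        # fs.default.name is the 0.20-era key; later releases use fs.defaultFS.
        if name in ("fs.default.name", "fs.defaultFS"):
            uri = urlparse((prop.findtext("value") or "").strip())
            return uri.hostname, uri.port or DEFAULT_NAMENODE_PORT
    return None


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Point namenode_address at the core-site.xml of the Hadoop installation itself, then pass the result to port_is_open. A client-side file that contains only an empty configuration element makes it return None, which suggests the cluster address has to come from somewhere else.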

Hope this helps,