Looking for documentation on Hadoop Integration



kalu.khan
10-19-2010, 05:26 PM
Hello,

I ran into the following Pentaho Road show on Hadoop.
http://vimeo.com/14641559

I am working on a project where we are going to use both PDI and Hadoop. I am looking for guidelines and documentation on how to integrate PDI with Hadoop.

Thanks

jtcornelius
10-20-2010, 08:11 AM
Here are a few resources to get you started:

Main Hadoop landing page (overview, download, data sheets, etc.) - http://www.pentaho.com/hadoop/
5 part video series explaining how to leverage Hadoop for BI and DI - http://www.pentaho.com/hadoop/resources.php
Community/Beta Wiki page (videos from various phases of the beta program, documentation attached to beta phase 3 wiki page, etc.) - http://wiki.pentaho.com/display/EAI/PDI+and+Hadoop+Integration

Note that if you have an identified project, you can get set up with support for a 30-day evaluation period to jump-start your work.

Regards,
Jake

jtcornelius
10-20-2010, 10:46 AM
Absolutely, there is a Job Entry called the 'Hadoop Transformation Job Executor' that can be used for that purpose. There are also two samples illustrating the use of the Job Entry, which can be found in <PDI install dir>\samples\jobs\hadoop\:

- hadoop-mr-ktr.kjb
- weblogs-mr-ktr.kjb

The first one implements the Hadoop wordcount example using PDI transformations. The second one shows how to do a weblog parse (in the Map phase) followed by a reducer which, I believe, collapses the data on territory.
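
For anyone unfamiliar with the wordcount example, here is a minimal sketch of the map and reduce logic in plain Python. It is only an illustration of the two phases, not PDI or Hadoop code, and it is not taken from the sample jobs; the function names (map_phase, reduce_phase, word_count) are made up for this post.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_phase(pairs):
    """Sum the counts for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step, then the reduce step, over an iterable of lines."""
    return reduce_phase(map_phase(lines))
```

Calling word_count on a list of text lines returns a dictionary of lowercase words and their totals. In the PDI samples, the map and reduce steps would each be a transformation rather than a Python function.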

Regards,
Jake

kalu.khan
10-20-2010, 10:52 AM
Thanks Jake,

I found the following document as well:

http://wiki.pentaho.com/download/attachments/18219271/hadoop_pentaho.pdf