How to use Hadoop computing power while you create transformations on pdi-ce-4.2.0-st

11-20-2012, 09:58 AM
Hello all,

I'm a newbie to big data and Pentaho. I am using Hadoop version cdh3u4 and the Windows version of pdi-ce-4.2.0-stable.

I am able to create transformations (using Modified Java Script Value) that read from Hadoop File Input and write to Hadoop File Output. I run those transformations from the GUI and then check the output through the HDFS URLs of Hadoop.

But here my questions arise:

1) Is my transformation using the power of the Hadoop (multi-node) cluster when it creates the output file? I am not using any MapReduce program here, and no JobTracker entry exists for this run.

2) After googling around I can see that the new 4.3 release includes a component called MapReduce. Is that similar to what we code in Java? If yes, why would we use Kettle (Spoon) rather than a Java editor such as Eclipse to create a MapReduce program?

3) Forget about the Java editor and suppose I start using the Pentaho MapReduce transformation to implement the MapReduce logic with HDFS input/output. My next goal will be to make this transformation run in the Hadoop environment. How do I run it there? You can only export it to XML and not to .sh. If it could be exported to .sh, you could scp it (the whole transformation logic) to a Unix box, run it there in the Hadoop environment, and at the end the output would be ready. But is that possible with Kettle?

People, please help me. I am kind of confused by these questions and stuck :( because although I have DI installed and a 3-node Hadoop cluster available, I am not able to use Hadoop's computing power to reduce the completion time of my task.