HBase transformation created on one machine is not running on another



ram_sivakumar
09-18-2012, 05:44 AM
We are creating a transformation in Pentaho with an HBase Input step. Pentaho asks for a column mapping for the HBase table.


We are using an Ubuntu machine, from which we launched Pentaho Kettle 4.3.0; it interacts directly with the HBase table.


The job runs successfully.


But when we run the same transformation on another machine (after creating a new table with the same column family name), the job fails on the command line, saying "Mapping does not exists" (table name, mapping name).

Please suggest how to run an HBase transformation that was created on another machine. (There are no mapping details in the KTR except the mapping name; the column details for that mapping are not stored in the KTR.)
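
For context, we launch the transformation from the command line with Pan, roughly like this (the file path and log level below are placeholders):

  sh pan.sh -file=/home/user/hbase_input.ktr -level=Basic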

regards,
Rams

Mark
09-19-2012, 12:12 AM
Hi,

I assume that the second machine that you are referring to has a separate instance of HBase running on it (distinct from the first machine)? If so, then the issue is that the mappings persisted to the HBase instance on the first machine are not available in HBase on the second machine. These will need to be recreated on the second HBase machine. Alternatively, you can try a development build of the big data plugin from our CI server. The HBase steps now have the ability to encapsulate the mapping information in the transformation XML (as well as persist it in HBase).

http://ci.pentaho.com/view/Big%20Data/job/pentaho-big-data-plugin/
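
To confirm this, you can look for the mapping table in the HBase shell on each machine. If I remember correctly, the step persists mappings in an HBase table named pentaho_mappings (treat the table name as an assumption and verify it against your instance):

  hbase shell
  > list                      # the mapping table should only show up on the first machine
  > scan 'pentaho_mappings'   # dumps the persisted mappings, keyed by table and mapping name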

Cheers,
Mark.

ram_sivakumar
09-19-2012, 05:32 AM
Mark,
Thanks a lot for your reply!!!
You are correct; the two HBase instances are separate.
Can you please explain how to recreate the mapping on the second machine? Is there an HBase shell command to create a Pentaho mapping in the HBase table?
Also, from that URL, do I need to download "Project Kettle-4.4 pdi-ce-4.4.0-r1.tar.gz (http://ci.pentaho.com/job/Kettle-4.4/lastSuccessfulBuild/artifact/Kettle/pdi-ce-4.4.0-r1.tar.gz)", or do all of the files below need to be downloaded?
pdi-ce-4.4.0-r1.tar.gz (147.99 MB)
pdi-ce-4.4.0-r1.zip (148.33 MB)
pdi-ce-4.4.0-SNAPSHOT.tar.gz (148.00 MB)
pdi-ce-4.4.0-SNAPSHOT.zip (148.37 MB)

Thanks and Regards,
Rams

Mark
09-19-2012, 03:53 PM
Hi Rams,

I'm afraid the mapping would have to be recreated on the second machine manually (i.e. from the UI of the HBase step running on the second machine, you'd have to use the mapping editor to create the mapping again and save it).
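
As an alternative to redoing it by hand, since the mappings live in an ordinary HBase table (pentaho_mappings, if memory serves; verify on your instance), you could try copying that table between the clusters with HBase's stock Export/Import MapReduce jobs. I haven't tested this route, so treat it as a sketch:

  # on the first cluster: export the mapping table to HDFS
  hbase org.apache.hadoop.hbase.mapreduce.Export pentaho_mappings /tmp/mappings_export
  # copy the exported files to the second cluster's HDFS, create the
  # table there with the same column family, then:
  hbase org.apache.hadoop.hbase.mapreduce.Import pentaho_mappings /tmp/mappings_export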

For the new version of the big data plugin you can actually get a snapshot of PDI 4.4 that includes the big data plugin:

http://ci.pentaho.com/view/Big%20Data/job/pentaho-big-data-plugin/lastSuccessfulBuild/artifact/dist/

Download pdi-ce-4.4.0-SNAPSHOT-big-data.zip (or the .tar.gz compressed version if you prefer).
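
On Linux that's something along these lines (the exact snapshot filename can change from build to build, so check the artifact listing first):

  wget http://ci.pentaho.com/view/Big%20Data/job/pentaho-big-data-plugin/lastSuccessfulBuild/artifact/dist/pdi-ce-4.4.0-SNAPSHOT-big-data.tar.gz
  tar xzf pdi-ce-4.4.0-SNAPSHOT-big-data.tar.gz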

Cheers,
Mark.

dmoran
09-19-2012, 04:41 PM
Rams,

If you are using Windows, the .zip is fine, but on Mac or Linux you should grab the .tar.gz. Otherwise the execute permissions for the .apps and shell scripts won't be set properly.
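
If you do end up with the .zip on Linux, you can restore the executable bits by hand (assuming the archive unpacks to the usual data-integration directory):

  unzip pdi-ce-4.4.0-SNAPSHOT-big-data.zip
  chmod +x data-integration/*.sh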

ram_sivakumar
09-20-2012, 02:05 PM
Thanks dmoran... I got it....

ram_sivakumar
09-21-2012, 06:45 AM
I tried installing it on Ubuntu (the .tar.gz file). But when I try to open the mapping sheet after entering the ZooKeeper host and port, it gives "invalid host:port". Please let me know whether any other jar files need to be copied into any of the folders. I have copied the HBase, Hadoop and ZooKeeper jars into the libext/pentaho folder. Does anything else need to be copied?
Note: we already have version 4.3.0 configured on that machine, and it is still working fine. I unzipped this build into a different folder.
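
For what it's worth, ZooKeeper can also be probed directly from the shell to rule out basic connectivity (host and port below are placeholders for our quorum):

  echo ruok | nc zk-host 2181   # a healthy ZooKeeper answers "imok"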
Thanks and regards,

Rams

Mark
09-24-2012, 11:56 PM
Hi,

There is no need to copy files into libext/pentaho any longer. The new version of the big data plugin ships configurations for the various Hadoop distributions. If you need to update the HBase jars, you'll find them in

plugins/pentaho-big-data-plugin/hadoop-configurations/<config>/lib/pmr/

where "<config>" is the name of the configuration that you are using (note that the default is hadoop-20, and that this is specified in plugins/pentaho-big-data-plugin/plugin.properties).

Cheers,
Mark.