HBase and ETL



shujamughal
08-18-2010, 07:42 PM
Hi

I am wondering if anyone can tell me whether we can put data into HBase using PDI.

Thanks
Shuja

jtcornelius
08-20-2010, 08:45 AM
Hello Shuja,

Right now, we do not have an input step for HBase, but I am very interested in exploring requirements for adding such a step. If you have experience with PDI and HBase and have ideas about how you would envision the step working, please comment on this case: http://jira.pentaho.com/browse/PDI-4477

In the meantime, you may want to explore interim ways of integrating HBase data with PDI. If you are currently using the HBase API for retrieving data (get), you could likely use PDI's User Defined Java Expression step as a way to take your code for accessing HBase data and map it into a PDI transformation.

Another option might be to use the Shell Script job entry to call a script that performs the get/scan and dumps the results from HBase to a file, which can then be read using standard input steps like the Hadoop File Input or Text File Input steps.
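
As a rough illustration of the dump approach, here is a minimal Python sketch that turns the text printed by an HBase shell scan into a tab-separated file that a Text File Input step could read. It assumes the usual shell layout of one cell per line (row key, then column=family:qualifier, timestamp=..., value=...), and that row keys and qualifiers contain no whitespace or commas. The function names and the output column names are made up for this example; they are not part of PDI or HBase.

```python
import csv
import re

# Matches one cell line of HBase shell scan output (layout assumed), e.g.
#  row1    column=cf:name, timestamp=1282174920000, value=Shuja
CELL_LINE = re.compile(
    r"^\s*(?P<row>\S+)\s+column=(?P<family>[^:,\s]+):(?P<qualifier>[^,]*),"
    r"\s*timestamp=(?P<timestamp>[^,]+),\s*value=(?P<value>.*)$"
)


def parse_scan_output(lines):
    """Yield (row, family, qualifier, timestamp, value) for each cell line.

    Lines that do not look like a cell (the header, the row-count summary,
    blank lines) are skipped.
    """
    for line in lines:
        match = CELL_LINE.match(line.rstrip("\n"))
        if match:
            yield (
                match["row"],
                match["family"],
                match["qualifier"],
                match["timestamp"],
                match["value"],
            )


def write_tsv(cells, out):
    """Write the cells as tab-separated text with a header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["row", "family", "qualifier", "timestamp", "value"])
    writer.writerows(cells)


def convert(src, dst):
    """Read scan output from the file object src and write TSV to dst."""
    write_tsv(parse_scan_output(src), dst)
```

A script built on this could end with convert(sys.stdin, sys.stdout) and be fed from the HBase shell by the Shell Script job entry, for example by piping a scan of a (hypothetical) table into it and redirecting the result to a file. The resulting file has one line per cell, so a later step in the transformation would still need to pivot the cells into one record per row key if that is the shape you want.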

If you do further exploration in this area, please share your findings!

Best regards,
Jake