HBase output failing



rajpal
04-23-2012, 08:10 PM
Hi,
I'm using the Kettle 4.3 preview version and am having trouble writing to an HBase database. I have a working job/transformation in which the input files are transformed and written to a CSV file, and that works fine. When I route the output to HBase Output instead of the CSV file, I get an error about a field that I don't even have in my output stream.
To send the output to HBase, I just replaced the "write to csv file" step with the "HBase Output" step. In the HBase Output "Create/Edit mappings" tab, the "Alias" column lists the input fields from the output stream and the "Column name" column has the corresponding names in the HBase table. I have also selected the correct family and key values.

Any clues as to what might be going wrong here?

Here are spoon logs:
INFO 23-04 23:57:13,572 - Closed zookeeper sessionid=0x236d19613a80090
INFO 23-04 23:57:13,583 - Session: 0x236d19613a80090 closed
INFO 23-04 23:57:13,583 - EventThread shut down
ERROR 23-04 23:57:13,638 - HBase Output - Unexpected error
ERROR 23-04 23:57:13,638 - HBase Output - org.pentaho.di.core.exception.KettleException:
Can't find incoming field "short_field1" defined in the mapping +"mapping_test"


at org.pentaho.di.trans.steps.hbaseoutput.HBaseOutput.processRow(HBaseOutput.java:208)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
at java.lang.Thread.run(Thread.java:662)


INFO 23-04 23:57:13,639 - HBase Output - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)


Thanks
Raj

rajpal
04-24-2012, 02:20 PM
Additional info:
Output to HBase results in an error only if the transformed output stream is sent directly to HBase Output. Storing the same transformed output stream in a CSV file and then uploading that CSV file into HBase works fine. Any suggestions on what extra precautions (removing unmapped fields, etc.) are needed to make it work?

Kettle reports filename (which is the input to the transformation), short_filename, path, type, etc. as fields it cannot find:
"Can't find incoming field "<above mentioned field name>" defined in the mapping +"mapping_test""

I tried to remove these from the data stream entering HBase Output, but every time a new field is reported as missing. Any clues?

Thanks

cdeptula
04-26-2012, 03:34 PM
It looks like the HBase Output step requires every field in the input stream to be mapped to an HBase column in your mapping. The way to fix this is to limit the stream going to the HBase Output step to just those fields that are mapped to HBase. Do this with a Select Values step that selects only the columns that are mapped in your HBase mapping.
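
To illustrate the behaviour described above, here is a rough Python model of it. This is only a sketch of the idea, not Pentaho's actual code: `check_incoming_fields` and `select_values` are hypothetical helpers, and the field names `id` and `value` are made up for the example.

```python
# Simplified model of the behaviour described in this thread.
# Assumption: the step rejects the first incoming field that is not in the mapping.

def check_incoming_fields(incoming_fields, mapped_aliases, mapping_name):
    """Raise on the first incoming field that has no entry in the mapping."""
    for name in incoming_fields:
        if name not in mapped_aliases:
            raise ValueError(
                'Can\'t find incoming field "{}" defined in the mapping "{}"'.format(
                    name, mapping_name
                )
            )


def select_values(row, keep):
    """Mimic a Select Values step: keep only the listed fields, in that order."""
    return {name: row[name] for name in keep if name in row}


# Hypothetical row: two mapped fields plus extras added by the file input.
row = {"id": "r1", "value": "42", "short_filename": "a.txt", "path": "/tmp"}
mapped = ["id", "value"]

# Sending the full row fails on the first unmapped field.
try:
    check_incoming_fields(row.keys(), set(mapped), "mapping_test")
except ValueError as err:
    print(err)

# Trimming the row to the mapped fields first lets the check pass.
trimmed = select_values(row, mapped)
check_incoming_fields(trimmed.keys(), set(mapped), "mapping_test")
```

Because the check in this model stops at the first unmapped field, removing fields one at a time would surface a different field name on each run, which matches what was reported earlier in the thread.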

rajpal
05-10-2012, 02:34 PM
Thanks for your reply. I found Select Values kind of tedious, as I have to list all the fields to be removed from the row stream. I found a workaround that fits my needs: reading the data from the CSV file in a separate transformation instead of routing the output directly to HBase. That works just fine, with the overhead of storing the results in a CSV file.
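
If listing the fields by hand is the tedious part, the list of fields to remove can be worked out as the difference between the stream's field names and the mapping's aliases. A minimal sketch, assuming both lists have been copied out of Spoon by hand: `unmapped_fields` is a hypothetical helper, and `id` and `value` are made-up mapped fields, while the other names are the ones reported in the errors above.

```python
def unmapped_fields(stream_fields, mapped_aliases):
    """Return the stream fields that have no alias in the mapping, in stream order."""
    mapped = set(mapped_aliases)
    return [name for name in stream_fields if name not in mapped]


# Field names in the stream reaching HBase Output ("id" and "value" are assumed).
stream_fields = ["id", "value", "filename", "short_filename", "path", "type"]
# Aliases defined in the HBase mapping (assumed).
mapped_aliases = ["id", "value"]

to_remove = unmapped_fields(stream_fields, mapped_aliases)
print(to_remove)
```

The resulting list is what would have to be dropped from the stream before HBase Output.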