Insert CSV file to Hadoop Hive table



firantika
10-20-2011, 06:00 PM
Hi All,

I have been trying out something new in PDI 4.2: the Hive step.
I have successfully written some files to HDFS from PDI with the Hadoop File Output step. Now I want to export a CSV file to a Hive table.

I tried connecting a CSV file input step to a Hadoop File Output step, but my CSV file only ends up on HDFS. I want the Hive table to be generated automatically, the same way it is when I export a CSV file to a MySQL table.

How can I do that?



Thanks,

Jasper
11-01-2011, 09:29 AM
Hi Firantika,

Have you set up the Hive table up front? Hive table definitions cannot be generated automatically. You first need to create the Hive table on the Hadoop cluster.

Example:
From the Hive command line interface:

CREATE EXTERNAL TABLE some_table (City STRING, Neighborhood STRING, Inhabitants INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/firantika/hive/some_table';

This is for a tab-separated file.

If you add files in HDFS to '/user/firantika/hive/some_table/', some_table will automatically be populated. Everything you put into this folder is 'added to the table'. In the end, Hive just presents the contents of the HDFS folder as a table; under the hood these are just HDFS files.
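
For your CSV case, a minimal sketch could look like the following. The table name some_csv_table and the source path /user/firantika/output/cities.csv are only placeholders for illustration; replace them with your own. It assumes the file has no quoted fields containing commas, since plain delimited parsing splits on every comma, and no header row, since a header line would be read as an ordinary data row.

```sql
-- Comma-separated variant of the table above (hypothetical table name).
CREATE EXTERNAL TABLE some_csv_table (City STRING, Neighborhood STRING, Inhabitants INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/firantika/hive/some_csv_table';

-- Optional: if the CSV already sits elsewhere in HDFS (hypothetical path),
-- this moves it (it does not copy it) into the table's folder.
LOAD DATA INPATH '/user/firantika/output/cities.csv' INTO TABLE some_csv_table;

-- Quick check that the rows are visible through Hive.
SELECT * FROM some_csv_table LIMIT 10;
```

Alternatively, you can skip the LOAD DATA statement and point the Hadoop File Output step directly at the table's folder, so every file PDI writes there shows up in the table.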