Problem while using Hadoop File Input and Hadoop File Output

09-29-2012, 03:18 AM

Below is the scenario in which I am using the Hadoop File Input and Hadoop File Output steps.

1. Hadoop File Input 1 - reads the records of a Hive table
2. Hadoop File Input 2 - reads another HDFS file
3. Lookup on a field common to both files
4. Hadoop File Output - writes the unmatched records back to the same file that is read in step 1

So steps (1) and (4) refer to the same file.

Problem:
When we run the transformation, the source file becomes empty. After investigating, I found that Hadoop File Output has an option not to create the file at the start of the run. If we select that option, the source file is no longer truncated. But there is a second problem: when every record is matched and there is nothing to write, the output file keeps its old content instead of becoming an empty file.
Please suggest what should be done in this scenario.
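
To make the intended behaviour concrete, here is a minimal sketch in plain Python on a local filesystem. It is not Pentaho or HDFS code, and the function name, the comma separator and the key column position are all assumptions made for illustration. The idea is to write the unmatched records to a temporary file and only swap it over the source once reading has finished, so the source is never truncated mid-read and a run with zero unmatched records still ends with an empty file.

```python
import os
import tempfile


def rewrite_with_unmatched(source_path, lookup_path, key_index=0, sep=","):
    """Keep only the source rows whose key is absent from the lookup file.

    Hypothetical helper: the name, separator and key position are
    illustrative assumptions, not taken from the transformation.
    """
    # Collect the keys of the second input (the lookup side).
    with open(lookup_path, encoding="utf-8") as lookup:
        lookup_keys = {
            line.rstrip("\n").split(sep)[key_index]
            for line in lookup
            if line.strip()
        }

    # Write to a temporary file in the same directory, so the source
    # stays intact for as long as it is being read.
    target_dir = os.path.dirname(os.path.abspath(source_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp, \
                open(source_path, encoding="utf-8") as source:
            for line in source:
                if not line.strip():
                    continue  # skip blank lines
                key = line.rstrip("\n").split(sep)[key_index]
                if key not in lookup_keys:
                    tmp.write(line if line.endswith("\n") else line + "\n")
        # Swap the temporary file in. This runs even when no row was
        # written, so a fully matched run leaves an empty file.
        os.replace(tmp_path, source_path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Calling `rewrite_with_unmatched("source.csv", "lookup.csv")` (both file names are placeholders) rewrites `source.csv` in place. On HDFS the corresponding move would be a rename of the temporary file over the original; how best to arrange that around the Hadoop File Output step is not something this sketch covers.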