Question regarding test data



jtcornelius
07-02-2010, 12:46 PM
Email from Shingo:
Is there any Hive (HDFS) data available for testing the Hadoop-compatible PDI?
Or should tests be based on data from our own environment?
Also, what volume of data (gigabyte scale or petabyte scale)
would you suggest using during testing?

jtcornelius
07-02-2010, 12:49 PM
If you look under the samples\transformations\files directory of your PDI installation, you will find a couple of small CSV files.

For the most part, any delimited file should work fine for the Hadoop Text file input step.
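
If the bundled samples are too small for a meaningful test, one option is to generate a synthetic delimited file of whatever size you need. The following is only a minimal sketch in Python, not part of PDI: the function names, the column names (id, name, amount), and the random values are all made up for illustration.

```python
import csv
import io
import random


def format_row(fields, delimiter):
    """Render one row as a delimited line, letting csv handle any quoting."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def write_delimited_sample(out, target_bytes, delimiter=",", seed=0):
    """Write a header plus synthetic rows to `out` until at least
    `target_bytes` characters have been written (all output is ASCII,
    so characters and bytes coincide). Returns the number of data rows.

    The schema here is hypothetical; replace it with columns that
    resemble your real data.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    line = format_row(["id", "name", "amount"], delimiter)
    out.write(line)
    written = len(line)
    rows = 0
    while written < target_bytes:
        fields = [
            rows,
            f"customer_{rng.randint(1, 1000)}",
            f"{rng.uniform(0, 500):.2f}",
        ]
        line = format_row(fields, delimiter)
        out.write(line)
        written += len(line)
        rows += 1
    return rows


if __name__ == "__main__":
    # Roughly 1 MB of comma-delimited test data; raise target_bytes as needed.
    with open("sample_data.csv", "w", newline="") as f:
        count = write_delimited_sample(f, 1_000_000)
    print(f"wrote {count} rows")
```

The resulting file can then be copied into HDFS (for example with the Hadoop Copy Files job entry mentioned below) and read back like any other delimited file.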

The Hadoop Copy Files Job Entry should work with any file type.

Regarding data set sizes, we would love to know how it performs for you using the biggest data sets you have.

Regards,
Jake