Have you successfully tried the new Input/Output steps and Copy Files job entry?



jtcornelius
07-07-2010, 10:42 AM
Quick poll to see how everyone's testing is going...

DEinspanjer
07-20-2010, 02:40 PM
Was able to do some streaming reads of files stored in HDFS.

Had an issue where "Get Fields" could not display the format of the fields.

Got an error when trying to browse HDFS to find part-r-###### files. Still worked though.

<strike>Alright performance. About 26k RPS reading and writing through a single copy</strike>

Actually, my first run was processing data using only one HDFS reader and one Vertica Table Output step. The output step was the bottleneck there.
On my dual quad-core Xeon server, I was able to bump the HDFS readers up to three and use six Table Output steps before saturating I/O on the server.
The total throughput at that point was 78k RPS.
I processed 30M records (221 HDFS files) in 6 minutes.
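
For anyone who wants to sanity-check those numbers, here is a rough sketch of the arithmetic. It assumes "RPS" means rows (records) per second, and that "30M" and "6 minutes" are rounded figures, so the result is only an approximation. The variable and function names are made up for illustration.

```python
def rows_per_second(records: int, seconds: float) -> float:
    """Average throughput over a whole run."""
    return records / seconds


records = 30_000_000    # "30M records" from the post (assumed rounded)
elapsed_s = 6 * 60      # "6 minutes" from the post (assumed rounded)
hdfs_files = 221        # number of HDFS files from the post
reported_rps = 78_000   # throughput figure quoted in the post

average_rps = rows_per_second(records, elapsed_s)
ratio = reported_rps / average_rps
records_per_file = records / hdfs_files

print(f"average throughput over the run: {average_rps:,.0f} rows/s")
print(f"reported figure as a share of that average: {ratio:.0%}")
print(f"average records per HDFS file: {records_per_file:,.0f}")
```

The run-wide average comes out a little above the quoted 78k, which is about what you would expect if the record count and elapsed time were rounded.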