Accessing a dependency on the nodes of an M/R cluster

01-24-2013, 04:57 PM
I'm trying to get the GeoLookup step to work inside a mapper task. The step requires physical access to the .dat file. I tried to change that behavior by implementing VFS support, but the step needs random access to the file, so that's not the best approach.

I also tried changing the maxmind plugin to copy the file from a VFS location to the cluster's tmp dir. That seems to work, but for some reason the mapreduce task hangs with a bunch of threads in a BLOCKED state. I haven't given up on this approach yet.
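
To make the copy-to-tmp idea concrete, here is a minimal sketch in plain Java. It is not the actual plugin change: the class name LocalDatCache and the method materialize are hypothetical, and it assumes the source file can be opened as an InputStream and that java.io.tmpdir points to a writable node-local directory. The copy goes to a temporary name first and is then moved into place, so a reader never sees a half-written file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Kettle or the maxmind plugin.
public class LocalDatCache {

    // Copies the stream to <java.io.tmpdir>/<fileName> if that file does not
    // exist yet and returns the local path. The method is synchronized so that
    // only one thread in this JVM performs the copy; other threads wait and
    // then reuse the file that is already there.
    public static synchronized Path materialize(InputStream source, String fileName)
            throws IOException {
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), fileName);
        if (Files.exists(target)) {
            return target; // already copied earlier
        }
        // Write to a uniquely named partial file in the same directory first.
        Path partial = Files.createTempFile(target.getParent(), fileName, ".part");
        try {
            Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
            // Publish the finished file in one step. What happens if another
            // process created the target in the meantime is platform-specific.
            Files.move(partial, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up after a failed copy.
            Files.deleteIfExists(partial);
        }
        return target;
    }
}
```

The step would then open the returned path as an ordinary local file, which gives it the random access it needs. This only guards against threads inside one JVM; it does not explain the BLOCKED threads described above.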

What I'd like to know is whether there's a better way to copy an arbitrary file to the nodes, the same way the kettle libs are copied (maybe by putting it in lib/pmr..). The second thing is whether there's any variable or similar mechanism that lets me find the location of that file. It has to be dynamic, since every time a task is executed the jobtracker creates a different directory for all of this.

Am I missing something?