Hitachi Vantara Pentaho Community Forums

Thread: Hadoop TFI - streaming or dumping/extracting from temp files

  1. #1

    Hadoop TFI - streaming or dumping/extracting from temp files

    First question brought up in the Beta Kickoff Meeting.
    (Daniel, pardon me for butchering your question)

    Does the Hadoop Text File Input (TFI) step stream data from HDFS, or does it use the standard VFS process of writing the data to temp files and reading from there?

    Hopefully I got that close to correct.


  2. #2
    Join Date
    Mar 2008

    Current access to Hadoop is through Apache VFS

    The Hadoop Text File In/Output steps as well as the Hadoop Copy Files job entry use VFS to access files. We have submitted a patch to the Hadoop project for the HDFS / VFS driver.

    Please watch it and vote for it!


  3. #3
    jdixon Guest


    I think the question is really about streaming vs temp files.

    Does our VFS implementation first copy the HDFS file to a temp file and then read from there? I think that is Daniel's question.
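    To make the two options concrete, here is a minimal JDK-only sketch of both strategies. It is not Pentaho or Commons VFS code: OpenStrategies, openStreaming and openViaTempFile are hypothetical names, and a ByteArrayInputStream stands in for the remote HDFS stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration (not Pentaho or Commons VFS code): two ways a provider
// could hand file content to a reader.
public class OpenStrategies {

    // Streaming: hand back the remote stream itself; bytes reach the caller as they arrive.
    public static InputStream openStreaming(InputStream remote) {
        return remote;
    }

    // Temp file: copy the entire remote stream to local disk, then read the local copy.
    // The local disk must hold the whole file before the caller sees the first byte.
    public static InputStream openViaTempFile(InputStream remote) throws IOException {
        Path tmp = Files.createTempFile("vfs-copy-", ".tmp");
        try (InputStream in = remote) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a partial copy behind
            throw e;
        }
        // DELETE_ON_CLOSE removes the local copy when the caller closes the returned stream.
        return Files.newInputStream(tmp, StandardOpenOption.READ, StandardOpenOption.DELETE_ON_CLOSE);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a,b\n1,2\n".getBytes(StandardCharsets.UTF_8);
        try (InputStream s = openStreaming(new ByteArrayInputStream(data));
             InputStream t = openViaTempFile(new ByteArrayInputStream(data))) {
            // Both strategies deliver the same bytes; only the temp-file path touches local disk.
            System.out.println(s.readAllBytes().length + " " + t.readAllBytes().length);
        }
    }
}
```

    The reader sees identical content either way; the difference is that the temp-file path needs local disk for the full file up front.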


  4. #4
    Join Date
    Jan 2006


    The HDFS VFS driver should be streaming the data. The code I put in place in the driver to handle this is:

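    // "hdfs" is the driver's org.apache.hadoop.fs.FileSystem instance (field name assumed)
    protected InputStream doGetInputStream() throws Exception {
        // FileSystem.open() returns a stream that reads the HDFS file as the caller consumes it
        FSDataInputStream in = hdfs.open(new Path(getName().getPath()));
        return in;
    }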

    This is never invoked directly through the user API, but it is eventually called when you do something like this:
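    With Apache Commons VFS, the user-level call would typically be along the lines of VFS.getManager().resolveFile(uri).getContent().getInputStream(), where uri is an hdfs:// URL and the HDFS provider is assumed to be registered. Because that needs Commons VFS and a cluster, the sketch below models the same template-method pattern with plain JDK classes: the public getInputStream() on the base class is what user code calls, and it delegates to the provider's doGetInputStream() hook. BaseFileObject and InMemoryFileObject are hypothetical names, not Commons VFS classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical model of the template-method pattern; NOT the Commons VFS classes.
public class TemplateMethodModel {

    // Stand-in for a VFS base file object. User code calls getInputStream(), which does
    // bookkeeping and then delegates to the provider-specific hook.
    public abstract static class BaseFileObject {
        private int opens = 0;

        public final InputStream getInputStream() throws IOException {
            opens++; // base-class bookkeeping the provider does not have to repeat
            try {
                return doGetInputStream();
            } catch (IOException e) {
                throw e;
            } catch (Exception e) {
                throw new IOException("could not open stream", e);
            }
        }

        public int openCount() {
            return opens;
        }

        // Provider hook, analogous to doGetInputStream() in the HDFS driver.
        protected abstract InputStream doGetInputStream() throws Exception;
    }

    // Hypothetical provider whose hook returns a live stream; an HDFS provider would
    // return the result of FileSystem.open() here instead.
    public static class InMemoryFileObject extends BaseFileObject {
        private final byte[] content;

        public InMemoryFileObject(String text) {
            this.content = text.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        protected InputStream doGetInputStream() {
            return new ByteArrayInputStream(content);
        }
    }

    public static void main(String[] args) throws IOException {
        BaseFileObject file = new InMemoryFileObject("hello hdfs");
        // User-level call; doGetInputStream() is reached only through this method.
        try (InputStream in = file.getInputStream()) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

    The point of the pattern is that a provider only supplies the hook, so whether data streams or is staged in a temp file is decided entirely by what the hook returns.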


    Hope this helps.

  5. #5
    DEinspanjer Guest


    Cool. The streaming bit is important because some other VFS providers, such as gzip, tar, and zip, extract the contents to the temp directory and then read them from there.

    That would be bad in the Hadoop world, where you might be working on a file that is larger than your local disk space.
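    As a hedged illustration of the alternative, the JDK's java.util.zip.GZIPInputStream decompresses on the fly, so a reader can process a compressed stream with a fixed-size buffer and never write the uncompressed contents to a temp directory. This is a hypothetical sketch, not the Commons VFS gzip provider's code; the in-memory gzip bytes stand in for a remote .gz file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingGzipLineCount {

    // Count lines in a gzip-compressed stream without materializing the uncompressed data.
    // Memory use is bounded by the reader's buffers, not by the file size, and no temp file is written.
    public static long countLines(InputStream compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    // Demo helper: gzip a string in memory (stands in for a remote .gz file).
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("id,name\n1,alpha\n2,beta\n");
        System.out.println(countLines(new ByteArrayInputStream(gz))); // prints 3
    }
}
```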

  6. #6
    Join Date
    Aug 2010

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.