Hitachi Vantara Pentaho Community Forums

Thread: Hadoop steps are slow

  1. #1
    Join Date
    Aug 2015

    Default Hadoop steps are slow


    Please forgive me if my question is not related to this forum, but as far as I can tell it needs to be answered by you (experts). I raised the same question in the Pentaho Data Integration forum but no one answered.

    I am trying to load data from a SOURCE to a TARGET database using the two approaches below, trying to use the Hadoop file steps. As per my observation, the plain text file steps take less time to process the data. For both file types I am using the CSV (comma separated values) extension.

    Table Input -> Text file Output, then Text file Input -> Table Output (took 13 sec for 7,000 records)
    Table Input -> Hadoop File Output, then Hadoop File Input -> Table Output (took 20 sec for 7,000 records)

    I am looking for the best approach. Generally, the Hadoop file system should load data faster than a plain text file system, is that correct?

    It is not happening in my case. Do I need to install Hadoop-related software on my PC?

    Thank you

  2. #2


    There are a lot of factors to be considered. Things like the cluster configuration, the number of data nodes, and the amount of memory and resources available to YARN are some of the things that need to be looked at.

    At a very conceptual level, though, you will find Hadoop data processing (MapReduce, Hive, Sqoop, etc.) to be 'slower' than conventional text file input. Once you are looking at real big data - data in the terabyte or petabyte range - that's when traditional data processing frameworks fall short and Hadoop becomes 'faster'.
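    To make the tradeoff concrete, here is a toy cost model (not a benchmark; every number in it is made up for illustration). A Hadoop-style pipeline pays a large fixed startup cost (JVM launches, job scheduling, HDFS RPCs, replication) but its throughput scales with the number of data nodes, so it only wins once the data is large enough to amortize that overhead:

    ```python
    # Toy cost model: fixed startup overhead vs. scalable throughput.
    # All constants below are illustrative assumptions, not measurements.

    def local_time(gb: float) -> float:
        # Plain text pipeline: tiny startup cost, modest single-machine throughput.
        startup_s = 1.0          # assumed startup overhead (seconds)
        throughput_gb_s = 0.1    # assumed throughput (GB/s)
        return startup_s + gb / throughput_gb_s

    def hadoop_time(gb: float, nodes: int = 10) -> float:
        # Hadoop-style pipeline: heavy startup (scheduling, JVMs, HDFS RPCs),
        # but throughput grows with the number of data nodes.
        startup_s = 30.0
        throughput_gb_s = 0.1 * nodes
        return startup_s + gb / throughput_gb_s

    for gb in (0.01, 1, 100, 10_000):
        print(f"{gb:>8} GB  local={local_time(gb):>9.1f}s  hadoop={hadoop_time(gb):>9.1f}s")
    ```

    With these made-up numbers the local pipeline wins easily at 0.01 GB (roughly your 7,000-record CSV), while the Hadoop pipeline wins from around 100 GB upward. Your 13 s vs. 20 s result is exactly the small-data regime of this curve.
    
    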

    Hope that helps.



Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.