Hitachi Vantara Pentaho Community Forums

Thread: How to set JVM memory parameters for the nodes?

  1. #1
    Join Date
    Nov 2010

    How to set JVM memory parameters for the nodes?


    I read here:

    After several days changing the plugin and debugging the origin of the problem, I finally discovered that by default mapreduce tasks run with a maximum memory of -Xmx200m [...] that value was clearly insufficient to run the transformation [...] So do yourself a favor - increase the available memory on the cluster.
    In Kettle I can set the JVM memory parameters in spoon.bat, but how do you do that for the MapReduce tasks?


  2. #2
    Join Date
    Sep 2012


    There is a property you can set on the Hadoop cluster in mapred-site.xml called "mapred.child.java.opts", which carries the -Xmx option for the child task JVMs:
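
    For illustration, here is a minimal sketch of the equivalent per-job override, assuming the classic MRv1 JobConf API that the Hadoop 1.x / CDH4 line ships (the class name ChildHeapExample and the -Xmx2048m value are just placeholders, not something from Pentaho):

    Code:
        import org.apache.hadoop.mapred.JobConf;

        public class ChildHeapExample {
            public static void main(String[] args) {
                // The TaskTracker passes the value of "mapred.child.java.opts"
                // (default is typically -Xmx200m) to every map/reduce child JVM.
                // The cluster-wide default lives in mapred-site.xml; setting it
                // on the JobConf overrides it for this job only.
                JobConf conf = new JobConf();
                conf.set("mapred.child.java.opts", "-Xmx2048m");
                System.out.println(conf.get("mapred.child.java.opts"));
            }
        }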


    Check this StackOverflow post for more info:

  3. #3
    Join Date
    Nov 1999


    What Matt said and correct me if I'm wrong...

    Users can choose to override the default limits on virtual memory and RAM enforced by the TaskTracker, if memory management is enabled. Users can set the following parameters per job:

    mapred.task.maxvmem (int): The maximum virtual-memory limit, in bytes, for each task of the job. A task will be killed if it consumes more virtual memory than this number.
    mapred.task.maxpmem (int): The maximum RAM limit, in bytes, for each task of the job. Schedulers can optionally use this number to prevent over-scheduling of tasks on a node based on RAM needs.

    Values can be passed through the user-defined settings in the Pentaho MapReduce job entry.
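
    For illustration, a rough sketch of setting those two limits programmatically with the MRv1 JobConf API; the same name/value pairs can be entered in the User Defined tab of the Pentaho MapReduce job entry. The 4 GB / 2 GB figures are only example values, not recommendations:

    Code:
        import org.apache.hadoop.mapred.JobConf;

        public class TaskMemoryLimits {
            public static void main(String[] args) {
                JobConf conf = new JobConf();
                // Both limits are byte counts, applied per task (example values).
                conf.setLong("mapred.task.maxvmem", 4L * 1024 * 1024 * 1024); // kill a task above 4 GB of virtual memory
                conf.setLong("mapred.task.maxpmem", 2L * 1024 * 1024 * 1024); // RAM figure the scheduler may use
                System.out.println(conf.get("mapred.task.maxvmem") + " / " + conf.get("mapred.task.maxpmem"));
            }
        }
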
    Last edited by MattCasters; 04-16-2013 at 09:43 AM.

  4. #4
    Join Date
    Nov 2010


    Hi all,

    Thank you very much for your replies.

    We changed the property to -Xmx2048m (it turns out it was set to 1024). Unfortunately we're still facing very serious performance issues.

    I have a simple Hive query that takes approx. 3 minutes to run. I reproduced it with PDI and it takes several hours. Both the Map and Reduce phases are excruciatingly slow.

    I've been developing with Kettle for several years and I'm confident that both the mapper and reducer transformations are well designed. And anyway, as I said, they're very simple (the reducer is a mere group-by).

    I thought it could be a memory issue, but now it seems it isn't. Could PDI REALLY be this slow? I find it hard to believe. The cluster's working fine because Hive jobs run great.

    Any idea of what could cause this?

    (running PDI 4.4.0 on a 6-node CDH4 cluster)

    Thanks again for all your help!


