Hitachi Vantara Pentaho Community Forums
Results 1 to 10 of 23

Thread: Java Heap Space OutOfMemoryError in a Sort step

  1. #1
    Join Date
    Mar 2007
    Posts
    216

    Smile Java Heap Space OutOfMemoryError in a Sort step

    Hi,

    I have a sort step that stops after reading 97 095 675 lines.
    The error message is :
    Code:
    2007/11/26 15:03:51 - Sort rows.0 - ERROR (version 3.0.0, build 500 from 2007/11/14 14:59:11) :     at java.io.BufferedInputStream.<init>(Unknown Source)
    (...) at org.pentaho.di.trans.steps.sort.SortRows.getBuffer(SortRows.java:206)
    (...) at org.pentaho.di.trans.steps.sort.SortRows.processRow(SortRows.java:370)
    (...) at org.pentaho.di.trans.steps.sort.SortRows.run(SortRows.java:503)
    I changed the -Xmx parameter from 256 to 512M, but it stops after reading the same number of lines. The sort step has "Only pass unique rows" enabled and "Compress TMP Files" disabled. Should I change that?

    a+, =)
    -=Clément=-

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Either give it even more memory or lower the amount of rows to sort at once. (in the sort step)
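    To picture what "the amount of rows to sort at once" controls, here is a minimal sketch of the external-sort technique involved, written in plain Python. The function name, chunk handling, and temp-file format are my own illustration, not PDI's actual code: bounded chunks are sorted in memory, spilled to temp files, and the sorted runs are then merged as a stream, so lowering the chunk size caps the heap needed at any one time.

    ```python
    # Hypothetical sketch of an external sort: sort bounded chunks in
    # memory, spill each sorted chunk to a temp file, then stream-merge
    # the spilled runs so memory stays bounded.
    import heapq
    import os
    import tempfile


    def _spill(sorted_chunk):
        """Write one sorted chunk to a temp file and return its path."""
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "w") as f:
            for row in sorted_chunk:
                f.write(row + "\n")
        return path


    def external_sort(rows, rows_in_memory=5000):
        """Yield rows in sorted order using at most rows_in_memory in RAM."""
        run_files = []
        chunk = []
        for row in rows:
            chunk.append(row)
            if len(chunk) >= rows_in_memory:
                run_files.append(_spill(sorted(chunk)))
                chunk = []
        if chunk:
            run_files.append(_spill(sorted(chunk)))
        # heapq.merge consumes the runs lazily, one line at a time.
        runs = [open(path) for path in run_files]
        try:
            for line in heapq.merge(*runs):
                yield line.rstrip("\n")
        finally:
            for f, path in zip(runs, run_files):
                f.close()
                os.unlink(path)
    ```

    With this scheme, halving `rows_in_memory` roughly halves the peak in-memory footprint at the cost of more temp files to merge.
    
    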

    Matt

  3. #3
    Join Date
    May 2006
    Posts
    4,882

    Default

    97.095.675 lines <sigh> ... although there is no hardcoded size limit in the Sort step, I think you hit your memory constraint. You could try playing with the sort size, but I would try to do the sorting via the database if possible.

    Regards,
    Sven

  4. #4
    Join Date
    Mar 2007
    Posts
    216

    Smile

    Hi,

    I have changed 512m to 1024m. I also changed the java.io temp directory to a place where it can have its required 10 GB. I changed pagefile.sys (the Windows swap file) to be 3.4 GB large. I still have the error. I will now try with 5 000 instead of 10 000 'rows in memory' in the Sort step. I will see the result tomorrow, as it takes 2 hours of CPU time. I would not use the database sorter as my goal is to pass only unique rows before insert.

    a+, =)
    -=Clément=-

  5. #5
    Join Date
    Jul 2007
    Posts
    2,498

    Default

    Quote Originally Posted by clement View Post
    I would not use the database sorter as my goal is to pass only unique rows before insert.
    Isn't select distinct an option?
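    For reference, pushing both the sort and the de-duplication into the database looks roughly like this. A minimal sketch using Python's stdlib sqlite3 with an invented staging table and column; the point is that the database returns the rows already sorted and de-duplicated, so the JVM heap never holds the full data set:

    ```python
    # SELECT DISTINCT ... ORDER BY ... lets the database do the work of
    # Sort rows + "Only pass unique rows". Table/column names are made up.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (customer_id INTEGER)")
    conn.executemany("INSERT INTO staging VALUES (?)",
                     [(3,), (1,), (3,), (2,), (1,)])

    rows = conn.execute(
        "SELECT DISTINCT customer_id FROM staging ORDER BY customer_id"
    ).fetchall()
    print(rows)  # [(1,), (2,), (3,)]
    ```
    
    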
    Pedro Alves
    Meet us on ##pentaho, a FreeNode irc channel

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    http://jira.pentaho.org/browse/PDI-516

    You should see a more "sustainable" solution in version 3.0.1.
    I'm running it with great success on my machine, and I'll be able to commit the code in a few days.

    Matt

  7. #7
    Join Date
    Mar 2007
    Posts
    216

    Smile

    Hi,

    Please see the attached file: this transformation generates 5001 rows and sends them to a "Sort rows" step with 5000 rows in memory and "pass only unique rows" checked. In 1 second, you should be able to get a Java Heap Space OutOfMemoryError.
    There is something there that I do not understand. Can someone enlighten me?

    @Matt : It seems to be a good idea to rethink the "Sort rows" step, thanks again for what you and your team are doing with PDI.

    @pmalves : You're right, it's an option. Until now I was thinking that selecting the "pass only unique rows" option would allow me not to use a "Unique Rows" step behind my "Sort" step. After reading the tooltip that appears when the mouse cursor hovers over the option, I no longer know whether "pass only unique rows" has the same behaviour as chaining a "Sort" step and a "Unique Rows" step.
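    One part of that difference can be sketched without PDI at all (the row values below are invented, and this is only an illustration, not PDI's actual code): a "Unique Rows"-style step compares each row only with the previous one, so it relies on its input being sorted, whereas de-duplicating inside the sort sees the whole key space at once.

    ```python
    # Consecutive-row de-duplication (the "Unique Rows" style) only
    # removes duplicates that sit next to each other, which is why it
    # must be fed sorted input.
    from itertools import groupby


    def unique_consecutive(rows):
        """Keep one row per run of equal consecutive rows."""
        return [key for key, _ in groupby(rows)]


    unsorted_rows = ["b", "a", "b", "a"]
    print(unique_consecutive(unsorted_rows))          # ['b', 'a', 'b', 'a'] - duplicates survive
    print(unique_consecutive(sorted(unsorted_rows)))  # ['a', 'b'] - same result as sort + unique
    ```
    
    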


    a+,=)
    -=Clément=-

    EDIT : Without understanding why, the attached sample is now working well.
    Please standby for further test results.
    Attached Files
    Last edited by clement; 11-28-2007 at 10:38 AM. Reason: forgot the attached file

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Using your sample...
    My Spoon instance with 2048M allocated can easily sort 10M rows in memory.
    Since I have the memory statistics here with me, it only used around 30% of the 2048M = 620M.

    Suffice it to say I didn't see an Out of memory.

    Matt

  9. #9
    Join Date
    Jan 2008
    Posts
    16

    Default

    I am running into a similar issue with heap space and would like to know how to set the -Xmx parameter from within Spoon.

    I don't see any mention of 'heap' or 'Xmx' in the 'Spoon User Guide'. Also, the feature added as a result of PDI-516 is not documented in the Spoon User Guide.

    Thanks,
    Shane

  10. #10
    Join Date
    Nov 1999
    Posts
    9,729

    Thumbs up How to set the maximum memory size

    It depends on the situation:

    1) If you run on the Pentaho platform, you need to give JBoss, Tomcat, or whatever container you are running on enough memory.
    2) If you are running Spoon/Kitchen/Pan using the provided shell scripts (*.sh/*.bat), change the -Xmx parameter in those shell scripts.
    3) If you are running Spoon 3.0.1 or later on Windows using the kettle.exe starter (default installer), you can change the -Xmx parameter in the file Kettle.l4j.ini. If the file is not present, create it in the same directory as kettle.exe. In versions 3.0.2 and above, this is the content:

    -----------------------snip------------------------------------
    # JVM command line options, one per line.
    # To increase the max memory limit, change the -Xmx parameter.
    #
    # Don't forget : M is for mega-bytes, k is for kilobytes.
    # If you don't specify either it's bytes!!
    #
    -Xmx256M
    -----------------------snap------------------------------------

    4) On OSX I really don't have an answer yet if you use the Spoon launcher; otherwise, see 2).

    As you can see, the value after the -Xmx parameter is the maximum heap size.

    HTH,

    Matt


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.