Hitachi Vantara Pentaho Community Forums

Thread: Java Heap Space OutOfMemoryError in a Sort step

  1. #1
    Join Date
    Mar 2007
    Posts
    216

    Java Heap Space OutOfMemoryError in a Sort step

    Hi,

    I have a sort step that stops after reading 97 095 675 lines.
    The error message is :
    Code:
    2007/11/26 15:03:51 - Sort rows.0 - ERROR (version 3.0.0, build 500 from 2007/11/14 14:59:11) :     at java.io.BufferedInputStream.<init>(Unknown Source)
    (...) at org.pentaho.di.trans.steps.sort.SortRows.getBuffer(SortRows.java:206)
    (...) at org.pentaho.di.trans.steps.sort.SortRows.processRow(SortRows.java:370)
    (...) at org.pentaho.di.trans.steps.sort.SortRows.run(SortRows.java:503)
    I changed the -Xmx parameter from 256M to 512M, but it stops after reading the same number of lines. The Sort step has "Only pass unique rows" enabled and "Compress TMP Files" disabled. Should I change that?

    a+, =)
    -=Clément=-
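
When raising -Xmx changes nothing, one thing worth ruling out is that the new value never reaches the JVM that runs the transformation (for example, because a different launcher script is in use; that is an assumption, nothing in the post confirms it). A minimal sketch using the standard `Runtime.maxMemory()` call; the class name `HeapCheck` is made up for illustration:

```java
class HeapCheck {

    /** Returns the maximum heap this JVM will try to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Launch with the same -Xmx value as the tool under test and compare.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The reported figure is usually a little below the -Xmx value, since the JVM may exclude part of the heap from the total.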

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729


    Either give it even more memory or lower the number of rows to sort at once (in the Sort step).

    Matt
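
For readers unfamiliar with what "rows to sort at once" controls: a sort that does not fit in memory is typically done as an external merge sort. Chunks of at most N rows are sorted in memory and spilled to temp files, then the files are merged. The sketch below illustrates that general technique with duplicate removal during the merge. It is not PDI's code; `ExternalSortSketch`, `RunCursor` and `sortUnique` are hypothetical names, rows are assumed to be single-line strings, and `rowsInMemory` is assumed to be at least 1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

class ExternalSortSketch {

    /** Tracks the current line of one sorted temp file during the merge. */
    private static final class RunCursor {
        final BufferedReader reader;
        String line;

        RunCursor(Path run) throws IOException {
            reader = Files.newBufferedReader(run);
            line = reader.readLine();
        }

        /** Moves to the next line; closes the file once it is exhausted. */
        boolean advance() throws IOException {
            line = reader.readLine();
            if (line == null) reader.close();
            return line != null;
        }
    }

    /** Sorts rows using at most rowsInMemory rows per chunk and emits each distinct row once. */
    static void sortUnique(Iterator<String> rows, int rowsInMemory, Consumer<String> sink) {
        List<Path> runs = new ArrayList<>();
        try {
            // Phase 1: sort fixed-size chunks in memory and spill each one to a temp file.
            List<String> chunk = new ArrayList<>(rowsInMemory);
            while (rows.hasNext()) {
                chunk.add(rows.next());
                if (chunk.size() == rowsInMemory || !rows.hasNext()) {
                    Collections.sort(chunk);
                    Path run = Files.createTempFile("sort-run-", ".tmp");
                    runs.add(run);
                    Files.write(run, chunk);
                    chunk.clear();
                }
            }
            // Phase 2: k-way merge. Note that every temp file is open at the same time here.
            PriorityQueue<RunCursor> heap =
                    new PriorityQueue<>(Comparator.comparing((RunCursor c) -> c.line));
            for (Path run : runs) {
                RunCursor c = new RunCursor(run);
                if (c.line != null) heap.add(c); else c.reader.close();
            }
            String previous = null;
            while (!heap.isEmpty()) {
                RunCursor c = heap.poll();
                if (!c.line.equals(previous)) { // equal rows arrive back to back, so skip repeats
                    sink.accept(c.line);
                    previous = c.line;
                }
                if (c.advance()) heap.add(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Path run : runs) run.toFile().delete(); // best-effort cleanup
        }
    }
}
```

In this design the in-memory chunk size trades heap used while sorting against the number of temp files that must be open during the merge.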

  3. #3
    Join Date
    May 2006
    Posts
    4,882


    97,095,675 lines <sigh> ... Although there's no hardcoded size limit in the Sort step, I think you hit your memory constraint. You could try playing with the sort size, but I would try to do the sorting in the database if possible.

    Regards,
    Sven

  4. #4
    Join Date
    Mar 2007
    Posts
    216


    Hi,

    I have changed 512m to 1024m. I also changed the Java IO temp directory to a location that has the 10 GB it requires, and I enlarged pagefile.sys (the Windows swap file) to 3.4 GB. I still get the error. I will now try with 5 000 instead of 10 000 'rows in memory' in the Sort step. I will see the result tomorrow, as it takes 2 hours of CPU time. I would rather not sort in the database, as my goal is to pass only unique rows before the insert.

    a+, =)
    -=Clément=-
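
A rough arithmetic sketch of why fewer rows in memory may not help. The stack trace in the first post fails inside `java.io.BufferedInputStream.<init>`, called from `SortRows.getBuffer`, so the allocation that fails is a read buffer rather than the rows being sorted. If the step opens one buffered stream per temp file at that point (an assumption, not confirmed in this thread), the file count matters, and halving the rows in memory doubles it. The 8192-byte figure is the JDK default for `BufferedInputStream`; whether PDI 3.0.0 uses the default is also an assumption. `TempFileEstimate` is a made-up name.

```java
class TempFileEstimate {

    /** Number of sorted temp files needed: ceil(totalRows / rowsInMemory). */
    static long tempFiles(long totalRows, long rowsInMemory) {
        return (totalRows + rowsInMemory - 1) / rowsInMemory;
    }

    /** Heap taken by read buffers if every temp file is open at once. */
    static long bufferBytes(long files, long bytesPerBuffer) {
        return files * bytesPerBuffer;
    }

    public static void main(String[] args) {
        long rows = 97_095_675L;        // row count reported in the first post
        long assumedBufferSize = 8192L; // JDK default; the size PDI actually uses is unknown here
        for (long inMemory : new long[] {10_000L, 5_000L}) {
            long files = tempFiles(rows, inMemory);
            System.out.println(inMemory + " rows in memory -> " + files + " temp files, "
                    + bufferBytes(files, assumedBufferSize) + " bytes of read buffers");
        }
    }
}
```

With 10 000 rows in memory this gives 9 710 temp files; with 5 000 it gives 19 420.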

  5. #5
    Join Date
    Jul 2007
    Posts
    2,498


    Originally posted by clement:
    I would not use the database sorter as my goal is to pass only unique rows before insert.
    Isn't select distinct an option?
    Pedro Alves
    Meet us on ##pentaho, a FreeNode irc channel

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729


    http://jira.pentaho.org/browse/PDI-516

    You should see a more "sustainable" solution in version 3.0.1.
    I'm running it with great success on my machine. I'll be able to commit the code in a few days.

    Matt

  7. #7
    Join Date
    Mar 2007
    Posts
    216


    Hi,

    Please see the attached file: this transformation generates 5001 rows and sends them to a "Sort rows" step with 5000 rows in memory and "pass only unique rows" checked. Within 1 second, you should get a Java Heap Space OutOfMemoryError.
    There is something here that I do not understand. Can someone enlighten me?

    @Matt : It seems to be a good idea to rethink the "Sort rows" step, thanks again for what you and your team are doing with PDI.

    @pmalves : You're right, it's an option. Until now I thought that selecting the "pass only unique rows" option would let me do without a "Unique Rows" step after my "Sort" step. After reading the tooltip that appears when the mouse cursor hovers over the option, I no longer know whether "pass only unique rows" has the same behaviour as chaining a "Sort" step and a "Unique Rows" step.


    a+,=)
    -=Clément=-

    EDIT : Without my understanding why, the attached sample is now working well.
    Please standby for further test results.
    Attached Files
    Last edited by clement; 11-28-2007 at 10:38 AM. Reason: forgot the attached file
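
The question of whether "pass only unique rows" matches a Sort step followed by a Unique Rows step comes down to what counts as a duplicate. The sketch below shows the general principle only and makes no claim about which rule PDI applies. After sorting, duplicates are adjacent, so one pass can drop them, but comparing only the sort key and comparing the whole row give different results. `UniqueAfterSort` and its methods are hypothetical names; each row is modelled as a two-element array of key and payload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

class UniqueAfterSort {

    /** Returns a copy of the rows sorted by key, then by payload. */
    static List<String[]> sortRows(List<String[]> rows) {
        List<String[]> copy = new ArrayList<>(rows);
        copy.sort((a, b) -> {
            int byKey = a[0].compareTo(b[0]);
            return byKey != 0 ? byKey : a[1].compareTo(b[1]);
        });
        return copy;
    }

    /** Keeps the first row of each run of adjacent rows that `same` treats as equal. */
    static List<String[]> dropAdjacentDuplicates(List<String[]> sorted,
                                                 BiPredicate<String[], String[]> same) {
        List<String[]> out = new ArrayList<>();
        String[] previous = null;
        for (String[] row : sorted) {
            if (previous == null || !same.test(previous, row)) out.add(row);
            previous = row;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"k1", "x"}, new String[] {"k1", "y"},
                new String[] {"k1", "x"}, new String[] {"k2", "z"});
        List<String[]> sorted = sortRows(rows);
        // Whole-row comparison keeps (k1,x), (k1,y), (k2,z).
        System.out.println(dropAdjacentDuplicates(sorted, (a, b) -> Arrays.equals(a, b)).size());
        // Key-only comparison keeps one row per key: (k1,x), (k2,z).
        System.out.println(dropAdjacentDuplicates(sorted, (a, b) -> a[0].equals(b[0])).size());
    }
}
```

If only distinct key values are needed, the two rules agree; they differ as soon as rows share a key but carry different non-key fields.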


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.