Hitachi Vantara Pentaho Community Forums

Thread: Sort performance / crash

  1. #1
    Join Date
    Sep 2007

    Sort performance / crash

    I need to sort about 4 million rows (before merge rows). I tried to use the Sort step, but without success. What happened:

    Sort step, 5000 rows in memory (default). When the sort starts, after reading 5000 rows a temp file of 1.7 MB is written to disk. Sorting continues until about 130,000 rows, when memory consumption reaches the VM heap size, i.e. about 1 GB (-Xmx1024m). The size of the temp file does not change anymore. When memory is exhausted, Spoon keeps consuming CPU for a while without reading or writing rows and without touching the temp file, then the process crashes.

    What am I doing wrong? I mean, sorting 10^7 rows is quite usual in ETL staging. Currently I break my data stream: I write the rows into the stage table and read them back in the next transformation using a Table input step with an 'order by' select statement.

  2. #2
    Join Date
    Nov 1999


    Hi Robert,

    Certainly, for our test-cases, 10M rows is indeed a puny figure.
    Sorting them all in memory however, is not something that I would recommend unless you have a 64-bit JVM and a LOT of memory to spare.

    To get this going, set the sort buffer size to something realistic like 50,000. If you can't fit all rows in memory, it doesn't matter too much how many will fit in memory since you'll have to spool it all to disk anyway.
    To speed it up, you can do all kinds of funky things like sorting in parallel locally (if you have multiple CPUs in your box) or in a cluster (if you have a few servers to spare), but that's the main thing.
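
    For readers who want to see what "spool it all to disk" means in practice, here is a minimal sketch of an external merge sort in Java. It is not the Sort step's actual code. The class name ExternalSortSketch, the sort method and its bufferSize parameter are made up for illustration, and rows are assumed to be single-line strings compared in natural order. Each full buffer is sorted and written to its own temp file, then all temp files are merged with a priority queue, so at most bufferSize rows plus one row per temp file are held in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical illustration of an external merge sort; not Kettle's implementation.
class ExternalSortSketch {

    // One open temp file ("run") plus the row currently at its head.
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        // Moves to the next row; returns false when the run is exhausted.
        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }
    }

    // Sorts rows while buffering at most bufferSize of them at a time,
    // and hands the sorted rows to 'out' one by one.
    static void sort(Iterator<String> rows, int bufferSize, Consumer<String> out)
            throws IOException {
        List<Path> runFiles = new ArrayList<>();
        List<String> buffer = new ArrayList<>(bufferSize);

        // Phase 1: fill the buffer, sort it, spill it to a temp file, repeat.
        while (rows.hasNext()) {
            buffer.add(rows.next());
            if (buffer.size() >= bufferSize) {
                runFiles.add(spill(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            runFiles.add(spill(buffer));
        }

        // Phase 2: k-way merge; the heap always yields the smallest head row.
        PriorityQueue<Run> heap =
                new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (Path file : runFiles) {
            Run run = new Run(Files.newBufferedReader(file));
            if (run.head != null) {
                heap.add(run);
            } else {
                run.reader.close();
            }
        }
        while (!heap.isEmpty()) {
            Run run = heap.poll();
            out.accept(run.head);
            if (run.advance()) {
                heap.add(run);
            } else {
                run.reader.close();
            }
        }
        for (Path file : runFiles) {
            Files.deleteIfExists(file);
        }
    }

    // Sorts one buffer in memory and writes it to a new temp file.
    private static Path spill(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path file = Files.createTempFile("sort-run-", ".tmp");
        try (BufferedWriter writer = Files.newBufferedWriter(file)) {
            for (String row : buffer) {
                writer.write(row);
                writer.newLine();
            }
        }
        return file;
    }
}
```

    In this sketch the buffer size only sets how many temp files get created: 4 million rows at 5000 per buffer means 800 files to merge, at 50,000 per buffer only 80. The total volume written to disk is the same either way.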



    P.S. Make sure you are using 2.5.2 or 3.0.0 since we have been seriously optimizing the sort step lately.

  3. #3
    Join Date
    Sep 2007


    Thanks, now it works better. I think enlarging the sort buffer size from 5000 to 50000 was the right move.


