Hitachi Vantara Pentaho Community Forums

Thread: Sorted Merge - is it faster? What's the theory behind it?

  1. #1
    Join Date
    Aug 2008
    Posts
    563

    Hi,
    I'd like to understand the theory behind the sorted merge step.

    If I run 4 copies of a sort step, my understanding is that I have to follow them with a Sorted Merge step (only 1 copy) to keep the whole data set sorted.

    Now my question: doesn't the Sorted Merge step have to sort the combined data set again anyway? How does it help that the input data sets are already sorted? What exactly happens?

    Sorting my data sets is one of the most time-consuming steps and I'd like to speed it up if at all possible. I did one quick test running 4 copies of a sort step followed by a Sorted Merge step (with a data set of 2 million rows), but in my scenario this actually took longer than the simple approach of using just one copy of a sort step. (Maybe I have to do some more testing as well.)

    Thanks.
    Best regards,
    Diethard
    ===============
    Visit my Pentaho blog which offers some tutorials mainly on Kettle, Report Designer and Mondrian
    ===============

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Sure, the "Sorted Merge" step has to compare rows to make sure the output is still sorted. However, that's done in a streaming fashion: it only compares the next row from each sorted input, so we never need to consider the whole data set.
    It was actually created to do clustered sorting across different machines, not on a single machine.
    On a single box it also depends on how many CPU cores you have available for the sort. On my dual-core I see a small performance advantage when I sort with 2 copies, but not with 4.
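
    To illustrate the streaming comparison described above, here is a minimal Python sketch of a k-way merge over already-sorted inputs. It is not Kettle's actual implementation; the function name `sorted_merge` and the use of plain integers as rows are assumptions made only for this example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

_END = object()  # sentinel marking an exhausted input stream


def sorted_merge(streams: List[Iterable[int]]) -> Iterator[int]:
    """Merge already-sorted streams into one sorted stream, row by row.

    Hypothetical illustration only: rows are plain integers here.
    """
    iterators = [iter(s) for s in streams]
    heap: List[Tuple[int, int]] = []

    # Prime the heap with the first row of each non-empty input.
    for idx, it in enumerate(iterators):
        first = next(it, _END)
        if first is not _END:
            heap.append((first, idx))
    heapq.heapify(heap)

    # Repeatedly emit the smallest pending row, then pull one
    # replacement row from the input that row came from.
    while heap:
        value, idx = heapq.heappop(heap)
        yield value
        nxt = next(iterators[idx], _END)
        if nxt is not _END:
            heapq.heappush(heap, (nxt, idx))


if __name__ == "__main__":
    # Four pre-sorted inputs, standing in for 4 copies of a sort step.
    copies = [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
    print(list(sorted_merge(copies)))
```

    At any moment the merge holds only one pending row per input, and each output row costs on the order of log k comparisons for k inputs, rather than a full re-sort of the combined data. Python's standard library offers the same behaviour as `heapq.merge`.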

  3. #3
    Join Date
    Aug 2008
    Posts
    563

    Hi Matt,
    Thanks a lot for the clarification. It makes sense now. I'll do some more tests then.
    Best regards,
    Diethard
