Hitachi Vantara Pentaho Community Forums

Thread: 3.0 : Sorting rows : turn off compression!

  1. #1
    Matt Casters Guest

    3.0 : Sorting rows : turn off compression!

    Hi Devs,

    From the tests below you can clearly see that turning off compression in the
    "Sort Rows" step can dramatically improve performance.
    This is especially the case if you're already running out of CPU cycles.
    The theory that compression helps by limiting I/O breaks down because you are
    usually not I/O bound but CPU bound.
    It's the case for both the V2 and V3 tests below as I'm running constantly
    at 100% CPU on my trusted old laptop.
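
    To make the CPU-versus-I/O trade-off concrete, here is a minimal sketch that
    spills rows to a temp file with and without gzip and times each write. It is
    not Kettle's code: the names spill_rows and load_rows, the tab-separated
    layout and the synthetic rows are all assumptions made for illustration.

```python
import gzip
import os
import tempfile
import time


def spill_rows(rows, compress):
    """Write (name, firstname) rows to a temp file; return (path, seconds)."""
    fd, path = tempfile.mkstemp(suffix=".tmp")
    os.close(fd)
    # Hypothetical stand-in for the step's "compress temp files" option.
    opener = gzip.open if compress else open
    start = time.perf_counter()
    with opener(path, "wt", encoding="utf-8") as out:
        for name, firstname in rows:
            out.write(f"{name}\t{firstname}\n")
    return path, time.perf_counter() - start


def load_rows(path, compress):
    """Read the rows back so the round trip can be checked."""
    opener = gzip.open if compress else open
    with opener(path, "rt", encoding="utf-8") as src:
        return [tuple(line.rstrip("\n").split("\t")) for line in src]


if __name__ == "__main__":
    # Synthetic data, not the table used in the tests in this thread.
    rows = [(f"Name{i:06d}", f"First{i:06d}") for i in range(100_000)]
    for compress in (True, False):
        path, seconds = spill_rows(rows, compress)
        size = os.path.getsize(path)
        os.remove(path)
        print(f"compress={compress}: {seconds:.2f}s, {size} bytes")
```

    The compressed file should come out smaller, but whether the saved I/O pays
    for the extra CPU time depends on the machine, so the timings are only
    meaningful on the system you run them on.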

    The 3.0 architecture again shows great improvements in those cases where we
    still have some juice left in the CPU.

    The properties sortedDescending & caseInsensitive were added to the
    metadata so that the next steps can know that the data is sorted on a
    certain key.
    We need to figure out rules to clear these properties again if we're joining
    for example, but we're making the metadata richer already.
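
    As an illustration of what such per-field flags could drive, here is a small
    sketch of a multi-key sort that honours a descending flag and a
    case-insensitive flag. It is written in Python for brevity and is not the
    Kettle API: FieldMeta, sort_rows and the dictionary-based rows are
    hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class FieldMeta:
    """Hypothetical per-field sort metadata, loosely mirroring the two flags."""
    name: str
    sorted_descending: bool = False
    case_insensitive: bool = False


def sort_rows(rows, keys):
    """Sort dict rows on several keys, most significant key first in `keys`."""
    result = list(rows)
    # Python's sort is stable, so sorting on the least significant key first
    # and the most significant key last yields a correct multi-key order.
    for meta in reversed(keys):
        def key_fn(row, m=meta):
            value = row[m.name]
            return value.casefold() if m.case_insensitive else value
        result.sort(key=key_fn, reverse=meta.sorted_descending)
    return result


if __name__ == "__main__":
    data = [
        {"Name": "smith", "Firstname": "Bob"},
        {"Name": "Adams", "Firstname": "al"},
        {"Name": "Smith", "Firstname": "Ann"},
    ]
    keys = [FieldMeta("Name", case_insensitive=True), FieldMeta("Firstname")]
    for row in sort_rows(data, keys):
        print(row["Name"], row["Firstname"])
```

    A downstream step that receives the same list of flags would know which
    order to expect, which is the point of carrying them in the metadata.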

    Support for case-insensitive sorting was added to the API, but not yet to
    the Sort step itself, as I want to keep the functionality the same as V2.5
    for the time being.

    All the best,

    Matt
    ____________________________________________
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters (AT) pentaho (DOT) org
    Tel. +32 (0) 486 97 29 37


    Name of transformation: Sort table data
    Transformation description: Sorts data from a database table on Name,
    Firstname
    ----------------------------------------------------------------------------
    V2 results, rows: 100.000, runtime: 30,28s, speed: 3.302 rows/s
    V3 results, rows: 100.000, runtime: 24,95s, speed: 4.008 rows/s
    V3 / V2 = x1,21

    Name of transformation: Sort table data no compression
    Transformation description: Sorts data from a database table on Name,
    Firstname, doesn't use compression on temp files
    ----------------------------------------------------------------------------
    V2 results, rows: 100.000, runtime: 9,86s, speed: 10.143 rows/s
    V3 results, rows: 100.000, runtime: 6,84s, speed: 14.611 rows/s
    V3 / V2 = x1,44

    Name of transformation: Sort table data (MySQL)
    Transformation description: Sorts data from a remote database table on Name,
    Firstname
    ----------------------------------------------------------------------------
    V2 results, rows: 100.000, runtime: 28,91s, speed: 3.459 rows/s
    V3 results, rows: 100.000, runtime: 25,08s, speed: 3.988 rows/s
    V3 / V2 = x1,15

    Name of transformation: Sort table data no compression (MySQL)
    Transformation description: Sorts data from a remote database table on Name,
    Firstname, no compression on temp files
    ----------------------------------------------------------------------------
    V2 results, rows: 100.000, runtime: 7,81s, speed: 12.799 rows/s
    V3 results, rows: 100.000, runtime: 4,48s, speed: 22.302 rows/s
    V3 / V2 = x1,74
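
    The ratios in these results can be recomputed from the posted runtimes. The
    short sketch below does so; the runtimes are copied from the figures above
    with decimal commas converted to points, while the dictionary layout, the
    "db"/"mysql" labels and the function names are assumptions for illustration.

```python
# Runtimes in seconds for 100.000 rows, taken from the results above.
# "db" and "mysql" are labels invented here for the two test tables.
RUNTIMES = {
    ("db", "compressed"): {"V2": 30.28, "V3": 24.95},
    ("db", "uncompressed"): {"V2": 9.86, "V3": 6.84},
    ("mysql", "compressed"): {"V2": 28.91, "V3": 25.08},
    ("mysql", "uncompressed"): {"V2": 7.81, "V3": 4.48},
}


def version_speedup(source, mode):
    """How many times faster V3 is than V2 (the 'V3 / V2' line)."""
    run = RUNTIMES[(source, mode)]
    return run["V2"] / run["V3"]


def compression_slowdown(source, version):
    """How many times longer the run takes with compression switched on."""
    with_gz = RUNTIMES[(source, "compressed")][version]
    without_gz = RUNTIMES[(source, "uncompressed")][version]
    return with_gz / without_gz


if __name__ == "__main__":
    for (source, mode) in RUNTIMES:
        print(f"{source} {mode}: V3/V2 = x{version_speedup(source, mode):.2f}")
    for source in ("db", "mysql"):
        for version in ("V2", "V3"):
            ratio = compression_slowdown(source, version)
            print(f"{source} {version}: compression costs x{ratio:.2f}")
```

    With the posted figures, compression makes the run roughly 3.1 (V2) and 3.6
    (V3) times slower on the first table, and roughly 3.7 (V2) and 5.6 (V3) times
    slower on the MySQL table.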

    --~--~---------~--~----~------------~-------~--~----~
    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com
    To unsubscribe from this group, send email to kettle-developers-unsubscribe (AT) g...oups (DOT) com
    For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en
    -~----------~----~----~----~------~----~------~--~---

  2. #2
    Biswapesh Chattopadhyay Guest

    Re: 3.0 : Sorting rows : turn off compression!

    Matt,

    In tests that I did a couple of months back, compression had an adverse
    effect for small row sizes and data volumes, but became really effective
    when rows were large (250+ bytes) and the number of rows grew beyond a
    hundred thousand or so. Admittedly, I use a dual CPU 3 GHz Intel 64 bit
    processor though :-)

    Could you try such a data set and see what impact it has on sort performance?
    It is possible that things have improved and compression no longer has
    benefits.

    Rgds,
    Biswa.


    On 15/05/07, Matt Casters <mcasters (AT) pentaho (DOT) org> wrote:
    > Hi Devs,
    >
    > From the tests below you can clearly see that turning off compression in
    > the "Sort Rows" step can dramatically improve performance.
    > [...]



  3. #3
    Matt Casters Guest

    RE: 3.0 : Sorting rows : turn off compression!

    Hi Biswa,

    I've seen the whole range: very fast CPUs paired with slower disks, very
    fast disks paired with very fast CPUs, and other balanced systems.
    All in all I'm in your camp and I think that the setting has its use cases.

    I didn't expect the performance impact to be this big though. Three times
    slower is a "dramatic" performance loss.
    When people claim that the sort algorithm in 2.4 is very slow, I'm sure that
    in a lot of cases they've hit this problem.

    Hence the warning in the title. ;-)

    All the best,

    Matt


    _____

    From: kettle-developers (AT) googlegroups (DOT) com
    [mailto:kettle-developers (AT) googlegroups (DOT) com] On Behalf Of Biswapesh
    Chattopadhyay
    Sent: Tuesday, May 15, 2007 11:50 AM
    To: kettle-developers (AT) googlegroups (DOT) com
    Subject: Re: 3.0 : Sorting rows : turn off compression!


    [...]


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.