Hitachi Vantara Pentaho Community Forums

Thread: 3.0 : Calculator

  1. #1
    Matt Casters Guest

    3.0 : Calculator

    (Benchmarking 3.0 follow-up)

    If there is any step that we would expect to perform far worse than in
    V2.5, it is the Calculator.
    Because we're adding fields, we can't make use of a List, and as such we
    have to copy the whole Object[] for each row.
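    A minimal sketch of why adding a field forces a copy, assuming each row is a
    plain Java Object[] as described above. The class and helper names are
    hypothetical, not Kettle code: Java arrays cannot grow, so appending the
    calculated value C = A + B means allocating a longer array and copying every
    existing value into it.

```java
import java.util.Arrays;

class AddFieldSketch {
    // Hypothetical helper (not Kettle's API): returns a new row with one extra
    // slot holding the calculated value. Arrays are fixed-length, so the whole
    // input row has to be copied into the new, longer array.
    static Object[] appendValue(Object[] row, Object value) {
        Object[] out = Arrays.copyOf(row, row.length + 1); // full copy of the row
        out[row.length] = value;                            // new field goes last
        return out;
    }

    public static void main(String[] args) {
        Object[] row = {3L, 4L};                         // fields A and B
        long a = (Long) row[0];
        long b = (Long) row[1];
        Object[] result = appendValue(row, a + b);       // C = A + B
        System.out.println(Arrays.toString(result));     // prints [3, 4, 7]
    }
}
```

    With a growable List the new value could simply be added at the end; with a
    fixed-size array this copy happens once per row, which is the cost the
    paragraph above is worried about.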

    Here are the raw numbers, read 'em and weep (1,000,000 iterations):


    Test1: C=A+B
    OLD: 6.8 s - 147,124 rows/s
    NEW: 2.6 s - 383,288 rows/s (x2.6)

    Test2: Test1 and 2 extra number fields (4 fields, 1 calculation)
    OLD: 11.0 s - 91,166 rows/s
    NEW: 2.6 s - 387,898 rows/s (x4.2)

    Test3: Test2 and F=D/E
    OLD: 14.9 s - 67,083 rows/s
    NEW: 3.0 s - 336,813 rows/s (x5.0)

    Test4: Test3 and I=G+2 days
    OLD: 24.5 s - 40,868 rows/s
    NEW: 7.5 s - 132,784 rows/s (x3.2)

    Test5: Test4 and L=J||J (Boolean or using A+B)
    OLD: 30.9 s - 32,324 rows/s
    NEW: 8.2 s - 122,180 rows/s (x3.8)
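    The x factors above correspond approximately to the NEW throughput divided by
    the OLD throughput (Test1: 383,288 / 147,124 is about 2.6). A small sketch of
    that arithmetic, with hypothetical helper names; rows/s recomputed from the
    one-decimal times will differ slightly from the posted figures.

```java
import java.util.Locale;

class ThroughputSketch {
    static final long ROWS = 1_000_000L; // iterations per test, as stated above

    // Throughput in rows per second for a run of the given duration.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Speedup factor: new throughput divided by old throughput.
    static double speedup(double oldRowsPerSec, double newRowsPerSec) {
        return newRowsPerSec / oldRowsPerSec;
    }

    public static void main(String[] args) {
        // Test1 throughput figures as posted.
        double factor = speedup(147_124, 383_288);
        System.out.println(String.format(Locale.ROOT, "x%.1f", factor)); // prints x2.6
        // From the rounded 6.8 s timing; close to, but not exactly, the posted 147,124.
        System.out.println(String.format(Locale.ROOT, "%.0f", rowsPerSecond(ROWS, 6.8)));
    }
}
```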


    One conclusion might be that, barring "expensive" date operations, the
    difference with the old step is smaller when fewer fields get added, and
    that the gain we do see comes from RowGenerator, which is a lot faster in
    the new version. So I tried the generator with a Dummy step replacing the
    calculator:

    Test6: Test1, dummy replacing calculator
    OLD: 4.1 s - 241,488 rows/s
    NEW: 2.4 s - 415,628 rows/s (x1.7)

    So that conclusion goes right out the window as well.

    Overall, again, a lot of good news. Add Sequence is next, to confirm these
    numbers.

    Have a great weekend,

    Matt
    ____________________________________________
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org
    mcasters (AT) pentaho (DOT) org
    Tel. +32 (0) 486 97 29 37



    --~--~---------~--~----~------------~-------~--~----~
    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com
    To unsubscribe from this group, send email to kettle-developers-unsubscribe (AT) g...oups (DOT) com
    For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en
    -~----------~----~----~----~------~----~------~--~---

  2. #2
    Tim Pigden Guest

    RE: 3.0 : Calculator

    Matt,

    That's excellent. I'm very impressed at how quickly you've been able to
    make these changes and propagate them to different parts of the system.

    Do you think the code has got simpler?

    Will it make custom steps any easier to write?

    Tim


  3. #3
    Matt Casters Guest

    RE: 3.0 : Calculator

    Hi Tim,

    In certain cases it makes things more complex, but in a lot of ways it's
    indeed easier.
    For example, since the data is no longer mixed with the metadata, you can
    simply re-use StepMetaInterface.getFields() to know the output metadata.
    All you then need to do is generate the String, Long, Date, Integer, ...
    values for the output row.
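    To illustrate the split between data and metadata, here is a simplified
    sketch. FieldMeta, outputFields() and processRow() are hypothetical
    stand-ins, not Kettle's actual interfaces: the output layout is resolved
    once, in the spirit of StepMetaInterface.getFields(), and the per-row work
    only produces plain Java values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MetaSplitSketch {
    // Hypothetical, simplified field description; NOT Kettle's value metadata API.
    static final class FieldMeta {
        final String name;
        final Class<?> type;

        FieldMeta(String name, Class<?> type) {
            this.name = name;
            this.type = type;
        }
    }

    // Resolved once per step: the input layout plus the new field C.
    static List<FieldMeta> outputFields(List<FieldMeta> inputFields) {
        List<FieldMeta> out = new ArrayList<>(inputFields);
        out.add(new FieldMeta("C", Long.class));
        return out;
    }

    // Called for every row: produces plain values only, assuming A and B are
    // Longs at positions 0 and 1. No metadata travels with the row.
    static Object[] processRow(Object[] row) {
        Object[] out = Arrays.copyOf(row, row.length + 1);
        out[row.length] = (Long) row[0] + (Long) row[1];
        return out;
    }

    public static void main(String[] args) {
        List<FieldMeta> in = List.of(new FieldMeta("A", Long.class), new FieldMeta("B", Long.class));
        List<FieldMeta> outMeta = outputFields(in);           // once per step
        Object[] outRow = processRow(new Object[] {2L, 5L});  // once per row
        for (int i = 0; i < outMeta.size(); i++) {
            System.out.println(outMeta.get(i).name + " = " + outRow[i]);
        }
    }
}
```

    The design point is that the per-row method never inspects field names or
    types; it relies on the layout computed up front.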

    So it's harder because you no longer have the tight coupling and you have
    to watch out, but it's easier since there is far less code duplication.
    It's also harder to delete elements from an Object[], for example. However,
    I've written utility classes RowDataUtil and ValueDataUtil to help out
    there.
    The richer these utility classes become, the faster development goes.
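    Deleting a value from an Object[] row means building a shorter array and
    copying the elements on either side of the removed slot. The sketch below
    shows that idea; removeItem() is a hypothetical helper written for
    illustration and is not meant to reproduce the actual RowDataUtil API.

```java
import java.util.Arrays;

class RemoveItemSketch {
    // Hypothetical helper (not RowDataUtil's actual API): returns a new array
    // without the element at 'index'. A List would offer remove(int); with a
    // plain Object[] both halves must be copied into a shorter array.
    static Object[] removeItem(Object[] row, int index) {
        if (index < 0 || index >= row.length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + row.length);
        }
        Object[] out = new Object[row.length - 1];
        System.arraycopy(row, 0, out, 0, index);                              // elements before index
        System.arraycopy(row, index + 1, out, index, row.length - index - 1); // elements after index
        return out;
    }

    public static void main(String[] args) {
        Object[] row = {"id-1", 42L, "obsolete", 3.5};
        System.out.println(Arrays.toString(removeItem(row, 2))); // prints [id-1, 42, 3.5]
    }
}
```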

    Also note that I'm not changing any V2.5 code, I'm copying everything to new
    packages. Since the API is not compatible I figured it makes sense to not
    even pretend that they are ;-)
    Copying and modifying piece by piece has the advantage that you can keep
    everything compiled and that you can test each part.

    All the best,

    Matt


    _____

    From: kettle-developers (AT) googlegroups (DOT) com
    [mailto:kettle-developers (AT) googlegroups (DOT) com] On Behalf Of Tim Pigden
    Sent: Saturday, May 12, 2007 3:02 AM
    To: kettle-developers (AT) googlegroups (DOT) com
    Subject: RE: 3.0 : Calculator


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.