1. Matt Casters

## 3.0 : Calculator

(Benchmarking 3.0 follow-up)

If there is any step that we would expect to perform far worse than in V2.5,
it is the Calculator.
Because we're adding fields, we can't make use of a List, and as such we have
to copy the whole Object[] for each row.
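To make that cost concrete: a Java array has a fixed length, so appending even one output field to an Object[] row means allocating a larger array and copying every input value into it, once per row. Below is a minimal sketch, assuming a row is a bare Object[] of values. `RowResize` and `extend()` are hypothetical names for illustration, not Kettle classes:

```java
import java.util.Arrays;

/**
 * Sketch only: a row is a plain Object[] with no embedded metadata.
 * Java arrays cannot grow, so a step that appends output fields must
 * allocate a larger array and copy the input values over.
 * RowResize is a hypothetical helper, not a Kettle class.
 */
public class RowResize {

    /** Returns a new row with room for extraFields additional values at the end. */
    static Object[] extend(Object[] inputRow, int extraFields) {
        // Arrays.copyOf copies the existing values and pads the tail with nulls.
        return Arrays.copyOf(inputRow, inputRow.length + extraFields);
    }

    public static void main(String[] args) {
        Object[] in = { 1L, 2L };                  // A, B
        Object[] out = extend(in, 1);              // room for C
        out[2] = (Long) out[0] + (Long) out[1];    // C = A + B, autoboxed back to Long
        System.out.println(Arrays.toString(out));
    }
}
```

Sizing the output array once for all added fields (the `extraFields` argument) keeps it to a single copy per row, rather than one copy per calculated field.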

Here are the raw numbers; read 'em and weep (1,000,000 iterations):

Test1: C=A+B
OLD: 6.8 s - 147,124 rows/s
NEW: 2.6 s - 383,288 rows/s (x2.6)

Test2: Test1 and 2 extra number fields (4 fields, 1 calculation)
OLD: 11.0 s - 91,166 rows/s
NEW: 2.6 s - 387,898 rows/s (x4.2)

Test3: Test2 and F=D/E
OLD: 14.9 s - 67,083 rows/s
NEW: 3.0 s - 336,813 rows/s (x5.0)

Test4: Test3 and I=G+2 days
OLD: 24.5 s - 40,868 rows/s
NEW: 7.5 s - 132,784 rows/s (x3.2)

Test5: Test4 and L=J||J (Boolean or using A+B)
OLD: 30.9 s - 32,324 rows/s
NEW: 8.2 s - 122,180 rows/s (x3.8)
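Each rows/s figure appears to be the 1,000,000 rows divided by the elapsed time, and each (xN) factor is the new throughput divided by the old one. The short Java sketch below recomputes those factors from the posted rows/s values; the class and method names are illustrative only. Because the elapsed times are rounded to a tenth of a second, the rows/s figures are the more precise input.

```java
/**
 * Sketch only: recomputes the speedup factors from the rows/s figures
 * posted above. Speedup = new throughput / old throughput.
 * The numbers are copied from the benchmark list; names are illustrative.
 */
public class Speedup {

    static double ratio(double oldRowsPerSec, double newRowsPerSec) {
        return newRowsPerSec / oldRowsPerSec;
    }

    public static void main(String[] args) {
        String[] tests = { "Test1", "Test2", "Test3", "Test4", "Test5" };
        double[] oldRate = { 147_124, 91_166, 67_083, 40_868, 32_324 };
        double[] newRate = { 383_288, 387_898, 336_813, 132_784, 122_180 };
        for (int i = 0; i < tests.length; i++) {
            System.out.printf("%s: x%.2f%n", tests[i], ratio(oldRate[i], newRate[i]));
        }
    }
}
```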

One conclusion might be that, barring "expensive" date operations, the
difference from the old step is smaller when fewer fields get added, and that
the RowGenerator, being a lot faster in the new version, makes up for it.
So I tried the Generator with a Dummy step replacing the Calculator:

Test6: Test1, dummy replacing calculator
OLD: 4.1 s - 241,488 rows/s
NEW: 2.4 s - 415,628 rows/s (x1.7)

So that conclusion goes right out the window as well.

Overall, again, a lot of good news. The Add Sequence step is next, to confirm
these numbers.

Have a great weekend,

Matt
____________________________________________
Matt Casters, Chief Data Integration
http://www.pentaho.org
mcasters (AT) pentaho (DOT) org
Tel. +32 (0) 486 97 29 37

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "kettle-developers" group.
To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com
To unsubscribe from this group, send email to kettle-developers-unsubscribe (AT) g...oups (DOT) com
For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en
-~----------~----~----~----~------~----~------~--~---

2. Tim Pigden

## RE: 3.0 : Calculator

Matt,

That's excellent. I'm very impressed at how quickly you've been able to
make these changes and propagate them to different parts of the system.

Do you think the code has got simpler?

Will it make custom steps any easier to write?

Tim

3. Matt Casters

## RE: 3.0 : Calculator

Hi Tim,

In certain cases it makes things more complex, but in a lot of ways it is
indeed easier.
For example, since the data is no longer mixed with the metadata, you can
simply reuse StepMetaInterface.getFields() to determine the output metadata.
All you then need to do is generate the String, Long, Date, Integer, ...
values for the output row.
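A minimal sketch of that split, assuming the output layout is derived once up front and each row then carries only plain values. `FieldMeta`, `outputFields()` and `processRow()` are hypothetical stand-ins for illustration; they are not the actual `StepMetaInterface.getFields()` signature. The `record` syntax assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: metadata (field names and types) lives apart from the row
 * data, which is a bare Object[] of String, Long, Date, ... values.
 * All names here are hypothetical, not the Kettle API.
 */
public class MetaDataSplit {

    record FieldMeta(String name, Class<?> type) {}

    /** Derive the output layout once, the way a getFields()-style method would. */
    static List<FieldMeta> outputFields(List<FieldMeta> input) {
        List<FieldMeta> out = new ArrayList<>(input);
        out.add(new FieldMeta("C", Long.class)); // new field appended at the end
        return out;
    }

    /** Per row, only plain values are produced; no metadata is copied. */
    static Object[] processRow(Object[] in, int outSize) {
        Object[] out = Arrays.copyOf(in, outSize);
        out[in.length] = (Long) in[0] + (Long) in[1]; // C = A + B
        return out;
    }

    public static void main(String[] args) {
        List<FieldMeta> inMeta = List.of(
                new FieldMeta("A", Long.class), new FieldMeta("B", Long.class));
        List<FieldMeta> outMeta = outputFields(inMeta); // computed once, not per row
        Object[] row = processRow(new Object[] { 5L, 7L }, outMeta.size());
        for (int i = 0; i < outMeta.size(); i++) {
            System.out.println(outMeta.get(i).name() + " = " + row[i]);
        }
    }
}
```

The per-row work is only filling values into the array; the field names and types in `outMeta` are never copied per row.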

So it's harder because you no longer have the tight coupling and have to be
careful, but it's easier since there is far less code duplication.
It's also harder to delete elements from an Object[], for example. However,
I've written the utility classes RowDataUtil and ValueDataUtil to help out
there.
The richer these utility classes become, the faster development goes.
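Deleting a value shows the trade-off: with no metadata inside the row, the step has to build a shorter array itself and remember to drop the matching field from its metadata at the same index. A minimal sketch; `RowRemove.removeAt()` is a hypothetical helper in the spirit of RowDataUtil, not that class's actual API:

```java
import java.util.Arrays;

/**
 * Sketch only: removing a value from an Object[] row means building a
 * shorter array and copying the values on either side of the index.
 * removeAt() is hypothetical, not a RowDataUtil method signature.
 */
public class RowRemove {

    static Object[] removeAt(Object[] row, int index) {
        if (index < 0 || index >= row.length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + row.length);
        }
        Object[] out = new Object[row.length - 1];
        System.arraycopy(row, 0, out, 0, index);                               // values before index
        System.arraycopy(row, index + 1, out, index, row.length - index - 1);  // values after index
        return out;
    }

    public static void main(String[] args) {
        Object[] row = { "a", 1L, "c" };
        System.out.println(Arrays.toString(removeAt(row, 1)));
    }
}
```

The caller still has to remove the corresponding entry from the row metadata; if the two drift apart, values end up being read under the wrong field name.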

Also note that I'm not changing any V2.5 code; I'm copying everything to new
packages. Since the APIs are not compatible, I figured it makes sense not to
even pretend that they are ;-)
Copying and modifying piece by piece has the advantage that you can keep
everything compiling and test each part.

All the best,

Matt

_____

From: kettle-developers (AT) googlegroups (DOT) com
[mailto:kettle-developers (AT) googlegroups (DOT) com] On Behalf Of Tim Pigden
Sent: Saturday, May 12, 2007 3:02 AM
To: kettle-developers (AT) googlegroups (DOT) com
Subject: RE: 3.0 : Calculator
