US and Worldwide: +1 (866) 660-7555
Results 1 to 7 of 7

Thread: Stream lookup with Bigint generates huge CPU load

  1. #1
    Join Date
    Sep 2011
    Posts
    190

    Question Stream lookup with Bigint generates huge CPU load

    When I use the Stream lookup step with a (MySQL) bigint datatype the CPU load goes through the roof, much more so than with other datatypes.


    Is this normal? Apparently, it's very expensive to generate a hash value for a bigint?

  2. #2
    Join Date
    Nov 1999
    Posts
    442

    Default

    I assume you mean BigNumber (what is internally a BigDecimal).

    Have a look at http://azagorneanu.blogspot.com/2011...rformance.html
    there are numbers for tests of Double (2ms) against BigDecimal (309ms to 767ms), so your finding seems correct.

    Please let us know, if this is similar to what you found.

    Thanks,
    Jens

  3. #3
    Join Date
    Sep 2011
    Posts
    190

    Default

    Thank you very much for your reply.

    As a matter of fact, I was referring to BigInt; MySQL makes a distinction between an Int (32bit) and a BigInt (64bit) (http://dev.mysql.com/doc/refman/5.0/...ger-types.html).
    Although I would imagine a MySQL BigInt should be mapped to a Java long primitive datatype, apparently this isn't the case resulting in a similar performance penalty as you described in Double vs BigDecimal.

  4. #4
    Join Date
    Nov 1999
    Posts
    442

    Default

    It simply depends if it is signed or not :-)

    A Java long is always signed, so a MySQL bigint signed will fit into this. But an unsigned will not fit into this, this is why we map to a BigDecimal / BigNumber (see http://jira.pentaho.com/browse/PDI-316).
    When you change your DDL to a signed, it will be mapped to Integer / Long.
    If this is not an option and you are sure it will not get into an overflow, you may change the data type within the transformation.

    HTH,
    Jens

  5. #5
    Join Date
    Sep 2011
    Posts
    190

    Default

    Thank you so much Jens, I will look into it!

  6. #6
    Join Date
    Nov 1999
    Posts
    442

    Default

    It would be great to get feedback about the performance differences you got with the number of rows.

    Thanks,
    Jens

  7. #7
    Join Date
    Sep 2011
    Posts
    190

    Default

    I would love to give you feedback about the performance difference but so far I haven't got any consistent results.

    The flowrate starts high but deteriorates quickly, perhaps due to memory limitations. After that the flowrate varies wildly.
    I've tried to forcibly CAST both the main flow and lookup flow to either a signed or unsigned but I can't tell any difference in performance. Then there's the fact that the MySQL CAST/Convert is very limited, for example I can't tell nor specify whether it generates a 32 bit int or 64 bit long :-)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •