Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Number and Decimal Accuracy

  1. #1
    Join Date
    Mar 2013
    Posts
    5

    Default Number and Decimal Accuracy

    After searching the forum, and Jira, I can't seem to find an answer, explanation or bug relating to what I'm experiencing, so hoping someone can either explain this situation, or confirm it is a bug, which I'll raise in Jira.

    I have a transformation which loads a csv into an Oracle table, though I've found the target is irrelevant as it is the handling of the data that seems to be the problem. I have a numeric field in the source file, which depending on the length of the number, seems to lose accuracy.

    Here is the output of the run of the test I have made, the same input is going into both fields, however as can be seen, the output which had been defined as a number, varies as the length of the number reduces). I have tried any number of combinations of formats, lengths, precisions and setting the decimal delimiter, but none achieve what I would expect, and that is to pass through the number as it is stored. It is not "just a format issue" writing to the log, as I have also been inserting into a table, which always matches the displayed values.

    2013/12/05 15:06:08 - Write to log.0 - ------------> Linenr 1------------------------------
    2013/12/05 15:06:08 - Write to log.0 - num_input = 00000000000000000098760614115342832.000 Rounded?
    2013/12/05 15:06:08 - Write to log.0 - str_input = 98760614115342826.123
    2013/12/05 15:06:08 - Write to log.0 - ------------> Linenr 2------------------------------
    2013/12/05 15:06:08 - Write to log.0 - num_input = 00000000000000000000760614115342826.100 Rounded/Truncated?
    2013/12/05 15:06:08 - Write to log.0 - str_input = 760614115342826.123
    2013/12/05 15:06:08 - Write to log.0 - ------------> Linenr 3------------------------------
    2013/12/05 15:06:08 - Write to log.0 - num_input = 00000000000000000000060614115342826.125 Rounded?
    2013/12/05 15:06:08 - Write to log.0 - str_input = 60614115342826.123
    2013/12/05 15:06:08 - Write to log.0 - ------------> Linenr 4------------------------------
    2013/12/05 15:06:08 - Write to log.0 - num_input = 00000000000000000000003614115342826.123 Accurate
    2013/12/05 15:06:08 - Write to log.0 - str_input = 3614115342826.123

    The results don't seem particularly predictable... I've attached the transformation and input file to illustrate the definition.

    If this is a restriction, I think it should be documented somewhere, ironically, if it is left as a string, it inserts OK. I'd rather tie the types down though as it maintains an audit of the different data types being mapped through. I don't like the idea of it changing the content with no warning, it isn't like it is an invalid number as found in other posts, where the parser just gives up.

    Any suggestions welcome.
    Cheers, Will
    Attached Files Attached Files

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by wnewham View Post
    If this is a restriction, I think it should be documented somewhere
    So it is: A Kettle Number is a Java Double and a Double is good for roughly 15 decimal digits without loss of precision.

    Use BigDecimal, if you need higher than Double precision.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Apr 2012
    Posts
    253

    Default

    Watch out for Java class transforms with BigDecimal. I had quite a bit of trouble trying to convert the numbers when passed in. It could have been a 4.4 bug, unsure. I usually just left it as a string

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Non-integer arithmetic is so weird, a lot of people struggle with it for a lifetime.
    As far as accuracy is concerned, the only thing you should avoid is conversion between BigDecimal and Double (floating point types in general).

    Some links:

    So long, and thanks for all the fish.

  5. #5
    Join Date
    Mar 2013
    Posts
    5

    Default

    Thanks marabu and flamierd, both suggestions and warnings very welcome.

    Not being a Java person, I wasn't familiar with the differences and subtleties of the options within Java for dealing with numbers.

    I've changed the test transformation, from specifying the input as a Number, to BigNumber, and the results are out:

    2013/12/06 09:47:25 - Write to log.0 - ------------> Linenr 1------------------------------
    2013/12/06 09:47:25 - Write to log.0 - num_input = 00000000000000000098760614115342826.123 Accurate!
    2013/12/06 09:47:25 - Write to log.0 - str_input = 98760614115342826.123
    2013/12/06 09:47:25 - Write to log.0 - ------------> Linenr 2------------------------------
    2013/12/06 09:47:25 - Write to log.0 - num_input = 00000000000000000000760614115342826.123 Accurate!
    2013/12/06 09:47:25 - Write to log.0 - str_input = 760614115342826.123
    2013/12/06 09:47:25 - Write to log.0 - ------------> Linenr 3------------------------------
    2013/12/06 09:47:25 - Write to log.0 - num_input = 00000000000000000000060614115342826.123 Accurate!
    2013/12/06 09:47:25 - Write to log.0 - str_input = 60614115342826.123
    2013/12/06 09:47:25 - Write to log.0 - ------------> Linenr 4------------------------------
    2013/12/06 09:47:25 - Write to log.0 - num_input = 00000000000000000000003614115342826.123 Accurate!
    2013/12/06 09:47:25 - Write to log.0 - str_input = 3614115342826.123

    Fantastic!

    I'll look at the links to try and appreciate the differences, it seems it helps to get a little background to understand these fundamentals.

    I don't know who updates the documentation, but it might have helped me if there was a link to the data types definition (A Kettle Number is a Java Double) in the Text Input and CSV Input, against the Type field property, on the other hand, I still might not have put 2 and 2.12345667 together and got the right answer!

    Thanks again for the help.
    Will

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.