Hitachi Vantara Pentaho Community Forums

Thread: Row Denormaliser differences, 4.1 -> 4.2

  1. #1

    Default Row Denormaliser differences, 4.1 -> 4.2

    First, let me say "Thank You" to Matt and the Pentaho team for their hard work on 4.2. It's great to see the Kettle project continuing to move forward!

    We have a transform that works fine in 4.1 but breaks in 4.2, specifically in the "Row Denormaliser" step. In 4.1, our transform used a "Number" field as the key field. In 4.2, using a number field as the key field causes the denormaliser to fail: all of the fields the step is supposed to create come up empty.

    The fix was to use a "Select Values" step to convert the key field to a String just before the denormaliser step (this thread was very helpful). This fix leaves me wondering if I've been misusing the step all along (I know how important strongly-typed data is to Kettle), or if this was an unintended enhancement to the application.
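
    Roughly, the effect of that workaround looks like this in plain Java (just a sketch, not Kettle code; the values and the two-decimal scale are made up for illustration):

        // Sketch: why grouping on a raw floating-point key can silently fail,
        // and why formatting it to a String first gives stable key values.
        import java.math.BigDecimal;
        import java.math.RoundingMode;

        public class KeyFieldSketch {
            public static void main(String[] args) {
                double keyFromDb = 0.1 + 0.2;   // arrives as 0.30000000000000004
                double keyOnScreen = 0.3;       // the value we "see" in a preview

                // Comparing the raw doubles fails, so rows with this key never match:
                System.out.println(keyFromDb == keyOnScreen);   // false

                // Formatting both to a fixed-scale String (roughly what the
                // Select Values conversion does) yields identical key values:
                String a = new BigDecimal(keyFromDb).setScale(2, RoundingMode.HALF_UP).toPlainString();
                String b = new BigDecimal(keyOnScreen).setScale(2, RoundingMode.HALF_UP).toPlainString();
                System.out.println(a.equals(b));                 // true, both "0.30"
            }
        }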

    Again, thanks to Matt and team for another great release. We are very excited to move our projects into it!

    -Kevin

  2. #2
    Join Date: Nov 1999 | Posts: 9,729

    Default

    I think it's dangerous to compare floating point values to each other.
    That being said, I would love to see the difference in behavior in a JIRA case so we can have a look at it.

  3. #3

    Default

    Thanks, Matt.

    As we continued work on that transform this afternoon, we discovered other problems. At one point, two streams that had been appending properly before were suddenly losing fields from the "tail" stream. When we ran the same transform in both 4.1 and 4.2 we got the same behavior, so we knew the problem was somewhere in the transform itself.

    We reverted to an old copy and started over. We verified that the append was working with the old copy in 4.1, then copied it over to 4.2. This time, the denormaliser step worked properly right away with a number as the key field, and we were able to keep it functional that way through all of our changes.

    The only thing we can think of is that, before we started working on the copy of the transform that we had to replace, we pasted three steps from another transform into it. We've seen several instances where pasted steps don't behave quite right in 4.2. In one case, a pasted data grid step wasn't keeping its data after clicking OK. We deleted the step and replaced it with a fresh one, and it started working properly.

    We will post all three versions of the file to JIRA so you can have a look. It may take a while, as we need to package up some data for you to see the behavior as well. We'll throw a JIRA post in for the pasting behavior if we start to see it more often. We've also uncovered a strange behavior with the Split Fields step that we will post there, as well.

    Thanks again!

    -Kevin

  4. #4

    Default

    Just ventured over to JIRA and found I cannot post attachments there. I'll attach the transform files here if you think you'll be able to see anything that might be wrong inside them. Otherwise, I'll file the JIRA cases with as much detail as possible, and leave it at that.

    Thanks.

    -Kevin

  5. #5

    Default

    Hi, Matt.

    Sorry for all of this. We did some further troubleshooting of the transform and found what may have been our own mistakes making it not work quite right (though we still can't figure out why it worked in 4.1 but caused us grief in 4.2). We simplified the sorting and grouping for the denormaliser step, took out the append, and just let Kettle combine the rows, and it looks like it is working again. We'll keep at it, and if we find anything repeatable, we'll post it here and to JIRA.

    Thanks again for a great product. We'll try to be smarter about posting next time. Delete this thread, if you like.

    -Kevin

  6. #6
    Join Date: Nov 1999 | Posts: 9,729

    Default

    Make sure it's not a driver problem. We made the mistake of "upgrading" the MySQL JDBC driver to the latest 5.1.x version, but that one has serious compatibility problems.

  7. #7

    Default

    Thanks for not deleting me, Matt.

    The transform is reading from an MSSQL data source, so it seems unlikely that the problem lies with the driver. However, in an attempt to show you what is going on, I created a sample data set in a text file and created three versions of the transform to read from that file: the ORIGinal (which only works in 4.1), a BAD one, and a NEW one.

    All three of them work flawlessly when run from the text file. However, when I return to the BAD transform reading from SQL, it fails when the streams are appended at the bottom of the second column of steps, no matter what step I use (Append Streams, Dummy Step, or the natural next step of the transform).

    My head is starting to hurt. How different could the SQL data be to cause a problem this dramatic so late in the transform?

    The attached file has all three of the above, samples of their output, and a screenshot of what we're seeing when we run the BAD file from the database, rather than from the text file.

    Any insight you can offer would go a long way toward easing my headache. I will file any JIRA entries you like for this (should I just reference this thread to get around the attachment limitation?) if you can tell me what I'm imagining (i.e., where I'm cutting a corner and doing it "wrong") and what's real.

    Thanks.

    -Kevin

    P.S. I'm CERTAIN there are things in here I'm doing "wrong" or could do "better". I wouldn't count myself among the elite users of your software, but I get by OK.
    Attached Files

  8. #8
    Join Date: Nov 1999 | Posts: 9,729

    Default

    Actually, there are subtle differences in the way a floating-point number is represented.
    A number that displays as 1.0 might internally be 0.99999999998 or 1.00000000001, if you catch my drift.
    Depending on the situation, or your luck, a DB driver might give back 1.0 or anything close to it.

    That's why they say that comparing floating-point numbers is usually not a good idea. To test this, try rounding the number before the denormaliser step and see what happens.
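
    Something like this, in plain Java (just a sketch; the exact values and precision are made up to make the point):

        // Sketch: rounding before comparing makes the "1.0"s line up again.
        public class RoundBeforeCompare {
            public static void main(String[] args) {
                double a = 1.00000000001;   // what the driver might actually hand back
                double b = 0.99999999998;   // another "1.0" from a different row

                System.out.println(a == b); // false: the raw doubles differ

                // Round to a sensible precision first (6 decimals here), roughly
                // what a rounding step before the denormaliser would do:
                double ra = Math.round(a * 1e6) / 1e6;
                double rb = Math.round(b * 1e6) / 1e6;
                System.out.println(ra == rb); // true: both come out as exactly 1.0
            }
        }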
