Hitachi Vantara Pentaho Community Forums

Thread: Occasional null values generated

  1. #1
    Join Date
    Apr 2007
    Posts
    15

    Occasional null values generated

    My team is working on an application that reads multiple CSV files and uses the information to populate some fact tables and shared dimension tables. When we run the transformation, the fact tables are filled every time. Sometimes, though, Kettle generates null values for records in the dimension table. It seems to happen only when we run through Pan or through our own Groovy scripts, and then only about 50% of the time. Has anyone else seen this?

    We have 3 CSV inputs. They all contain one field called VMID. Our transformation looks like this.

    CSV -> Select VMID field -> Select unique VMIDs -> Javascript to parse the VMID and generate 3 values -> Insert/Update

    The Insert/Update fails about 50% of the time because it attempts to insert a null value for the ID field.
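
    For illustration, the parsing step described above could be sketched as a stateless function that refuses to emit derived values when the VMID itself is missing. This is only a sketch: the thread does not give the real VMID format, so the "<site>-<cluster>-<number>" layout and the names parseVmid, site, cluster and number are assumptions.

```javascript
// Hypothetical VMID layout "<site>-<cluster>-<number>"; the real format is not
// stated in the thread, so adjust the split logic to the actual data.
function parseVmid(vmid) {
  // Guard first: a null, undefined or blank VMID yields null instead of
  // three derived fields built from nothing.
  if (vmid === null || vmid === undefined || String(vmid).trim() === "") {
    return null;
  }
  var parts = String(vmid).trim().split("-");
  if (parts.length !== 3) {
    return null; // unexpected shape, let the caller route the row to an error path
  }
  // Depends only on the current input, never on earlier rows.
  return { site: parts[0], cluster: parts[1], number: parts[2] };
}
```

    Returning null for a bad input makes it possible to divert such rows before they reach the Insert/Update step rather than failing there.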

  2. #2
    Join Date
    Sep 2009
    Posts
    810

    Hi MrSqueezles (man, I envy that nick)

    Usually, when something happens only 'some of the time', it has to do with concurrency effects. Kettle is no exception.
    Are you running several instances of any step or transformation? Are you running the CSV reading "in parallel"? I think there is an option to enable that. Running in parallel might have implications that are not entirely intuitive, as explained in this article.
    http://type-exit.org/adventures-with...ansformations/

    Maybe the javascript does something funny? I mean, is the generation of the values really stateless or does the execution depend on row order or previous values?
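
    One rough way to test that question outside of Kettle is to run the same logic over the rows in two different orders and compare the per-row results. The sketch below assumes unique string inputs (as after a unique-rows step); isOrderIndependent, makeStateless and makeStateful are made-up names for illustration, not Kettle APIs.

```javascript
// Returns true when the function built by makeFn gives every row the same
// output whether the rows are processed forwards or in reverse.
// Assumes rows are unique strings, so they can serve as lookup keys.
function isOrderIndependent(makeFn, rows) {
  var forward = {};
  var first = makeFn(); // fresh instance, so no state leaks between passes
  rows.forEach(function (r) {
    forward[r] = JSON.stringify(first(r));
  });
  var second = makeFn();
  return rows.slice().reverse().every(function (r) {
    return JSON.stringify(second(r)) === forward[r];
  });
}

// Stateless: the output depends only on the current row.
function makeStateless() {
  return function (vmid) { return vmid.toUpperCase(); };
}

// Stateful: the output is whatever row was seen previously, so it changes
// with row order (and the first row always gets null).
function makeStateful() {
  var previous = null;
  return function (vmid) {
    var out = previous;
    previous = vmid;
    return out;
  };
}
```

    A script that behaves like makeStateful can look fine in a single-threaded run and still produce nulls or wrong values once row order is no longer guaranteed.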

    What do the rest of the fields look like on the rows with null ids? This might give you a hint as to what went wrong...

    Can you come up with a minimal example that would just output the null ids to a csv? If you could post an example, I'd gladly look into the issue.
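
    As a rough idea of what such a minimal example might capture, the sketch below keeps only the rows whose id is missing and formats them as CSV text. The function name nullIdRowsToCsv and the row shape (plain objects with an id property) are assumptions for illustration; inside Kettle the same idea would normally be built from a filter step and a text file output.

```javascript
// Collects rows with a missing id and renders them as CSV lines.
// Note: values are not quoted or escaped, so this is only suitable for
// fields without commas or line breaks.
function nullIdRowsToCsv(rows, fields) {
  var lines = [fields.join(",")]; // header line
  rows.forEach(function (row) {
    var missing = row.id === null || row.id === undefined || row.id === "";
    if (missing) {
      lines.push(fields.map(function (f) {
        var v = row[f];
        return v === null || v === undefined ? "" : String(v);
      }).join(","));
    }
  });
  return lines.join("\n");
}
```

    Looking at the other columns of the dumped rows shows whether the whole row arrived empty or only the id was lost along the way.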

    Cheers

    Slawo

  3. #3
    Join Date
    Apr 2007
    Posts
    15

    Thank you for the suggestions and your offer to help. We suspected a concurrency issue but weren't sure what could be causing it, since we had assumed that parallelism in Kettle was a feature rather than a risk.

    The Javascript blocks are stateless. They're just generating field values for the current record. We considered that they could be causing the problem, but even the "id" field is null. It's not generated by the scripts. It's read from the input files, so it exists from beginning to end. Or it should...

    We'll try making the process more serial and post any results. If we can't solve the problem that way, we'll generate some output files and post them too. I don't want to take up your time unless we reach another dead end. Thanks again for your help.
