Hitachi Vantara Pentaho Community Forums
Thread: PDI 3.0 Variables - better concurrent support?

  1. #1

    Hey all,
    I previously asked how well Kettle supports running concurrent jobs of the same type, specifically with regard to variables (at that time I was having variable issues). This was back in the 2.3 days.

    To clarify: yes, I am using the 'Variable Scope in Job' setting, not JVM and not ROOT_JOB. There is only one job, without any sub-jobs: just transformations that read those variables from the job level, and transformations that set variables at the job level. Running the job by itself does not cause problems.

    Now, with 2.5.0, I have definitely experienced variable issues in highly concurrent environments. The specific scenario seems to be a job with multiple transformations: running that one job with different parameters/variables in a highly concurrent environment causes the variables to overwrite each other.

    My guess, based on the behaviour, is that the transformations are running into each other in some fashion, or getting confused about which job is their parent. During a table-output step I have even seen primary-key violations when, halfway through one of the concurrent jobs, the step suddenly starts using a key retrieved by a different job and tries to insert with it. There are many other, similar problems that are difficult to reproduce reliably, but a multi-threaded test environment will at least trigger them.

    In 2.5.0 there IS a problem, no doubt, but creating a sample use case is difficult: you cannot say that running a given job will cause a specific problem, because the variables only start overwriting each other sporadically. However, it seems to be all or nothing, not just one variable at a time.

    The question is: will the changes in 3.0 better support concurrent processing of the same job file with different parameter/variable values?

  2. #2
    Join Date: May 2006 | Posts: 4,882

    Oh yeah... well, I hope so. I spent 2 weeks fixing things in v3.0 for that. So give it a spin and let me/us know whether it's better... or whether you find a bug somewhere because of it.

    Variables do work in v2.5, but in some cases they fail. The easiest way to provoke a failure is to use at least 2 layers of sub-jobs (so a job running a sub-job running a transformation), or a step with multiple copies (and then use the internal copy number variable while the step is running). In v2.5 and earlier, variables were stored globally in a Map keyed by thread name, and sometimes Kettle got confused about which thread name to use. That was also the cause of a small memory leak: if it doesn't find an environment for a thread, it creates a new one (leaving the old one orphaned).
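    To make the failure mode concrete, here is a minimal sketch of a global variable store keyed by thread name. It is an illustration only, not the actual v2.5 source: the class names, the thread names and the BATCH_ID variable are all made up. It shows two things: two runs that end up under the same thread name overwrite each other's value, and a lookup under an unexpected thread name silently creates a fresh environment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the design described above; NOT Kettle's actual code.
final class ThreadKeyedVariables {
    // One global registry: thread name -> that thread's variable environment.
    private static final Map<String, Map<String, String>> ENVIRONMENTS = new ConcurrentHashMap<>();

    private ThreadKeyedVariables() {}

    // Finds the environment for the current thread's name, creating one if none exists.
    private static Map<String, String> environment() {
        return ENVIRONMENTS.computeIfAbsent(
                Thread.currentThread().getName(), name -> new ConcurrentHashMap<>());
    }

    static void set(String name, String value) { environment().put(name, value); }
    static String get(String name) { return environment().get(name); }
    static int environmentCount() { return ENVIRONMENTS.size(); }
    static void reset() { ENVIRONMENTS.clear(); }
}

final class ThreadKeyedVariablesDemo {
    // Runs the action on a thread with the given name and waits for it to finish.
    static void runOn(String threadName, Runnable action) {
        Thread t = new Thread(action, threadName);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Two runs of the "same" job share a thread name, so the second value wins.
    static String runCollision() {
        ThreadKeyedVariables.reset();
        String[] seen = new String[1];
        runOn("Job (load_orders)", () -> ThreadKeyedVariables.set("BATCH_ID", "A-1"));
        runOn("Job (load_orders)", () -> ThreadKeyedVariables.set("BATCH_ID", "B-2"));
        runOn("Job (load_orders)", () -> seen[0] = ThreadKeyedVariables.get("BATCH_ID"));
        return seen[0];
    }

    // A lookup under a different thread name finds nothing and leaves an extra environment behind.
    static int runOrphan() {
        ThreadKeyedVariables.reset();
        runOn("Job (load_orders)", () -> ThreadKeyedVariables.set("BATCH_ID", "A-1"));
        runOn("Unexpected thread name", () -> ThreadKeyedVariables.get("BATCH_ID"));
        return ThreadKeyedVariables.environmentCount();
    }

    public static void main(String[] args) {
        System.out.println(runCollision()); // prints B-2: the first run's value was overwritten
        System.out.println(runOrphan());    // prints 2: one real environment plus one stray
    }
}
```

    The threads run one after another here so that the result is deterministic; in a real concurrent run the same overwrite would show up sporadically, depending on timing.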

    In v3.0, variables are stored on the steps and job entries themselves (this should be similar to running sub-shells on UNIX... except for the "upward propagation/exporting" of variables, of course). If the step/job entry disappears, its variables go along with it. And as far as "inheritance" goes, it should be cleaner. If you want to dig deeper, have a look at the source code of the Set Variable step.
    Another change, prompted by "brassrat", is that the Modified JavaScript step now has a proper interface to set and get variables.
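    For comparison, here is a rough sketch of the per-object scoping idea: every job or transformation owns its variables, starts from a copy of its parent's, and only pushes a value upward when asked to. The ScopedVariables class and its method names are hypothetical and are not the v3.0 API; the variable names are made up as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of variables owned by the object that uses them;
// the names here are NOT taken from the PDI 3.0 source.
final class ScopedVariables {
    private final ScopedVariables parent;
    private final Map<String, String> values = new HashMap<>();

    // A child starts with a snapshot of its parent's variables, like a sub-shell.
    ScopedVariables(ScopedVariables parent) {
        this.parent = parent;
        if (parent != null) {
            values.putAll(parent.values);
        }
    }

    String get(String name) { return values.get(name); }

    // Sets the variable in this scope only.
    void set(String name, String value) { values.put(name, value); }

    // Sets the variable here and explicitly exports it one level up.
    void setInParent(String name, String value) {
        set(name, value);
        if (parent != null) {
            parent.set(name, value);
        }
    }
}

final class ScopedVariablesDemo {
    // Two independent runs of the same job definition, each with its own scope.
    static String[] run() {
        ScopedVariables jobA = new ScopedVariables(null);
        ScopedVariables jobB = new ScopedVariables(null);
        jobA.set("BATCH_ID", "A-1");
        jobB.set("BATCH_ID", "B-2");

        ScopedVariables transA = new ScopedVariables(jobA);
        ScopedVariables transB = new ScopedVariables(jobB);
        transB.setInParent("LAST_KEY", "42"); // only job B should see this

        return new String[] {
            transA.get("BATCH_ID"), transB.get("BATCH_ID"),
            jobA.get("LAST_KEY"), jobB.get("LAST_KEY")
        };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // prints A-1 B-2 null 42
    }
}
```

    Because no lookup goes through a shared, name-keyed registry, the two runs cannot see each other's values, and a scope's variables become garbage as soon as nothing references the scope any more.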

    Regards,
    Sven

  3. #3

    Will give it a go -- Thanks for the prompt and concise response!!

  4. #4

    I'm having some difficulty testing this from a BI Platform environment: the KettleComponent needs to be updated to reflect the new package naming.

    Also, it looks like KettleComponent (well, KettleSystemListener) has references to LocalVariable which, best guess, is no longer valid in 3.0.

  5. #5
    Join Date: May 2006 | Posts: 4,882

    Tracker created. I mostly use PDI as standalone ETL.

    I guess when it needs to be changed will depend on which PDI version ships with the next BI Platform version.

    Regards,
    Sven
