Hitachi Vantara Pentaho Community Forums

Thread: how to lock and increment a variable to prevent data race

  1. #1
    Join Date
    Feb 2017
    Posts
    23

    how to lock and increment a variable to prevent data race

    I have a step that runs with "Number of copies" set to 10. For each result, I'd like to increment a variable.

    Is it possible to create a transformation-specific lock that each copy takes while it increments the variable?

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Use a DB Sequence?

    What you're describing is a mutex, which means the entire flow has to pass through a single point, not through 10 parallel points, with each of the 10 copies waiting its turn. In the PDI world, the easiest way to do that is with a DB sequence. You could probably use an in-memory sequence to avoid the DB round-trip, but I'm not 100% sure that it's safe with parallel copies.

    Remember: in PDI, variables are constant for the duration of each run of a transformation.
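    To illustrate the mutex idea in general terms, here is a minimal Python sketch, not PDI code. The names `LockedCounter` and `run_copies` are made up for this example, and the threads only stand in for the 10 step copies. Every increment passes through one lock, so the copies take turns and no update is lost.

```python
import threading


class LockedCounter:
    """A counter whose increments all pass through a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # Only one thread at a time can hold the lock, so the
        # read-modify-write below cannot interleave with another copy.
        with self._lock:
            self._value += 1
            return self._value

    @property
    def value(self):
        with self._lock:
            return self._value


def run_copies(counter, copies=10, rows_per_copy=1000):
    """Start `copies` threads that each increment the counter per row."""

    def worker():
        for _ in range(rows_per_copy):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(copies)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_copies(LockedCounter()))
```

    A database sequence plays the same role as the lock here: the database hands out each next value to one caller at a time.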

  3. #3
    Join Date
    Jul 2009
    Posts
    476

    In a similar situation, I use the "Create file" and "Delete file" steps in a parent job. The "Create file" step creates a file that all of the copies of the parent job can see, and the "Fail if file exists" checkbox is checked. If the "Create file" step succeeds, then it goes on to call another job that I want to run serially. After that one-at-a-time job finishes, the parent job deletes the file.

    If the "Create file" step fails, then it goes to a "Wait for" step and waits for 60 seconds, then to a Dummy step, and back to the "Create file" step again.

    So if two or more copies of the parent job are running simultaneously, one of them will create the "lock" file first, move into the serial job, and do its work. The other copies will fail to create the "lock" file and spin in the wait loop until the first parent job finishes the serial job and deletes the lock file. Then one of the waiting copies can create the lock file, run the serial job, and so on.
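    The same lock-file pattern can be sketched outside PDI. The Python below is only an illustration under assumptions: `acquire_lock`, `run_serially`, and the `max_attempts` parameter are hypothetical names, and `os.O_CREAT | os.O_EXCL` stands in for the "Fail if file exists" checkbox, since it makes file creation fail when the file is already there.

```python
import os
import time


def acquire_lock(path, wait_seconds=60.0, max_attempts=None):
    """Try to create the lock file; wait and retry while it exists.

    Returns True once the file has been created by this caller, or
    False if max_attempts is set and that many tries have failed.
    """
    attempts = 0
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so only one caller can succeed at a time.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                return False
            time.sleep(wait_seconds)  # the "Wait for" part of the loop


def run_serially(path, work, wait_seconds=60.0):
    """Run work() while holding the lock file, then delete the file."""
    acquire_lock(path, wait_seconds)
    try:
        return work()
    finally:
        os.remove(path)  # the "Delete file" part


if __name__ == "__main__":
    lock_path = os.path.join(os.getcwd(), "serial_job.lock")
    print(run_serially(lock_path, lambda: "serial job done", wait_seconds=1.0))
```

    One thing this sketch makes visible: if the holder crashes before deleting the file, every other copy waits indefinitely, so a stale lock file has to be cleaned up by hand or by some timeout rule.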

    All of the copies of this job are running on the same ETL server, so they all try to create the lock file in the same directory on the same server. If the job copies were running on different servers, then I would have to try something different, which would probably involve a database table.

    You mentioned incrementing a variable, which is slightly different from what I do here, so my approach might not quite fit your requirements.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.