Hitachi Vantara Pentaho Community Forums

Thread: Job entries flow

  1. #1

    Job entries flow

    Hi!

    I often see good explanations of how transformation flows/streams work, as that is the most common question.
    But what about jobs? I know they run sequentially and use some kind of backtracking algorithm, but I cannot find any information in the wiki about their flow logic.

    - How are logical paths chosen (backtracking algorithm)? If a transformation has 2 OK hops, which one is executed first (are they supposedly sequential*)?
    - What happens if two paths converge on the same job entry/transformation? Is there any concurrency check, or will the transformation be executed twice? Will it act as a FIFO queue instead?
    - Is there any way to force a transformation entry in a job to act like a transformation's Block Before Step Executed step?
    - Is there any way to mark a job entry as "execute only once" (for the first/last OK incoming flow that reaches the box at runtime)?
    - How can I emulate conditional logic on multiple incoming flows? For example: if this one is OK (green) and that one fails, then the next step is A, but if both fail, execute some other transformation.
    - What's the difference between dragging and dropping a transformation twice and making a shallow copy? Does that behave differently from pointing 2 control flows at the same box?

    *It seems they are, according to some forum posts, but then I found something like this: http://forums.pentaho.com/showthread...ce-in-chef-job


    Thank you very much!
    I'm sorry for asking so much!

    P.S.: Any link to a wiki page or forum post that addresses this is enough for me; I can't find either.
    Last edited by bizintreader; 04-15-2014 at 03:45 AM.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    While my own Kettle jobs tend to be very simple, I will try to answer some of your questions as best as I can:

    Q: How are logical paths chosen?
    A: Next entries to execute are determined by matching the current result with every outgoing hop type: unconditional, true, false.

    Q: If a transformation has 2 OK hops, which one is executed first (are they supposedly sequential*)?
    A: All entries qualifying for execution are executed sequentially, in the order in which the DOM parser returns the hops it finds in the .kjb file. Entries are started in parallel threads only if you enable that option on the entry.
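
The rule described in the two answers above can be modelled in a few lines. This is only an illustrative sketch in Python, not Kettle's actual (Java) implementation: the `Hop` class, `hop_matches`, `run_job` and the entry names are all hypothetical, and the depth-first walk is an assumption based on the "backtracking" description in the first post.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    source: str
    target: str
    kind: str  # "unconditional", "true" or "false"


def hop_matches(kind, result_ok):
    """A hop is followed if it is unconditional or agrees with the result."""
    if kind == "unconditional":
        return True
    return (kind == "true") == result_ok


def run_job(start, hops, run_entry):
    """Walk the job sequentially and return entry names in execution order.

    Assumption: the walk is depth-first, so everything behind the first
    matching hop finishes before the second matching hop is followed.
    """
    executed = []

    def visit(name):
        result_ok = run_entry(name)
        executed.append(name)
        for hop in hops:  # hops keep the order in which they were stored
            if hop.source == name and hop_matches(hop.kind, result_ok):
                visit(hop.target)

    visit(start)
    return executed


# Hypothetical job: T1 has two "true" hops, stored as T2 first, then T3.
hops = [
    Hop("START", "T1", "unconditional"),
    Hop("T1", "T2", "true"),
    Hop("T1", "T3", "true"),
    Hop("T1", "ON_ERROR", "false"),
    Hop("T2", "T4", "unconditional"),
]

order_ok = run_job("START", hops, run_entry=lambda name: True)
order_failed = run_job("START", hops, run_entry=lambda name: name != "T1")
print(order_ok)
print(order_failed)
```

In this model, swapping the two `T1` "true" hops in the list swaps the order in which `T2` and `T3` run, which is the effect the stored hop order has.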

    Q: What happens if two paths converge on the same job entry? Is there any concurrency check for the transformation, or will it be executed twice?
    A: The framework currently provides no concurrency check; you must do it yourself, as a dialog tells you when you enable parallel execution.

    Q: Is there any way to force a transformation entry in a job to act similar to transformation's Block Before Step Executed step?
    A: Not sure if I get you right, but in absence of parallel execution, isn't every entry behaving like that?

    Q: Is there any way to mark a job entry as an "only execute once (first/last ok incoming flow that reaches the box in runtime)"
    A: You must implement this yourself.
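
What "implement this yourself" could look like, as a generic pattern rather than a Kettle feature: a guard that lets only the first caller through. The `RunOnce` class below is hypothetical Python; inside a real job the "already done" flag would have to live somewhere that both incoming paths can see, and what that place is remains an open assumption here.

```python
import threading


class RunOnce:
    """Run the wrapped action for the first caller only (first flow wins)."""

    def __init__(self, action):
        self._action = action
        self._lock = threading.Lock()
        self._done = False

    def __call__(self):
        # Check and set the flag under the lock so two flows cannot both pass.
        with self._lock:
            if self._done:
                return False
            self._done = True
        self._action()
        return True


# Five "incoming flows" reach the same guarded entry at about the same time.
calls = []
guard = RunOnce(lambda: calls.append("ran"))
threads = [threading.Thread(target=guard) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(calls)
```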

    Q: How can I emulate conditional logic on multiple incoming flows?
    A: Perhaps in a JavaScript entry?
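
Whatever entry ends up evaluating it, the decision from the question is small. Below is a sketch in Python (not Kettle JavaScript); `choose_next` and the label "B" for "that other transformation" are hypothetical, and how the two incoming results get recorded so they can be compared is left open.

```python
def choose_next(first_ok, second_ok):
    """Pick the next entry from the results of two incoming flows."""
    if first_ok and not second_ok:
        return "A"
    if not first_ok and not second_ok:
        return "B"  # stand-in for "that other transformation"
    return None  # combinations the question does not cover


for first_ok in (True, False):
    for second_ok in (True, False):
        print(first_ok, second_ok, choose_next(first_ok, second_ok))
```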

    Mind you, as complexity in your Kettle job rises, maintainability declines.
    While it may be fun to discuss the inner workings of the Kettle framework from a technical point of view, we should never allow hard-won insight to creep into our job designs.
    We should restrict ourselves to the advertised features.
    You can still try to influence the documented feature set by winning the community over and filing requests with the developers, though.
    Last edited by marabu; 04-15-2014 at 10:48 AM. Reason: typos
    So long, and thanks for all the fish.

  3. #3

    Thank You Marabu!!
    Pretty useful insight!

    Quote (originally posted by marabu):
    Q: Is there any way to force a transformation entry in a job to act similar to transformation's Block Before Step Executed step?
    A: Not sure if I get you right, but in absence of parallel execution, isn't every entry behaving like that?
    I meant within a parallel execution context, of course; that is where it would be useful.


    Thank you again!!

  4. #4

    Regarding the lack of a concurrency check, I assume that means the job entry would be executed twice (2 instances) and not queued.
    Is that at least safe as long as no common resources are used (for instance, writing a file or CRUD operations in a database)?

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    So it's still parallel execution we are talking about?
    Yes, entries - and their subentries, too - will run without being synchronized by the framework, then.

  6. #6

    Sorry for being a bit overwhelming, but I am not sure I explained it well.
    I understand that the parallel execution of two resource-related job entries is not safe.

    But what if two different concurrent/parallel paths end up in the same box?
    - Is it executed twice (two concurrent instances) or are the requests queued?
    - If it is executed twice, is it safe as long as you are not using shared resources between instances? For instance, I could have a job entry that writes a random temp file. What if that is hit by two flows coming from parallel started actions?

    EDIT: I know that within transformations all incoming rows are combined on the fly unless we use the Prioritize Streams step, but I am not sure about the default behaviour within jobs, as they are sequential by nature.


    Thanks
    Last edited by bizintreader; 04-15-2014 at 12:34 PM.

  7. #7
    Join Date
    Jun 2012
    Posts
    5,534

    A job entry that is part of parallel execution paths will be instantiated for each path.
    Depending on the total run time of such an entry, two or more concurrently running instances can be observed.
    I wouldn't expect problems as long as common sense is applied.
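
One reading of "common sense" here is that each instance should only touch resources of its own. The sketch below is plain Python, not Kettle: two threads stand in for two parallel paths reaching the same entry, and `entry_instance` is a hypothetical stand-in for that entry. Each instance asks the operating system for its own temp file instead of writing to a fixed name.

```python
import os
import tempfile
import threading


def entry_instance(path_name, written):
    """One instance of the entry; it writes to a temp file of its own."""
    # mkstemp creates a new, uniquely named file on every call.
    fd, tmp_path = tempfile.mkstemp(prefix="job_entry_", suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(f"written by {path_name}\n")
    written[path_name] = tmp_path


written = {}
threads = [
    threading.Thread(target=entry_instance, args=(name, written))
    for name in ("path_a", "path_b")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Read back what each instance wrote, then clean up.
contents = {}
for name, tmp_path in written.items():
    with open(tmp_path) as handle:
        contents[name] = handle.read()
    os.remove(tmp_path)
print(contents)
```

Had both instances written to one hard-coded file name, the result would depend on timing, which is the kind of shared resource the earlier posts warn about.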

  8. #8

    Quote (originally posted by marabu):
    A job entry that is part of parallel execution paths will be instantiated for each path.
    Depending on the total run time of such an entry two or more concurrently running instances can be observed.
    I wouldn't expect problems as long as common sense is applied.
    Thanks as always, Marabu. Pretty clear!

  9. #9
    Join Date
    Apr 2014
    Posts
    6

    Job Image.doc

    Can anybody explain the flow of the job shown in the attached document?
    Does it run in parallel or sequentially?
    Can you explain it step by step?

    Thank you,
    Satya.

  10. #10

    Quote (originally posted by satya1314):
    Job Image.doc

    Can anybody explain the flow of the job shown in the attached document?
    Does it run in parallel or sequentially?
    Can you explain it step by step?

    Thank you,
    Satya.
    That's a very good question!
    I always assumed the Success step was some kind of transactional end for the job; I never even wondered whether it could have an outgoing flow!

    By the way, how do you set the desired delay for each transformation without the Wait For step?

  11. #11
    Join Date
    Apr 2014
    Posts
    6

    Thank you for the reply.

  12. #12

    I found a situation where knowing the traversal algorithm is pretty important.

    If you execute a transformation in a "results to params + execute for every row" fashion, I observed that the entries that follow on the path are not executed for every row. I mean T1 --- T2 (x2) -- T3: T3 is not executed twice in that example.
    I did not find a way to propagate variables along the flow, so I need getVars + setOutputRows in T2. The problem is that only the rows from the last execution reach T3, so T3 is executed only once even if you check "execute for every row".

    To solve that particular situation, I had to connect T3 to T1 and execute it for every row.

    T1 -- T2 (x2)
       -- T3 (x2)

    But I need to make sure T2 is executed first, so I rely on the hop order returned by the DOM parser for this to work. THAT is very important: if I decided to execute T3 before T2, I would need to delete the hops and redraw them in the order I want, so that my desired order is the one stored in the job.

    Am I missing something?

  13. #13
    Join Date
    Apr 2008
    Posts
    4,696

    I've had to do something similar: carrying output rows from one transformation across another.

    You can use Get rows from result and join those rows with your output, right before Copy rows to result.

    Another thing to remember is that you can nest jobs:

    Job 1
    - Job 2
    - - Transform 1
    - - Transform 2
    - Job 3
    - - Transform 3
    - - Transform 4

    Transform 1 and Transform 2 can be running in parallel, and only when both are complete will Transform 3 and Transform 4 be run.
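
The nesting acts as a barrier: the parent job does not move on to Job 3 until Job 2, and everything inside it, has finished. Below is a sketch of that idea in plain Python rather than Kettle; `transform`, `run_parallel` and `job_1` are hypothetical names, and a thread pool stands in for a sub-job whose entries are launched in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

log = []  # list.append is safe to call from several threads in CPython


def transform(name):
    """Stand-in for running one transformation."""
    log.append(name)
    return name


def run_parallel(names):
    """Stand-in for a sub-job that launches its entries in parallel.

    Leaving the `with` block waits for every submitted task, so this
    function returns only when all of its transformations are complete.
    """
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(transform, names))


def job_1():
    run_parallel(["Transform 1", "Transform 2"])  # Job 2
    run_parallel(["Transform 3", "Transform 4"])  # Job 3, starts after Job 2


job_1()
print(log)
```

The order inside each pair can vary from run to run, but both entries of the first pair are always logged before either entry of the second.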

  14. #14
    Join Date
    Apr 2008
    Posts
    4,696

    Also -- you should try to keep one topic on one thread.
    http://forums.pentaho.com/showthread...95-Flow-of-Job
    Seems to be almost duplicated in this thread.

  15. #15

    Quote (originally posted by gutlez):
    Also -- you should try to keep one topic on one thread.
    http://forums.pentaho.com/showthread...95-Flow-of-Job
    Seems to be almost duplicated in this thread.
    I'm sorry; it is true that it all ended up addressing the same issue.

    Quote (originally posted by gutlez):
    I've had to do something similar: carrying output rows from one transformation across another.
    You can use Get rows from result and join those rows with your output, right before Copy rows to result.
    Oh, man. That's true. I guess that getRowsFromResults will provide all incoming rows even though the transformation itself is executed once for each one. I had visualized the process as a foreach loop over every single row, which is why I did not think of that approach.

    Another thing to remember is that you can nest jobs:
    True, I forgot that! Once again, an obvious approach indeed.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.