Hitachi Vantara Pentaho Community Forums
Thread: Backtracking in kettle jobs.

  1. #1
    Join Date
    Sep 2008
    Posts
    26

    Backtracking in kettle jobs.

    Hi all,

    Many threads on the Pentaho forums and pages in the wiki mention the backtracking algorithm that is used in Kettle jobs, but I can't find an explanation anywhere of what backtracking is used for in Kettle.

    I'm familiar with backtracking as used in Prolog, where it is the heart of the language and its main idea. In Kettle, though, it seems that nobody uses this feature.

    Let's start with the definition from Wikipedia: "Backtracking is a general algorithm for finding all (or some) solutions to some computational problem, that incrementally builds candidates to the solutions, and abandons each partial candidate c ("backtracks") as soon as it determines that c cannot possibly be completed to a valid solution."

    My assumption for Kettle is that when we have a graph of jobs (inside an outer job) and one branch of the graph fails, another branch is executed. Another idea is to pass rows from one transformation to another until the second transformation finishes successfully. But I believe neither of these is in Kettle.

    I paid attention to backtracking because I want to build a loop in a job, and I get a StackOverflowError when the hops form a cycle. In Prolog, backtracking is a standard way to build loops, so I thought about trying it.
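
    To illustrate why a cycle of hops can exhaust the stack, here is a toy model in Python. It is not Kettle's code: it only assumes that following a hop is a nested call, so a cyclic hop never unwinds. The names run_entry_recursive, run_entry_iterative and HOPS are made up for the example, and Python's RecursionError stands in for Java's StackOverflowError.

```python
def run_entry_recursive(entry, hops, state):
    """Toy model: following a hop is a nested call, so a cycle keeps nesting."""
    state["runs"] += 1
    if state["runs"] >= state["limit"]:
        return  # stop condition reached, the nested calls unwind
    for nxt in hops.get(entry, []):
        run_entry_recursive(nxt, hops, state)


def run_entry_iterative(start, hops, limit):
    """Same loop driven by a while statement: the stack depth stays constant."""
    runs = 0
    current = start
    while current is not None and runs < limit:
        runs += 1
        targets = hops.get(current, [])
        current = targets[0] if targets else None
    return runs


# Hypothetical job with two entries that point at each other (a cyclic hop).
HOPS = {"fetch": ["check"], "check": ["fetch"]}


def recursive_overflows(limit):
    """Return True if the recursive model runs out of stack before the limit."""
    try:
        run_entry_recursive("fetch", HOPS, {"runs": 0, "limit": limit})
        return False
    except RecursionError:
        return True
```

    In this model a short loop completes, but a long one fails before it reaches its stop condition, while the iterative version handles the same number of passes without growing the stack.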

    Thanks,
    Vasili

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Well, in Jobs it means that all possible paths are tried until no more possibilities exist.
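
    As a rough sketch of what "all possible paths are tried" can mean, the Python below walks a job graph depth-first: after an entry finishes, every outgoing hop whose condition matches is followed in turn, and when a branch ends, execution returns to the last entry that still has untried hops. This is a simplified model under my own assumptions, not Kettle's implementation; run_job, the hop conditions "always", "success" and "failure", and the sample graph are illustrative names only.

```python
def run_job(entry, hops, results, trace):
    """Depth-first walk: follow each matching hop, then back up and try the next."""
    trace.append(entry)
    ok = results.get(entry, True)  # entries succeed unless listed otherwise
    for target, condition in hops.get(entry, []):
        follow = (
            condition == "always"
            or (condition == "success" and ok)
            or (condition == "failure" and not ok)
        )
        if follow:
            run_job(target, hops, results, trace)
    return trace


# Hypothetical job: START has two unconditional hops; A fails, so only its
# failure hop is taken before the walk backs up to START and tries B.
hops = {
    "START": [("A", "always"), ("B", "always")],
    "A": [("A1", "success"), ("A2", "failure")],
}
results = {"A": False}
order = run_job("START", hops, results, [])
```

    Note that the walk is itself recursive, which is consistent with a cyclic hop deepening the stack on every pass.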

    There are examples of more efficient loops in the samples/jobs folder of your Kettle distribution.
    ( ... and in our upcoming book :-) )

  3. #3
    Join Date
    Sep 2008
    Posts
    26

    I understand that Kettle lets you implement a for-loop quite elegantly when you know in advance how many times you need to run a transformation.

    My loop is an unusual kind: it should work in while-loop fashion, until some condition becomes true.
    I'm downloading data from web services in chunks of 10 rows. I don't know in advance how many rows are on the server. I make a request for every 10 rows, and when a response returns an error instead of the next portion of data, I need to stop requesting.

    After looking at the samples I found one that can help me, called "JavaScriptMod - skip rows after x rows". It should help me reduce the recursion depth by sending several requests (10-100) in one transformation. But I am still forced to use the cyclic hops approach.
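
    Outside of Kettle, the loop described above could be sketched in Python like this. The web service is not specified in this thread, so fetch_chunk is a hypothetical stand-in that serves a fixed number of rows and raises NoMoreData once the offset runs past the end; both names are assumptions made for the example.

```python
class NoMoreData(Exception):
    """Stands in for the error response that signals the end of the data."""


def fetch_chunk(offset, size, total=34):
    """Hypothetical service: returns up to `size` rows starting at `offset`."""
    if offset >= total:
        raise NoMoreData(f"no rows at offset {offset}")
    return list(range(offset, min(offset + size, total)))


def download_all(fetch, size=10):
    """Request chunk after chunk until the service answers with an error."""
    rows = []
    offset = 0
    while True:
        try:
            chunk = fetch(offset, size)
        except NoMoreData:
            break  # the error is the stop condition, not a failure
        rows.extend(chunk)
        offset += size
    return rows


rows = download_all(fetch_chunk)
```

    The point of the sketch is that the error response is the normal way out of the loop, which is exactly what is awkward to express when the only way to stop a loop is to fail.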

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Although I personally think that an external scheduler is best for these things, the "Start" job entry also is capable of looping indefinitely.

  5. #5
    Join Date
    Sep 2008
    Posts
    26

    Using the Repeat flag of the "Start" job entry really helped me. The job keeps running until I intentionally fail it on some condition.

    Thanks Matt.

  6. #6
    Join Date
    Sep 2008
    Posts
    26

    This approach reminds me of backtracking in Prolog: the search runs until it finds a solution. The only difference is that in Kettle the process stops when it fails, whereas in Prolog you can set it to stop after finding the first solution or let it find all solutions (which is possible only under the closed-world assumption).

    Having a special job entry that controls when to stop a repeating job, and that exits without failing the job, could make this approach more consistent.

  7. #7
    Join Date
    Sep 2008
    Posts
    26

    Matt,

    I have developed a simple job step that stops a repeating job. It is also possible to stop a repeating job with the Abort step, but the Abort step also fails the job and writes an error to the log:
    ERROR 30-07 19:02:06,996 - Abort job.

    StopJob.zip

    The new job step stops the job and reports that it finished successfully. I would say that this plugin 'legalizes' the approach of building a while-loop with the repeat flag, because it is inconsistent for a job to complete without problems and still show an error in the log.

    Another approach would be to extend the Abort step with an option like "Fail job", checked by default. When the user unchecks it, the step would reset the number of errors to 0, set the result to true and print an INFO message to the log instead of an error. I can contribute some time to implement it.

    There is still a problem with repeating jobs: when one of the worker steps fails, the job continues repeating. I overcame that using the Abort step. See the picture for details.
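
    To make the proposal concrete, here is a small Python model of the behaviour I have in mind. It is only a sketch of the idea, not Kettle code and not the attached plugin: the Result class, the abort function and the fail_job parameter are names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Toy job result: error count, success flag and collected log lines."""
    nr_errors: int = 0
    result: bool = True
    log: list = field(default_factory=list)


def abort(result, fail_job=True, message="Abort job."):
    """Stop the job; with fail_job unchecked, report a clean finish instead."""
    if fail_job:
        # Default behaviour: the job stops and is marked as failed.
        result.nr_errors += 1
        result.result = False
        result.log.append(("ERROR", message))
    else:
        # Proposed behaviour: reset errors, report success, log at INFO level.
        result.nr_errors = 0
        result.result = True
        result.log.append(("INFO", message))
    return result


failed = abort(Result())
clean = abort(Result(nr_errors=2, result=False), fail_job=False)
```

    With the option unchecked, a repeating job would end with a successful result and no ERROR line in the log.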

    [Attached image: while-loop-example.jpg]

    Vasili
