Hitachi Vantara Pentaho Community Forums

Thread: Disabling a branch in a transformation

  1. #1
    Join Date
    Aug 2016

    After experimenting with some "improved" stream/branch disable functionality, my big data transformation now suffers from congestion and comes to a halt! The transformation reads a single file and writes statistics to fact tables. The file contains more than 22'000'000 rows in total, which means everything has to run smoothly and fast, or rows start to pile up.

    Some sub-streams should be disabled depending on arguments given at start. The straightforward way to do this in Spoon is to:

    1) Add a "Get Variables" step that puts the variable deciding whether the branch should be disabled into a stream field.
    2) Add a "Filter rows" step that filters on the stream field set above. True: continue stream. False: disable stream.

    However, this process means the same constant field is added 22'000'000 times in step 1), and the comparison is then done 22'000'000 times in step 2). That's 21'999'999 times more than necessary!
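To make the arithmetic concrete, here is a minimal plain-Java sketch (not Kettle code) that counts how often the flag is compared in each variant. The class and method names are hypothetical and exist only for this illustration.

```java
public class BranchCheckCount {
    /** Per-row variant: the flag is compared again for every row, like a filter on a constant field. */
    public static long comparisonsPerRow(long rows, String enableBranch) {
        long comparisons = 0;
        for (long i = 0; i < rows; i++) {
            comparisons++; // the flag is evaluated again for this row
            if (!"Y".equals(enableBranch)) {
                continue; // row is dropped
            }
            // row would be forwarded here
        }
        return comparisons;
    }

    /** Hoisted variant: the flag is compared once, before the first row. */
    public static long comparisonsHoisted(long rows, String enableBranch) {
        boolean enabled = "Y".equals(enableBranch); // evaluated exactly once
        long comparisons = 1;
        for (long i = 0; i < rows && enabled; i++) {
            // row would be forwarded here without re-checking the flag
        }
        return comparisons;
    }

    public static void main(String[] args) {
        long rows = 22_000_000L;
        long redundant = comparisonsPerRow(rows, "N") - comparisonsHoisted(rows, "N");
        System.out.println("Redundant comparisons: " + redundant);
    }
}
```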

    So I tried to write my own Java code that does the test only once:

    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
        if (first) {
            first = false;
            // Disable the stream / branch if the ENABLE_BRANCH variable is not 'Y' (yes)
            String enable = getVariable("ENABLE_BRANCH", "NULL");
            if (!"Y".equals(enable)) {
                setOutputDone();
                return false;
            }
        }
        Object[] r = getRow();
        if (r == null) {
            // No more input rows: signal the end of the output stream
            setOutputDone();
            return false;
        }
        r = createOutputRow(r, data.outputRowMeta.size());
        putRow(data.outputRowMeta, r);
        return true;
    }

    This works excellently with small data. The sub-branch to be disabled turns green/finished immediately, even before receiving the first row! But the transformation freezes with big data. Why? What am I missing? It seems the upstream steps are still trying to send rows somehow. Why would they try to send rows when this step has already executed "setOutputDone()" and returned false?
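
One possible explanation, offered as an assumption rather than a confirmed diagnosis: steps in a transformation hand rows to each other through bounded buffers (the "Nr of rows in rowset" transformation setting, 10'000 by default). If this step returns false without ever calling getRow(), nothing empties its input buffer, so the upstream step may block once that buffer is full. A small file fits entirely in the buffer and never hits the limit. The self-contained Java sketch below imitates this with an ArrayBlockingQueue. RowSetSketch, produce and the timeout are hypothetical names and values for illustration, not Kettle API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class RowSetSketch {
    // Sentinel marking the end of the stream (plays the role of getRow() returning null).
    private static final Object END = new Object();

    /**
     * Pushes up to `rows` rows into a buffer of size `capacity` and returns how many
     * were accepted. If `consumerDrains` is false, the consumer quits without reading,
     * like a step that returns false on its first processRow() call.
     */
    public static int produce(int rows, int capacity, boolean consumerDrains) {
        BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(capacity);
        Thread consumer = new Thread(() -> {
            if (!consumerDrains) {
                return; // finished before reading a single row
            }
            try {
                while (buffer.take() != END) {
                    // read and discard the row
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        int sent = 0;
        try {
            for (int i = 0; i < rows; i++) {
                // A plain put() would block forever on a full buffer; the timeout
                // only exists so that this demo terminates.
                if (!buffer.offer(Integer.valueOf(i), 500, TimeUnit.MILLISECONDS)) {
                    break;
                }
                sent++;
            }
            if (consumerDrains) {
                buffer.put(END); // tell the draining consumer that the stream is over
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println("small, no drain: " + produce(50, 100, false));
        System.out.println("big, no drain:   " + produce(500, 100, false));
        System.out.println("big, draining:   " + produce(500, 100, true));
    }
}
```

If this is indeed the cause, a disabled branch would need to keep calling getRow() and discard the rows until it returns null, instead of returning false on the first call.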

    I wish the filter step would accept variables!
    Last edited by Sparkles; 04-12-2019 at 12:41 PM.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.