Hitachi Vantara Pentaho Community Forums

Thread: Disabling a branch in a transformation

  1. #1
    Join Date
    Aug 2016
    Posts
    281

    Default Disabling a branch in a transformation

    After experimenting with some "improved" stream/branch disable functionality, my big data transformation now suffers from congestion and comes to a halt! The transformation reads a single file and writes statistics to fact tables. There are more than 22'000'000 rows in total in this file, which means everything has to run smoothly and fast, or rows start to pile up.

    Some sub-streams should be disabled depending on arguments given at start. The straightforward way to do this in Spoon is to:

    1) Add "Get Variables" step. Add the variable which decides wheter the branch should be disabled or not.
    2) Add "Filter" step, filter on the stream field set above. True: continue stream. False: disable stream.

    However, this approach means the same constant field is added 22'000'000 times (step 1 above), and the logical comparison is then done 22'000'000 times (step 2 above). That's 21'999'999 times more than necessary!

    So I tried to write my own Java code to test only once:

    Code:
    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
    {
        if (first)
        {
            first = false;
            //Disable the stream / branch if ENABLE_BRANCH variable is not 'Y' (yes)
            String enable = getVariable("ENABLE_BRANCH", "NULL");
            if(!enable.equals("Y"))
            {
                setOutputDone();
                return false;
            }
        }
        Object[] r = getRow();
    
    
        if (r == null)
        {
            setOutputDone();
            return false;
        }
        
        r = createOutputRow(r, data.outputRowMeta.size());
        putRow(data.outputRowMeta, r);
        return true;
    }
    This works excellently with small data. The sub-branch to be disabled immediately shows green/finished, even before receiving the first row! But the transformation freezes with big data. Why? What am I missing? It seems like the upstream steps are still trying to send rows somehow. Why would they try to send rows when this step has already executed "setOutputDone()" and returned false?

    I wish the filter step would accept variables!
    Last edited by Sparkles; 04-12-2019 at 12:41 PM.

  2. #2
    Join Date
    Apr 2008
    Posts
    4,675

    Default

    The upstream steps will always send the data.

    What if you refactor a little bit?

    Code:
    // 'enable' is declared outside processRow() so the value read on the first call is still visible on later calls
    String enable;

    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
    {
         if (first)
         {
              first = false;
              //Disable the stream / branch if ENABLE_BRANCH variable is not 'Y' (yes)
              enable = getVariable("ENABLE_BRANCH", "NULL");
         }
         Object[] r = getRow();
    
    
         if (r == null)
         {
              setOutputDone();
              return false;
         }
    
         if (enable.equals("Y"))
         {
              putRow(data.outputRowMeta, r);
         }
         return true;
    }
    WARNING! Untested code above.
    I am not a coder. I do not know if the above changes will work, let alone improve throughput.

  3. #3
    Join Date
    Aug 2016
    Posts
    281

    Default

    I think you're right. That should most probably work without affecting performance. It still bothers me that a sub-stream can't be shut off immediately. Your solution looks like a middle ground when it comes to performance.

    The step prior to this one splits into multiple sub-streams. It should of course continue sending data, but I think the problem is that it keeps sending data to the disabled sub-stream, and this UDJC step, having already finished, no longer removes the new incoming rows.
    Last edited by Sparkles; 04-15-2019 at 06:34 AM.
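
    For reference, a rough, untested sketch along the lines of the snippets above (same UDJC scaffolding and the same ENABLE_BRANCH variable): it signals the downstream branch as done right away, so the disabled branch finishes immediately, but keeps draining the incoming rows so the upstream splitting step is never blocked by a full row buffer. Note that the drain itself is not free, since getRow() is still called once per discarded row.

    Code:
    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
    {
        if (first)
        {
            first = false;
            String enable = getVariable("ENABLE_BRANCH", "NULL");
            if (!enable.equals("Y"))
            {
                // Tell the downstream steps right away that no rows will arrive,
                // so the disabled branch can finish immediately ...
                setOutputDone();
                // ... but keep reading and discarding the incoming rows so the
                // splitting step upstream never stalls on a full row buffer.
                while (getRow() != null)
                {
                    // discard
                }
                return false;
            }
        }

        Object[] r = getRow();

        if (r == null)
        {
            setOutputDone();
            return false;
        }

        r = createOutputRow(r, data.outputRowMeta.size());
        putRow(data.outputRowMeta, r);
        return true;
    }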
