Thread: filters for complex and constantly changing rules

  1. #1
    Join Date
    Mar 2012
    Posts
    28

    filters for complex and constantly changing rules

    Hi,

    We need a step that can filter rows using complex, constantly changing rules. The Filter Rows step is good for simple rules, and Java Filter is good for complex ones. But Java Filter does not seem to be the best fit for rules that are both complex and constantly changing, because the rules cannot be organized concisely. Is there another filter step that can organize complex rules in a concise manner? Thanks.


    Ey-Chih Chow

  2. #2
    Join Date
    Nov 2008
    Posts
    511

    Can you give an example of a constantly changing complex rule organized in a concise manner? That might help spur on some ideas from other users.

    Also, in the Java Filter step I see that the filter condition (Java expression) can be specified in a Variable which means it can be passed in from a calling job. That might help tame a constantly changing rule.
    pdi-ce-4.4.0-stable
    MySQL 5.5
    Windows 7 (32 bit)

  3. #3
    Join Date
    Mar 2012
    Posts
    28

    Quote (originally posted by darrell.nelson):
    Can you give an example of a constantly changing complex rule organized in a concise manner? That might help spur on some ideas from other users.

    Also, in the Java Filter step I see that the filter condition (Java expression) can be specified in a Variable which means it can be passed in from a calling job. That might help tame a constantly changing rule.

    ====================================


    For example, we use a Java Filter with the following condition:

    ---------------
    app.equalsIgnoreCase("MT2") && !(cat.equalsIgnoreCase("Session") || cat.equalsIgnoreCase("StoreTx") || (cat.equalsIgnoreCase("GameTx") && !act.equalsIgnoreCase("BuyShift"))|| cat.equalsIgnoreCase("PushNotification") || cat.equalsIgnoreCase("LocalNotifiaction") || cat.equalsIgnoreCase("LocationRequest") || cat.equalsIgnoreCase("GameCenter") || cat.equalsIgnoreCase("Check-in") || (cat.equalsIgnoreCase("Progression") && act != null && act.equalsIgnoreCase("LevelUp")) || cat.equalsIgnoreCase("BusinessMgmt") || cat.equalsIgnoreCase("HotBiz") || cat.equalsIgnoreCase("Location") || cat.equalsIgnoreCase("NetPromoter") || cat.equalsIgnoreCase("Flurry") || cat.equalsIgnoreCase("SlotMachine") ||(cat.equalsIgnoreCase("Facebook") && act != null && act.equalsIgnoreCase("UserInfo")) || cat.equalsIgnoreCase("Application") || (Integer.parseInt(uid) % 997) < 200)
    ------------

    The rule needs to be updated as the application evolves. Because the rule is not well modularized, every update requires working out very carefully how to change it.

    You mention that the filter condition can be passed from the calling job. Could you elaborate? Thanks.

    Ey-Chih Chow

  4. #4
    Join Date
    Jun 2012
    Posts
    1,443

    Let's assume you can't control complexity and frequency of change at all.
    One thing you are responsible for, though, is to make those expressions behave.
    Simply rewriting them can increase maintainability:

    Code:
       false
    || cat.equalsIgnoreCase("StoreTx") 
    || cat.equalsIgnoreCase("PushNotification") 
    || cat.equalsIgnoreCase("LocalNotifiaction") 
    || cat.equalsIgnoreCase("LocationRequest") 
    || cat.equalsIgnoreCase("GameCenter") 
    || cat.equalsIgnoreCase("Check-in") 
    || cat.equalsIgnoreCase("BusinessMgmt") 
    || cat.equalsIgnoreCase("HotBiz") 
    || cat.equalsIgnoreCase("Location") 
    || cat.equalsIgnoreCase("NetPromoter") 
    || cat.equalsIgnoreCase("Flurry") 
    || cat.equalsIgnoreCase("SlotMachine") 
    || cat.equalsIgnoreCase("Application") 
    || cat.equalsIgnoreCase("Facebook") && act != null && act.equalsIgnoreCase("UserInfo")
    || cat.equalsIgnoreCase("Progression") && act != null && act.equalsIgnoreCase("LevelUp")
    || cat.equalsIgnoreCase("GameTx") && !act.equalsIgnoreCase("BuyShift")
    || app.equalsIgnoreCase("MT2") && !(cat.equalsIgnoreCase("Session") 
    || (Integer.parseInt(uid) % 997) < 200)
    This filter expression is already in disjunctive form and consists mostly of simple terms that could also be used for a JOIN.
    The conjunctive terms at the end could easily be handled by that UDJE (User Defined Java Expression) step Darrell already mentioned, after further normalization, that is.
    As a side effect, typos ("LocalNotifiaction") and inconsistencies (missing null checks) become easy to spot.
    In the end, a simplified Filter step would tailor your result set as needed.
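
    One way to go a step further, sketched under assumptions: move the plain category names into a set and keep the few two-field terms on separate lines, so that a rule change becomes a one-line edit. The class name RowRule and its methods are hypothetical and not part of PDI. The sketch follows the condition from post #3, except that a null field is treated as "no match" instead of raising a NullPointerException. The spelling "LocalNotifiaction" is copied from the thread and should be checked against the data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of PDI: the rule from post #3 split into named parts.
final class RowRule {

    // Categories excluded regardless of any other field (stored in lower case).
    private static final Set<String> EXCLUDED_CATEGORIES = new HashSet<String>(Arrays.asList(
        "session", "storetx", "pushnotification",
        "localnotifiaction", // spelling as in the thread; verify against the data
        "locationrequest", "gamecenter", "check-in", "businessmgmt", "hotbiz",
        "location", "netpromoter", "flurry", "slotmachine", "application"));

    // Null-safe, case-insensitive comparison (assumption: null never matches).
    static boolean eq(String value, String expected) {
        return value != null && value.equalsIgnoreCase(expected);
    }

    // True if the row matches any of the exclusion terms.
    static boolean excluded(String cat, String act, String uid) {
        if (cat != null && EXCLUDED_CATEGORIES.contains(cat.toLowerCase(Locale.ROOT))) {
            return true;
        }
        if (eq(cat, "GameTx") && !eq(act, "BuyShift")) return true;
        if (eq(cat, "Progression") && eq(act, "LevelUp")) return true;
        if (eq(cat, "Facebook") && eq(act, "UserInfo")) return true;
        // Sampling term from the original rule; uid must be a decimal integer.
        return Integer.parseInt(uid) % 997 < 200;
    }

    // True if the row passes the filter.
    static boolean keep(String app, String cat, String act, String uid) {
        return eq(app, "MT2") && !excluded(cat, act, uid);
    }
}
```

    If a class like this were available on the classpath, the Java Filter condition might shrink to a single call such as RowRule.keep(app, cat, act, uid). I have not verified that against a particular PDI version, so treat it as an assumption.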
    pdi-ce-4.3.0-stable
    OpenJDK IcedTea 2.3.7 (7u21)
    ubuntu 12.04 LTS (x86_64)

  5. #5
    Join Date
    Nov 2008
    Posts
    511

    Quote (originally posted by eychih):
    You mention that the filter condition can be passed from the calling job. Could you elaborate?
    Variables can be set in either a job or a transformation and then used later in another job or transformation.
    Variables are described here: http://wiki.pentaho.com/display/EAI/.07+Variables

    In the Java Filter step, notice the little gray diamond at the right side of the Condition field. It indicates that you can insert a previously defined variable that holds the filter code. However, I cannot get it to work. I will dig deeper and file a Jira issue if necessary. Sorry.

  6. #6
    Join Date
    Jun 2012
    Posts
    1,443

    Quote (originally posted by darrell.nelson):
    However, I cannot get it to work.
    Just a wild guess: You forgot to quote the variable?

    Code:
    IN_FIELD.equals("${VAR}")

  7. #7
    Join Date
    Nov 2008
    Posts
    511

    Quote (originally posted by marabu):
    Just a wild guess: You forgot to quote the variable?

    Code:
    IN_FIELD.equals("${VAR}")
    I had assumed that selecting the variable (${FILTER} in my case) would be detected and taken care of by the step. Apparently not. But how did you come up with your suggestion (which doesn't work, by the way)? What is IN_FIELD, and why are you using the .equals() method? I can't find any documentation for this step...

  8. #8
    Join Date
    Nov 2008
    Posts
    511

    Okay, maybe I see what's going on now. I was hoping that the step would take the entire filter expression from the variable, so that I would not have to type it into the box. That's apparently not how it works.

  9. #9
    Join Date
    Jun 2012
    Posts
    1,443

    Oh, I took the liberty of looking at the samples, and the PDI source is always just one click away...

    We definitely must enter a Java expression. And because the variable is substituted by its value, it must be enclosed in quotes. I made sure it works before I posted my suggestion, so my guess wasn't that wild at all.

    IN_FIELD was my choice for an input field name and VAR was my generic variable name. So we have a perfectly well-formed Java expression evaluating to a boolean value.
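
    To illustrate why the quotes matter, here is a minimal sketch that mimics the substitution with plain string replacement. The substitute method is a hypothetical stand-in, not PDI's actual code; it only shows what text the Java compiler would end up seeing.

```java
// Hypothetical stand-in for variable substitution; not PDI's actual implementation.
final class SubstitutionDemo {

    // Replaces every occurrence of ${name} in the expression text with the value.
    static String substitute(String expression, String name, String value) {
        return expression.replace("${" + name + "}", value);
    }

    public static void main(String[] args) {
        // Quoted: the value ends up inside a Java string literal.
        System.out.println(substitute("IN_FIELD.equals(\"${VAR}\")", "VAR", "MT2"));
        // Unquoted: the value ends up as a bare identifier instead of a string.
        System.out.println(substitute("IN_FIELD.equals(${VAR})", "VAR", "MT2"));
    }
}
```

    With the quotes, the compiler sees a comparison against the string "MT2". Without them, it sees a reference to something named MT2, which is not what was intended.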
