Hitachi Vantara Pentaho Community Forums

Thread: Kettle-Weka Scoring > logistic regression models using dummy coding fails !

  1. #1
    Join Date
    May 2015
    Posts
    3


    Hi All,

    I have built a logistic regression model that uses a combination of nominal (categorical) and numeric predictors. Weka's logistic model dummy-codes the nominal predictors that have multiple categories. The model uses 15 variables; dummy coding a couple of the nominal predictors brings that to 20 in the model (this includes K-1 values for nominal attributes). I have attached the model details in the snapshots. When I deploy the model in Kettle's Weka Scoring plugin, I get the error below about a mismatch in the number of attributes. Scoring data with the same model inside Weka itself works perfectly; the problem only appears when the model is embedded in the Kettle Weka Scoring plugin. I am looking for pointers on how to work around this issue.

    Environment: Pentaho Kettle community 5.0.1-stable version. Models are generated in WEKA 3.7 stable.

    2015/05/12 14:10:57 - Weka Scoring.0 - ERROR (version 5.0.1-stable, build 1 from 2013-11-15_16-08-58 by buildguy) : Unexpected error
    2015/05/12 14:10:57 - Weka Scoring.0 - ERROR (version 5.0.1-stable, build 1 from 2013-11-15_16-08-58 by buildguy) : org.pentaho.di.core.exception.KettleException:
    2015/05/12 14:10:57 - Weka Scoring.0 - Unable to make prediction for row # 1
    2015/05/12 14:10:57 - Weka Scoring.0 - Src and Dest differ in # of attributes: 15 != 20
    2015/05/12 14:10:57 - Weka Scoring.0 -
    2015/05/12 14:10:57 - Weka Scoring.0 - at org.pentaho.di.scoring.WekaScoring.processRow(WekaScoring.java:411)
    2015/05/12 14:10:57 - Weka Scoring.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:60)
    2015/05/12 14:10:57 - Weka Scoring.0 - at java.lang.Thread.run(Thread.java:724)
    2015/05/12 14:10:57 - Weka Scoring.0 - Caused by: java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 15 != 20
    2015/05/12 14:10:57 - Weka Scoring.0 - at weka.core.RelationalLocator.copyRelationalValues(RelationalLocator.java:87)
    2015/05/12 14:10:57 - Weka Scoring.0 - at weka.filters.Filter.copyValues(Filter.java:371)
    2015/05/12 14:10:57 - Weka Scoring.0 - at weka.filters.Filter.push(Filter.java:288)
    2015/05/12 14:10:57 - Weka Scoring.0 - at weka.filters.unsupervised.attribute.NominalToBinary.convertInstance(NominalToBinary.java:502)
    2015/05/12 14:10:57 - Weka Scoring.0 - at weka.filters.unsupervised.attribute.NominalToBinary.input(NominalToBinary.java:176)
    2015/05/12 14:10:57 - Weka Scoring.0 - at weka.classifiers.functions.Logistic.distributionForInstance(Logistic.java:793)
    2015/05/12 14:10:57 - Weka Scoring.0 - at org.pentaho.di.scoring.WekaScoringClassifier.distributionForInstance(WekaScoringClassifier.java:111)
    2015/05/12 14:10:57 - Weka Scoring.0 - at org.pentaho.di.scoring.WekaScoringData.generatePrediction(WekaScoringData.java:503)
    2015/05/12 14:10:57 - Weka Scoring.0 - at org.pentaho.di.scoring.WekaScoring.processRow(WekaScoring.java:406)

    Thanks
    Sandeep Jayaprakash


    Model fields mapping in Weka scoring plugin
    ----------------------------------------------------------

    Model attributes                       Incoming fields
    ------------------------------         ------------------------------
    (nominal) ONLINE_FLAG           -->     4 (string)  ONLINE_FLAG
    (numeric) ENROLLMENT            -->     5 (numeric) ENROLLMENT
    (numeric) SAT_VERBAL            -->     8 (numeric) SAT_VERBAL
    (numeric) SAT_MATH              -->     9 (numeric) SAT_MATH
    (numeric) APTITUDE_SCORE        -->    10 (numeric) APTITUDE_SCORE
    (numeric) AGE                   -->    11 (numeric) AGE
    (nominal) RC_GENDER             -->    12 (string)  RC_GENDER
    (nominal) RC_ENROLLMENT_STATUS  -->    13 (string)  RC_ENROLLMENT_STATUS
    (nominal) RC_CLASS_CODE         -->    14 (string)  RC_CLASS_CODE
    (numeric) GPA_CUMULATIVE        -->    15 (numeric) GPA_CUMULATIVE
    (nominal) STANDING              -->    17 (string)  STANDING
    (numeric) RMN_SCORE_PARTIAL     -->    19 (numeric) RMN_SCORE_PARTIAL
    (numeric) R_CONTENT_READ        -->    20 (numeric) R_CONTENT_READ
    (numeric) R_SESSIONS            -->    28 (numeric) R_SESSIONS
    (nominal) ACADEMIC_RISK         -->    29 (string)  ACADEMIC_RISK



    Logistic model
    ---------------------
    Logistic Regression with ridge parameter of 1.0E-8
    Coefficients...
                             Class
    Variable                     1
    ==============================
    ONLINE_FLAG             1.0753
    ENROLLMENT              0.0199
    SAT_VERBAL             -0.0007
    SAT_MATH                0.0009
    APTITUDE_SCORE          0.0006
    AGE                    -0.0009
    RC_GENDER               0.0597
    RC_ENROLLMENT_STATUS   -0.7419
    RC_CLASS_CODE=1        -0.3276
    RC_CLASS_CODE=2         0.2561
    RC_CLASS_CODE=3         0.0494
    RC_CLASS_CODE=4          0.005
    GPA_CUMULATIVE         -2.3407
    STANDING=0               1.587
    STANDING=1                0.21
    STANDING=2             -2.4776
    RMN_SCORE_PARTIAL      -0.0742
    R_CONTENT_READ         -0.1562
    R_SESSIONS             -0.0725
    Intercept              12.2622
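
    The two numbers in the error (15 != 20) can be reconciled with the listings above. What follows is a minimal Python sketch that counts the attributes in the field mapping and the coefficient rows in the Logistic output. It assumes ACADEMIC_RISK is the class attribute (it is the only mapped attribute without a coefficient) and that the filtered data keeps that one class attribute alongside the indicator columns. The list and function names are illustrative only.

```python
# Attribute names copied from the field mapping in the post.
source_attributes = [
    "ONLINE_FLAG", "ENROLLMENT", "SAT_VERBAL", "SAT_MATH", "APTITUDE_SCORE",
    "AGE", "RC_GENDER", "RC_ENROLLMENT_STATUS", "RC_CLASS_CODE",
    "GPA_CUMULATIVE", "STANDING", "RMN_SCORE_PARTIAL", "R_CONTENT_READ",
    "R_SESSIONS", "ACADEMIC_RISK",
]

# Coefficient row names copied from the Logistic output (intercept left out).
coefficient_names = [
    "ONLINE_FLAG", "ENROLLMENT", "SAT_VERBAL", "SAT_MATH", "APTITUDE_SCORE",
    "AGE", "RC_GENDER", "RC_ENROLLMENT_STATUS",
    "RC_CLASS_CODE=1", "RC_CLASS_CODE=2", "RC_CLASS_CODE=3", "RC_CLASS_CODE=4",
    "GPA_CUMULATIVE", "STANDING=0", "STANDING=1", "STANDING=2",
    "RMN_SCORE_PARTIAL", "R_CONTENT_READ", "R_SESSIONS",
]


def indicators_per_attribute(names):
    """Count how many coefficient rows each original attribute produced."""
    counts = {}
    for name in names:
        base = name.split("=", 1)[0]  # "STANDING=2" -> "STANDING"
        counts[base] = counts.get(base, 0) + 1
    return counts


# Incoming side: every mapped attribute, class included.
src_count = len(source_attributes)
# Model side: one column per coefficient row, plus the class (assumption).
dest_count = len(coefficient_names) + 1

expanded = {k: v for k, v in indicators_per_attribute(coefficient_names).items() if v > 1}
print(f"{src_count} != {dest_count}")
print("expanded attributes:", expanded)
```

    Under those assumptions the first line printed is 15 != 20, the same pair reported in the stack trace: 14 predictors plus the class on the incoming side, and 19 coefficient columns plus the class on the model side. Only RC_CLASS_CODE and STANDING are expanded.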




    Odds Ratios...
                             Class
    Variable                     1
    ==============================
    ONLINE_FLAG              2.931
    ENROLLMENT              1.0201
    SAT_VERBAL              0.9993
    SAT_MATH                1.0009
    APTITUDE_SCORE          1.0006
    AGE                     0.9991
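
    The odds ratios listed here are the exponentials of the corresponding coefficients. A minimal Python sketch that checks this for the six rows shown, using only the values printed in the model output; the tolerance is an assumption chosen to absorb the rounding in the printed figures.

```python
import math

# (coefficient, reported odds ratio) pairs copied from the model output.
reported = {
    "ONLINE_FLAG": (1.0753, 2.931),
    "ENROLLMENT": (0.0199, 1.0201),
    "SAT_VERBAL": (-0.0007, 0.9993),
    "SAT_MATH": (0.0009, 1.0009),
    "APTITUDE_SCORE": (0.0006, 1.0006),
    "AGE": (-0.0009, 0.9991),
}


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in the predictor: exp(coefficient)."""
    return math.exp(coefficient)


for name, (coef, ratio) in reported.items():
    print(f"{name:<16}{odds_ratio(coef):.4f}  (reported {ratio})")

# Largest gap between computed and reported values; the printed numbers are rounded.
max_abs_diff = max(abs(odds_ratio(c) - r) for c, r in reported.values())
```

    A small max_abs_diff indicates the two listings agree up to rounding.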





    Attachments: 14853, 14854, 14855
    Last edited by 5andeep; 05-13-2015 at 01:59 PM.

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741


    Hmm. There shouldn't be any issue at this level in the data transformation (at the NominalToBinary filter within Weka). Logistic works fine in WekaScoring regardless of whether the data contains multivalued nominal attributes or not. Can you verify that the version of Weka that you're creating the models in matches that which is bundled in PDI? If you open a terminal, cd to your PDI installation directory and type:

    java -jar plugins/steps/weka-scoring/lib/pdm-3.7-<whatever the suffix of this file is>.jar

    What is the version number reported in the GUIChooser? If it doesn't match your stand-alone version of Weka then this is most likely to be the cause of the problem.
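
    If launching the GUI is inconvenient, the version can also be read from the jar itself. A minimal Python sketch, assuming the jar stores its version string in the resource weka/core/version.txt. Stock Weka jars keep it there; whether the PDI-bundled pdm jar does the same is an assumption. The function name read_weka_version is hypothetical.

```python
import zipfile

# Assumption: the jar carries its version string in this resource.
VERSION_ENTRY = "weka/core/version.txt"


def read_weka_version(jar_path, entry=VERSION_ENTRY):
    """Return the version string stored in the jar, or None if the entry is absent."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read(entry)
        except KeyError:
            # The jar has no such entry; fall back to the GUIChooser check.
            return None
    return raw.decode("utf-8").strip()
```

    Call it with the path to the jar under plugins/steps/weka-scoring/lib. A None result means the entry was not found, in which case the GUIChooser check above is the fallback.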

    Cheers,
    Mark.

  3. #3
    Join Date
    May 2015
    Posts
    3


    Hi Mark,

    Thanks for the quick response. J48 and Naive Bayes models generated by the same Weka version run fine; only logistic regression/SMO models with dummy coding fail.

    The Weka jar in data-integration\plugins\steps\weka-scoring\lib is pdm-3.7-ce-TRUNK-SNAPSHOT
    Weka version reported in the GUI is 3.7.5
    Pentaho Kettle community 5.0.1-stable version

    I tried dropping weka.jar (3.7.5) into the data-integration\plugins\steps\weka-scoring\lib folder, but the issue persists. I also tried using Weka 3.7.12 (stable version). Once again the models could not be executed with the Weka Scoring plugin in Kettle; this time it resulted in an exception about a missing AttributeLocator class, so I reverted to 3.7.5.

    --
    Sandeep Jayaprakash
    Last edited by 5andeep; 05-15-2015 at 12:08 PM.

  4. #4
    Join Date
    Aug 2006
    Posts
    1,741


    If you are using the community edition of PDI 5.0.1-stable then you will have downloaded the weka-scoring plugin separately from:

    https://sourceforge.net/projects/pen.../5.0.1-stable/

    This version of the Weka scoring plugin contains Weka 3.7.10, so any models you create in Weka 3.7.5 are unlikely to be 100% compatible. There have been a lot of bugs fixed since 3.7.5. I'd suggest that you:

    1. Verify that your Weka scoring plugin does indeed contain Weka 3.7.10. If it doesn't, then delete it and install the one from the sourceforge link above.
    2. Recreate all your models in Weka 3.7.10.
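
    Comparing the two versions is easy to get wrong with plain string comparison, since "3.7.5" sorts after "3.7.10" as text. A minimal Python sketch of a numeric comparison, assuming purely numeric dotted versions such as those mentioned in this thread (a suffix like ce-TRUNK-SNAPSHOT would need extra handling). The function names are illustrative.

```python
def parse_version(text):
    """Turn a dotted version such as '3.7.10' into a tuple of ints: (3, 7, 10)."""
    return tuple(int(part) for part in text.strip().split("."))


def same_version(model_version, plugin_version):
    """True only when the model and the plugin report the same Weka version."""
    return parse_version(model_version) == parse_version(plugin_version)


model_version = "3.7.5"    # version the models were built with, per the thread
plugin_version = "3.7.10"  # version bundled with the scoring plugin, per the thread

if not same_version(model_version, plugin_version):
    print(f"Rebuild models: built with {model_version}, plugin has {plugin_version}")
```

    Tuples of integers compare component by component, so (3, 7, 5) is correctly ordered before (3, 7, 10).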

    Cheers,
    Mark.

  5. #5
    Join Date
    May 2015
    Posts
    3


    I switched my Weka version to 3.7.10, rebuilt my models, and also used the 3.7 Weka jar in Kettle 5.0.1-stable. I am able to run the models now.
    I highly appreciate your help and quick responses. Thanks a lot, Mark.

    --
    Sandeep Jayaprakash
    Last edited by 5andeep; 05-15-2015 at 12:07 PM.
