Hitachi Vantara Pentaho Community Forums

Thread: Different results from Weka Explorer GUI and Java API

  1. #1
    Join Date
    Apr 2017
    Posts
    2

    Default Different results from Weka Explorer GUI and Java API

    I used Weka Explorer to generate a model from my data using the Multilayer Perceptron. I am essentially trying to use the generated model's thresholds and weights to implement distributionForInstance in C++ to perform predictions. I got things working, but the probability distributions returned didn't match what Weka Explorer reported with Output predictions turned on. I thought I had probably implemented things incorrectly, so I tried loading the saved model in Java to figure out why the results were different. To my surprise, the results matched what my C++ code was producing, not what the Weka GUI was outputting. I checked the options on the classifier and they match too. Any ideas what I'm doing wrong or not understanding?

    I've attached the model file I saved (zipped up) and here's the Java code I was using to validate things.

    Code:
    import java.io.FileInputStream;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.core.DenseInstance;
    import weka.core.Instance;
    import weka.core.SerializationHelper;

    // Load the model saved from the Explorer and ask it for a probability distribution.
    MultilayerPerceptron multilayerPerceptron = (MultilayerPerceptron) SerializationHelper.read(new FileInputStream("weka.model"));
    double[] instanceValues = new double[] { 1, 1.374, 1.4831, 1.3179, 1, 26789280, 7724, 0.3621, 1094.5664, 865.861, 111.9893, 24.8018, 2623, 0.4326, 1116.8243, 988.3485, 111.2097, 25.2443, 5101, 0.3198, 1083.121, 795.5681, 112.3901, 24.5712 };
    Instance instance = new DenseInstance(1.0, instanceValues);
    double[] probabilityDistribution = multilayerPerceptron.distributionForInstance(instance);
    for (double probability : probabilityDistribution) {
      System.out.println(probability);
    }
    This code outputs the following:

    0.007144567238714766
    0.08709647430355107
    0.8006037306066485
    1.608899063174446E-4
    0.1049943379447683

    Looking at the same instance in Weka Explorer gives the following:

    0.05
    0
    0.926
    0.023
    0.001

    This instance at least has the same winning class, but for other instances even the winning class differs, and none of the distributions match. Any ideas what is going on?


    Ben
    Attached Files

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Default

    What evaluation mode are you using in the Explorer? Predictions and evaluation results for percentage split and x-val involve using the models trained on the training folds. However, the model output in the text area (and exported when you save the classifier) is the model built on *all* the data.
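    For what it's worth, the distinction is easy to see in code. Here's a minimal sketch, assuming "data" is your full dataset with the class index already set, inside a method that declares "throws Exception":

    Code:
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;

    // The evaluation statistics come from the 10 fold models, which are discarded afterwards.
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(new MultilayerPerceptron(), data, 10, new Random(1));
    System.out.println(eval.toSummaryString());

    // The model printed in the output area (and saved to disk) is built on *all* the data.
    MultilayerPerceptron finalModel = new MultilayerPerceptron();
    finalModel.buildClassifier(data);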

    Are you sure that the DenseInstance you are constructing has the same structure as the training data? Also, your code does not seem to assign your instance to a dataset (an Instances object), so it does not have access to the header information.
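    To make that concrete, here's a sketch of what I mean, reusing instanceValues and the deserialized model from your snippet ("training.arff" is a placeholder for whatever file you trained on, and I'm assuming the class attribute is the last one):

    Code:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.core.DenseInstance;
    import weka.core.Instance;
    import weka.core.Instances;

    // Read just the attribute definitions (the header) from the training ARFF.
    Instances header = new Instances(new BufferedReader(new FileReader("training.arff")), 0);
    header.setClassIndex(header.numAttributes() - 1);

    Instance instance = new DenseInstance(1.0, instanceValues);
    instance.setDataset(header); // the instance now knows its attribute and class structure
    double[] dist = multilayerPerceptron.distributionForInstance(instance);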

    I'd suggest making a one-instance ARFF file (containing your example instance), loading it into the Explorer as a supplied test set in the Classify panel, then loading your serialized model in the Classify panel and using the "Re-evaluate model on current test set" option to see whether the prediction matches your code.
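    If it's easier, you can also write that one-instance ARFF file programmatically rather than by hand; a rough sketch, reusing the header and instanceValues from the previous snippet (the output file name is arbitrary):

    Code:
    import java.io.File;
    import weka.core.DenseInstance;
    import weka.core.Instances;
    import weka.core.converters.ArffSaver;

    // Build a dataset with the same header and a single instance, then save it as ARFF.
    Instances single = new Instances(header, 1);
    single.add(new DenseInstance(1.0, instanceValues));

    ArffSaver saver = new ArffSaver();
    saver.setInstances(single);
    saver.setFile(new File("single.arff"));
    saver.writeBatch();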

    Cheers,
    Mark.

  3. #3
    Join Date
    Apr 2017
    Posts
    2

    Default

    Thank you for the response. I tried your suggestion of testing the single instance, and it returns the same probability distribution as my code, so that resolves my original problem. However, I now have more questions about how Weka actually does the training and testing.

    Quote Originally Posted by Mark View Post
    What evaluation mode are you using in the Explorer?
    I was using 10-fold cross-validation, but I also tried a percentage split and thought it was odd that the same model was output in both cases. I now understand why: with both methods, the model shown was built on *all* the data.

    Quote Originally Posted by Mark View Post
    Predictions and evaluation results for percentage split and x-val involve using the models trained on the training folds. However, the model output in the text area (and exported when you save the classifier) is the model built on *all* the data.
    Am I correct in understanding that the probability distributions displayed in Weka's UI come from models built on just the training data? In the case of cross-validation, that means a separate model is used for each fold. But then the model that gets output, which is what I'd apply elsewhere, is really based on both the training and the test data, right? Doesn't that mean Weka is using test data for training and would be prone to overfitting to it? Is the correct way to avoid this to manually split my data, build the model on the training portion with the "Use training set" option, and then evaluate it on the held-out portion with the "Supplied test set" option on the Classify tab?
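    In code, the workflow I have in mind would look roughly like this (just a sketch; train.arff and test.arff are placeholder file names, and I'm assuming the class attribute is last):

    Code:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.core.Instances;

    Instances train = new Instances(new BufferedReader(new FileReader("train.arff")));
    train.setClassIndex(train.numAttributes() - 1);
    Instances test = new Instances(new BufferedReader(new FileReader("test.arff")));
    test.setClassIndex(test.numAttributes() - 1);

    MultilayerPerceptron mlp = new MultilayerPerceptron();
    mlp.buildClassifier(train); // the model only ever sees the training data

    Evaluation eval = new Evaluation(train);
    eval.evaluateModel(mlp, test); // evaluated on the held-out test set
    System.out.println(eval.toSummaryString());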
