View Full Version : Help exporting classifier outputs

05-27-2008, 06:11 PM
Hello. I would like to be able to use the outputs of my LogitBoost and SMO classifiers in other programs. To do this I need not a Weka-specific model (I know how to output that), but rather an equation of the form:

w1*c1 + w2*c2 + w3*c3 + ... + wn*cn = prediction for the instance, where the prediction is 1 or 0.

I would ideally also like such an equation for my linear regressions.

When I select the output source code option I do get something like this:

class WekaClassifier_1_37 {
  public static double classify(Object[] i) {
    /* c254 */
    if (i[254] == null) { return -0.005138129199720509; }
    else if (((Double) i[254]).doubleValue() <= 0.0339055) { return -0.41656938874587; }
    else { return 0.1648500554181212; }
  }
}

for a bunch of the columns - I assume this is something like what I want, but I don't know how to interpret it. If I went through and returned the values for each column that I found in these results, what would I get out? Either a 1 or a 0?

Also, the output source code option is not available for SMO classifiers.

Is there any way to get what I need out of WEKA?


05-28-2008, 05:19 AM

Not all classifiers in Weka are Sourcable (as you noted). In the case of LinearRegression, it could easily be made Sourcable (something to add to the to-do list). With SMO it would be more involved, but still possible: for the linear kernel it would be quite easy, and you would get an equation for the maximum-margin hyperplane expressed in terms of attribute weights; for the non-linear kernels, the model is instead expressed in terms of support vectors.
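To make the linear-kernel case concrete, the kind of equation you would get can be sketched as plain Java (this is an illustration only, not Weka output; the class name, weight values, and bias below are made-up placeholders):

```java
// Minimal sketch of evaluating a linear-kernel SVM decision function
// f(x) = w1*x1 + w2*x2 + ... + wn*xn + b, with class = 1 if f(x) > 0, else 0.
public class LinearSvmSketch {
    // Hypothetical attribute weights and bias (in a real export these would
    // come from the trained SMO model's hyperplane).
    static final double[] WEIGHTS = {0.8, -1.2, 0.3};
    static final double BIAS = 0.1;

    // Returns 1 if the instance lies on the positive side of the hyperplane.
    public static int classify(double[] x) {
        double f = BIAS;
        for (int i = 0; i < WEIGHTS.length; i++) {
            f += WEIGHTS[i] * x[i];
        }
        return f > 0 ? 1 : 0;
    }
}
```

This is exactly the w1*c1 + w2*c2 + ... form you asked about; the non-linear kernels have no such flat weight vector, which is why their models are written in terms of support vectors.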

You asked about the output of LogitBoost and how to interpret it. By default, LogitBoost uses DecisionStump as the base learner (although it can use any classifier capable of predicting numeric targets). A DecisionStump is basically a single-level decision tree, i.e. a test on a single attribute, which is why it is expressed as an if-then rule. LogitBoost learns a weighted ensemble of such stumps.

The output you show in your posting is a Java class representing just one stump in the learned sequence. The generated source code for LogitBoost also includes a global classify() method that assembles the final prediction by calling the classify() methods of all the learned stumps: a probability distribution over the possible target values is computed, and the index of the maximum probability is returned. So in your case, assuming a two-class problem, the final result for a given test instance will be either 0 or 1.
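The aggregation step can be sketched as follows (a minimal illustration, not Weka's exact generated code; the two stumps and their split thresholds are hypothetical, and for simplicity the two-class probability is computed with LogitBoost's logistic link on the summed stump scores):

```java
// Minimal sketch of how a LogitBoost ensemble of decision stumps is combined.
// Each stump returns a real-valued score; the scores are summed and mapped to
// a class-1 probability, and the index of the larger probability is returned.
public class LogitBoostSketch {
    // Hypothetical stump 1: a test on attribute 0 (same shape as Weka's output).
    static double stump1(Object[] i) {
        if (i[0] == null) { return -0.005; }
        else if (((Double) i[0]).doubleValue() <= 0.034) { return -0.417; }
        else { return 0.165; }
    }

    // Hypothetical stump 2: a test on attribute 1.
    static double stump2(Object[] i) {
        if (i[1] == null) { return 0.01; }
        else if (((Double) i[1]).doubleValue() <= 1.5) { return 0.3; }
        else { return -0.25; }
    }

    // Global classify(): sum the stump scores F(x), convert to a class-1
    // probability via the logistic function, and return the winning class index.
    public static double classify(Object[] i) {
        double f = stump1(i) + stump2(i);
        double p1 = 1.0 / (1.0 + Math.exp(-2.0 * f));
        return p1 >= 0.5 ? 1 : 0;
    }
}
```

So answering values column by column, as you describe, feeds the individual stumps; only the global classify() turns those per-stump scores into the final 0/1 prediction.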

Hope this helps.