Comparing Confusion Matrices



textminer
05-05-2008, 11:13 AM
I am using different Weka algorithms for multiclass text classification. Is there a way to compare the confusion matrices generated by Weka in order to compare the performance of two classifiers on a data set?

For example: given an example with four possible classifications for a text, IBk gives a 4x4 confusion matrix, and SMO similarly gives a 4x4 confusion matrix. Is there a way to compare these two matrices in order to make some statement about the relative performance of the two classifiers?

Thanks in advance for any assistance.

Mark
05-05-2008, 11:41 PM
You could look at the values of the Kappa statistic for each method. The Kappa statistic is used to measure the agreement between predicted and observed categorizations of a data set, while correcting for agreement that occurs by chance.
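(In symbols: if p_o is the observed agreement - i.e. the accuracy read off the diagonal of the confusion matrix - and p_e is the agreement expected by chance from the row and column totals, then

\kappa = \frac{p_o - p_e}{1 - p_e}

so 0 means no better than chance and 1 means perfect agreement.)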

You could also look at the information retrieval statistics and AUC values for each class value in turn, as long as your classifiers produce reasonable probability estimates (e.g. turn on the option for fitting logistic regression functions to the output of SMO; use a high enough value for k in KNN, etc.).
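If you'd rather pull these numbers out programmatically than read them off the Explorer output, a rough sketch along the following lines should work with Weka's Evaluation API (the ARFF path, class index and option values are just placeholders - adjust them to your data):

import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareClassifiers {
  public static void main(String[] args) throws Exception {
    // Load the data set (file name is hypothetical) and set the class attribute
    Instances data = new DataSource("texts.arff").getDataSet();
    data.setClassIndex(data.numAttributes() - 1);

    // SMO with logistic models fitted, so the probability estimates are usable for AUC
    SMO smo = new SMO();
    smo.setBuildLogisticModels(true);

    // IBk with a larger k for smoother probability estimates
    IBk ibk = new IBk();
    ibk.setKNN(5);

    for (Classifier cls : new Classifier[] { smo, ibk }) {
      Evaluation eval = new Evaluation(data);
      eval.crossValidateModel(cls, data, 10, new Random(1));
      System.out.println(cls.getClass().getSimpleName() + "  Kappa = " + eval.kappa());
      // Per-class information retrieval statistics and AUC
      for (int i = 0; i < data.numClasses(); i++) {
        System.out.println("  " + data.classAttribute().value(i)
            + "  F-measure = " + eval.fMeasure(i)
            + "  AUC = " + eval.areaUnderROC(i));
      }
    }
  }
}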

Cheers,
Mark.

textminer
05-06-2008, 01:19 AM
Mark - thanks for your suggestions.

Question on interpreting the Kappa statistic. Classifier A completes a classification task with a Kappa statistic of 0.6444. Classifier B completes the same task with a Kappa statistic of 0.6721. Classifier B gets a higher score, but is Classifier B really a "better" classifier? How could I determine whether the difference between the Kappa values indicates a statistically significant difference in the performance of the classifiers?

Thanks -
David.

Mark
05-06-2008, 03:38 AM
For that you'd best use Weka's experiment environment (the Experimenter). You can set up an experiment that applies your selected learners to a set of data sets using repeated runs of k-fold cross-validation (10 x 10-fold gives pretty stable estimates). The Experimenter then lets you compare the learners for significant differences in performance using t-tests - specifically, the corrected resampled t-test of Nadeau and Bengio (2000) is used, as it has a reduced Type I error.
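For reference, the corrected statistic looks like this (as I recall it from the paper): take the k per-run differences d_j in the chosen measure between the two schemes (k = 100 for 10 x 10-fold CV), with mean \bar{d} and sample variance \hat{\sigma}^2, and let n_2/n_1 be the ratio of test to training set size (1/9 for 10-fold CV). Then

t = \frac{\bar{d}}{\sqrt{\left(\frac{1}{k} + \frac{n_2}{n_1}\right)\hat{\sigma}^2}}

which is compared against a t distribution with k - 1 degrees of freedom. The extra n_2/n_1 term is what corrects for the overlap between training sets across runs.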

Cheers,
Mark.

textminer
05-06-2008, 08:07 PM
Mark -

Can you tell me how to run the t-test in the Experimenter? This is something I have been trying to find for some time. I do not see where I can request to have the t-test run to compare results of two classifiers.

Thanks again -
David.

textminer
05-06-2008, 08:23 PM
Another question: in the Explorer, I can select an algorithm (such as information gain, chi-square, etc) to filter attributes before running one of the learning algorithms (naive Bayes, SMO, etc). In the Experimenter, I can see how to select a learning algorithm, but don't see how to select a filter. Could you tell me how this is done?

Thanks -
David.

Mark
05-07-2008, 06:37 PM
Hi David,

For your first question - see the attached screenshot (I hope it's visible, as the forums seem to limit pics to 600x400). Once an experiment has finished in the Experimenter, go to the Analyse panel and click the "Experiment", "File" or "Database" button to load the results from wherever they were saved. Most of the time the "Experiment" button will do, as the Experimenter knows where the last experiment was saved.

After that, the only adjustment you might want to make is the "Comparison field" - accuracy is the default, but you can choose from many measures (including Kappa); my screenshot shows an example for AUC. You can also choose which scheme goes in the left-most column (i.e. the base scheme that all the paired tests are run against) via the "Test base" button. In the table of results, "*" and "v" indicate significantly lower and higher results, respectively, according to the t-test. Whether that means better or worse depends on the measure being analysed - e.g. "v" is better when accuracy is the measure and worse when RMSE is.

As for your second question - first a quick, but important, comment. In the Explorer, if you are applying supervised filters in the Preprocess panel before running cross-validation (or other evaluation procedures) - DON'T! (if you want unbiased results, that is :-)). Supervised filters use the target (class) attribute in some way to do their job (i.e. they learn something from the data - feature selection and supervised discretization are prime examples). If you then learn classifiers under cross-validation, the learners will have access to information gleaned from the test folds and you will get overly optimistic results.

The correct approach is to use weka.classifiers.meta.FilteredClassifier. This meta-classifier first applies a Filter to the training data and then trains the actual Classifier of interest on the filtered data, so the filter only ever has access to the data in the training fold. At prediction time, each instance in the test fold passes through the filter before reaching the classifier. This also answers your question about the Experimenter, as the same classifier (FilteredClassifier) can be used there as well. Note that it is possible to nest FilteredClassifiers in order to apply more than one Filter to the data (i.e. a FilteredClassifier that has another FilteredClassifier as its base learner, and so on).
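To make that concrete, here's a rough sketch of building a FilteredClassifier in code (the particular filter, search method and base classifier are just examples - substitute whatever you're using):

import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.functions.SMO;
import weka.classifiers.meta.FilteredClassifier;
import weka.filters.supervised.attribute.AttributeSelection;

public class FilteredClassifierExample {
  public static FilteredClassifier build() {
    // Supervised attribute selection, wrapped so it is re-learned on each training fold only
    AttributeSelection selection = new AttributeSelection();
    selection.setEvaluator(new InfoGainAttributeEval());
    Ranker ranker = new Ranker();
    ranker.setNumToSelect(100); // keep the 100 highest-ranked attributes (example value)
    selection.setSearch(ranker);

    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(selection);
    fc.setClassifier(new SMO());
    return fc; // cross-validate this in the Explorer, the Experimenter, or via Evaluation
  }
}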

Whew! Hope this helps.

Cheers,
Mark.

textminer
05-07-2008, 07:39 PM
Mark -

Now I see how the t-test gets done in Weka; it gets run routinely at the end of the experiment. I was unclear about what the v's and *'s represented - but that's the information I was looking for, an assessment of the statistical significance of differences in the accuracy results for the classifiers. It was staring me in the face all the time!

Your explanation of the FilteredClassifier meta-classifier was extremely helpful. I was indeed using the supervised filters in the preprocessing panel and then doing cross-validation - now I see that this brings a risk of over-inflating the performance numbers. I found the meta-classifier and will use that instead for filtering prior to cross-validation.

Thanks so much for taking the time to answer all my questions! You've been a great help!!

David.

mcas
11-13-2012, 12:25 PM
I was aware that meta classifiers are needed when doing attribute selection, but I didn't realize that supervised filters also need them. I gave it a try with the SMOTE filter, as:
weka.classifiers.meta.FilteredClassifier -F "weka.filters.supervised.instance.SMOTE -C 0 -K 5 -P 800.0 -S 1" -W weka.classifiers.functions.MultilayerPerceptron -- -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a

In the summary, the total number of instances is equal to the number of instances without SMOTE.
Does that mean it is working as intended - SMOTE balances the training data, but the newly generated instances are not counted in the performance summary, which avoids the possible overfitting in the reported performance?
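For reference, here is roughly the same setup built via the API instead of the command line (just a sketch - the ARFF file name is made up, and the options are copied from the command above):

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class SmoteFilteredClassifier {
  public static void main(String[] args) throws Exception {
    Instances data = new DataSource("imbalanced.arff").getDataSet(); // hypothetical file
    data.setClassIndex(data.numAttributes() - 1);

    // Same options as the command line above: SMOTE inside a FilteredClassifier,
    // with MultilayerPerceptron as the base learner
    FilteredClassifier fc = new FilteredClassifier();
    fc.setOptions(Utils.splitOptions(
        "-F \"weka.filters.supervised.instance.SMOTE -C 0 -K 5 -P 800.0 -S 1\" "
        + "-W weka.classifiers.functions.MultilayerPerceptron -- "
        + "-L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a"));

    // 10-fold cross-validation: SMOTE is only ever applied to the training folds,
    // while the test folds keep the original instances, so the instance counts in the
    // summary match the data without SMOTE
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(fc, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}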

Thanks!


