
Low ROC and Correct Classification Ratio --- Overfitting?



ttoulliu2002
07-24-2007, 12:17 AM
Hi,

I am using Weka 3.5.6 for a classification problem. The
training set has 38 samples with 7129 attributes (Weka
reports 7130 because it also counts the class attribute),
and I used 10-fold cross-validation to evaluate the
classification model. The cross-validated ROC area is above
0.9. But when I evaluated the model on an independent test
set of 34 samples with the same 7129 attributes, the ROC
area dropped to about 0.5. I have tried all of the
classification algorithms, and every one of them shows the
same problem: a low ROC area on the test set. It looks like
overfitting, since the model does well in cross-validation
but fails on the test set. One of my results is pasted
below. How can I resolve this issue?
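
For reference, here is a minimal sketch of how stratified 10-fold cross-validation partitions a training set like this one (27 samples of class A and 11 of class B, per the cross-validation confusion matrix below). It is a simple round-robin stratification written for illustration, not Weka's own implementation; the function name `stratified_folds` and the seed are hypothetical. Each instance is held out exactly once, and the cross-validation figures come from the pooled held-out predictions.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k folds with roughly equal class proportions.

    Simple illustration: shuffle each class, then deal its members round-robin
    across the folds (not Weka's own stratification code).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0  # continues across classes so fold sizes stay balanced
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for idx in members:
            folds[pos % k].append(idx)
            pos += 1
    return folds

# Class counts of the 38 training samples (27 A, 11 B).
labels = ["A"] * 27 + ["B"] * 11
folds = stratified_folds(labels, k=10)

for i, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in held_out]
    n_b = sum(labels[j] == "B" for j in test_idx)
    # A real run would train on train_idx and predict test_idx here,
    # then pool the held-out predictions from all 10 folds.
    print(f"fold {i}: train {len(train_idx)}, test {len(test_idx)} ({n_b} B)")
```

With 38 instances, each fold holds out 3 or 4 of them and trains on the remaining 34 or 35.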

Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.0010 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
Relation: cancer
Instances: 38
Attributes: 7130
[list of attributes omitted]
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

SMO

Kernel used:
Linear Kernel: K(x,y) = <x,y>

Classifier for classes: A, B

BinarySMO

Machine linear: showing attribute weights, not support vectors.

0.0054 * (normalized) X1
+ 0.0017 * (normalized) X2

- 0.4598
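
The linear machine above is a weighted sum of normalized attribute values plus a constant. A minimal sketch of how such a model would score one instance, assuming min-max normalization to [0, 1] with the training set's minimum and maximum for each attribute (the "(normalized)" in the output, selected by `-N 0`). Only the two weights shown, X1 and X2, are used, as a hypothetical truncated model; the attribute ranges and the instance values are made up, and which sign of the output maps to class A or B is not assumed here.

```python
def minmax_normalize(value, lo, hi):
    """Scale value to [0, 1] using the training min (lo) and max (hi).
    Test values outside the training range fall outside [0, 1]."""
    if hi == lo:
        return 0.0  # constant attribute in training data (assumed handling)
    return (value - lo) / (hi - lo)

def linear_svm_output(x, weights, constant, mins, maxs):
    """Weighted sum of normalized attribute values plus the constant term."""
    total = constant
    for name, w in weights.items():
        total += w * minmax_normalize(x[name], mins[name], maxs[name])
    return total

# Hypothetical truncated model: only the two weights shown in the output.
weights = {"X1": 0.0054, "X2": 0.0017}
constant = -0.4598
# Hypothetical training ranges and a hypothetical test instance.
mins = {"X1": 0.0, "X2": 0.0}
maxs = {"X1": 100.0, "X2": 200.0}
instance = {"X1": 50.0, "X2": 100.0}

score = linear_svm_output(instance, weights, constant, mins, maxs)
# The sign of score selects one of the two classes (mapping not shown here).
print(round(score, 5))  # -0.45625
```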

Number of kernel evaluations: 647 (94.086% cached)


=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances    36                94.7368 %
Incorrectly Classified Instances  2                 5.2632 %
Kappa statistic                   0.8648
K&B Relative Info Score           3271.0797 %
K&B Information Score             28.7188 bits      0.7558 bits/instance
Class complexity | order 0        33.192 bits       0.8735 bits/instance
Class complexity | scheme         2148 bits         56.5263 bits/instance
Complexity improvement (Sf)       -2114.808 bits    -55.6528 bits/instance
Mean absolute error               0.0526
Root mean squared error           0.2294
Relative absolute error           12.6005 %
Root relative squared error       50.395 %
Total Number of Instances         38
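
The error figures above can be checked by hand. SMO is run here without `-M` (no logistic models), so each prediction is a 0/1 distribution over the two classes. A minimal sketch, assuming Weka averages the absolute and squared errors over the class components of each prediction: a wrong prediction then contributes 1 and a correct one 0, so the mean absolute error equals the error rate and the RMSE is its square root. The function name is hypothetical.

```python
import math

def hard_prediction_errors(n_correct, n_wrong, num_classes=2):
    """MAE and RMSE for a classifier whose outputs are one-hot (0/1) class
    distributions, with errors averaged over the class components."""
    n = n_correct + n_wrong
    # A wrong one-hot prediction is off by 1 on two components (the predicted
    # class and the true class); a correct one is off by 0 on every component.
    err_per_wrong = 2 / num_classes  # same for absolute and squared error
    mae = n_wrong * err_per_wrong / n
    rmse = math.sqrt(n_wrong * err_per_wrong / n)
    return mae, rmse

cv = tuple(round(v, 4) for v in hard_prediction_errors(36, 2))
test = tuple(round(v, 4) for v in hard_prediction_errors(19, 15))
print(cv)    # (0.0526, 0.2294)
print(test)  # (0.4412, 0.6642)
```

The second call matches the mean absolute and root mean squared errors reported for the test set further down.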

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
1         0.182     0.931       1        0.964       0.909      A
0.818     0         1           0.818    0.9         0.909      B

=== Confusion Matrix ===

  a  b   <-- classified as
 27  0 |  a = A
  2  9 |  b = B
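
The per-class table and the kappa statistic follow directly from the confusion matrix. A minimal sketch that recomputes them, with `cm[i][j]` counting instances of actual class i classified as class j (function names are hypothetical). Precision and F-measure are returned as 0 when undefined, which matches the class B row of the test-set table.

```python
def per_class_metrics(cm, k):
    """TP rate, FP rate, precision, recall and F-measure for class index k.
    cm[i][j] = number of instances of actual class i classified as class j."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # class k predicted as another class
    fp = sum(row[k] for row in cm) - tp   # other classes predicted as class k
    tn = total - tp - fn - fp
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * tp_rate / (precision + tp_rate)
                 if precision + tp_rate else 0.0)
    return tp_rate, fp_rate, precision, tp_rate, f_measure  # recall == TP rate

def kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = sum(map(sum, cm))
    classes = range(len(cm))
    observed = sum(cm[i][i] for i in classes) / n
    chance = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in classes) / n ** 2
    return (observed - chance) / (1 - chance)

cv = [[27, 0], [2, 9]]      # cross-validation confusion matrix
test = [[19, 0], [15, 0]]   # test-set confusion matrix
for name, cm in (("CV", cv), ("test", test)):
    for k, label in enumerate("AB"):
        print(name, label, [round(v, 3) for v in per_class_metrics(cm, k)])
    print(name, "kappa", round(kappa(cm), 4))
```

On the test matrix every instance is predicted as A, so observed agreement equals chance agreement and kappa is exactly 0.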


=== Re-evaluation on test set ===

User supplied test set
Relation: cancer
Instances: unknown (yet). Reading incrementally
Attributes: 7130

=== Summary ===

Correctly Classified Instances    19                55.8824 %
Incorrectly Classified Instances  15                44.1176 %
Kappa statistic                   0
Mean absolute error               0.4412
Root mean squared error           0.6642
Total Number of Instances         34

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
1         1         0.559       1        0.717       0.5        A
0         0         0           0        0           0.5        B

=== Confusion Matrix ===

  a  b   <-- classified as
 19  0 |  a = A
 15  0 |  b = B
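
The ROC areas can be reproduced the same way. Because SMO without `-M` outputs only 0 or 1 as the probability of class A, the ROC area reduces to the chance that a random A instance scores higher than a random B instance, with ties counted as one half. A minimal sketch (the function name is hypothetical): on the test set every instance gets the same score, so every A/B pair is a tie and the area is exactly 0.5.

```python
def roc_area(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative,
    counting ties as one half (area under the ROC curve)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Score = predicted probability of class A (0 or 1 for SMO without -M).
cv_pos = [1.0] * 27               # all 27 A instances predicted A
cv_neg = [1.0] * 2 + [0.0] * 9    # 2 B instances predicted A, 9 predicted B
test_pos = [1.0] * 19             # all 19 A instances predicted A
test_neg = [1.0] * 15             # all 15 B instances predicted A

print(round(roc_area(cv_pos, cv_neg), 3))  # 0.909
print(roc_area(test_pos, test_neg))        # 0.5
```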

Thanks