Query on interpreting confusion matrix



huajie.lee
12-06-2007, 01:05 AM
Hi,

I am quite new to Weka, and I have been puzzling over the concept of the confusion matrix for a couple of days. I would like to know the exact meaning of false positive and false negative in the context of a confusion matrix.

Say we are given this sample based on weather.arff from Weka:

train.arff
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,95,TRUE,yes

test.arff

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

Results from Weka:

J48 pruned tree
------------------

temperature <= 75: yes (8.0/2.0)
temperature > 75: no (3.0/1.0)

Number of Leaves : 2

Size of the tree : 3


Time taken to build model: 0.05 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances 8 72.7273 %
Incorrectly Classified Instances 3 27.2727 %
Kappa statistic 0.3774
Mean absolute error 0.3939
Root mean squared error 0.4438
Relative absolute error 84.0796 %
Root relative squared error 92.1724 %
Total Number of Instances 11


=== Confusion Matrix ===

a b <-- classified as
6 1 | a = yes
2 2 | b = no


=== Error on test data ===

Correctly Classified Instances 1 33.3333 %
Incorrectly Classified Instances 2 66.6667 %
Kappa statistic -0.5
Mean absolute error 0.5556
Root mean squared error 0.5971
Relative absolute error 120.3704 %
Root relative squared error 125.9128 %
Total Number of Instances 3


=== Confusion Matrix ===

a b <-- classified as
1 1 | a = yes
1 0 | b = no


What I got confused by is this part:

=== Confusion Matrix ===

a b <-- classified as
6 1 | a = yes
2 2 | b = no

especially the false positive and false negative parts.

I am confused about whether, in this case, the classes in

temperature <= 75: yes
temperature > 75: no

are the actual or the predicted values, and whether the data set holds the actual or the predicted values. In particular, I don't understand why the false positives come out to 2 and the false negatives to 1.

For example, take this instance from the data:

overcast,83,86,FALSE,yes

This instance does not satisfy the rule, since temperature <= 75 concludes yes but its temperature is 83. So what counts as the predicted data here? Does it refer to temperature <= 75?

Correct me if I'm wrong. Can anyone explain this to me in more detail?

Thanks, I appreciate it a lot.


-Jason

Mark
12-06-2007, 06:16 PM
Hi there,

I think the majority of your question has already been answered on the Weka mailing list. With regard to your question about the rules: the labels in the data set are of course the actual class values, while the labels in the consequent of the rules are the predicted class labels given the antecedent. In the confusion matrix, the rows correspond to the actual class and the columns to the predicted class.
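
If it helps to see where the counts come from programmatically, here is a minimal sketch of the same evaluation done through Weka's Java API (a sketch only: it assumes a Weka 3.x jar on the classpath and the train.arff/test.arff files from your post, and uses Evaluation's per-class counters numFalsePositives/numFalseNegatives):

import java.io.BufferedReader;
import java.io.FileReader;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class ConfusionMatrixDemo {

    public static void main(String[] args) throws Exception {
        // Load the training and test sets posted above; the class
        // attribute (play) is the last attribute in both files.
        Instances train = new Instances(new BufferedReader(new FileReader("train.arff")));
        train.setClassIndex(train.numAttributes() - 1);
        Instances test = new Instances(new BufferedReader(new FileReader("test.arff")));
        test.setClassIndex(test.numAttributes() - 1);

        // Build the J48 tree on the training data.
        J48 tree = new J48();
        tree.buildClassifier(train);

        // Evaluate on the training data itself, matching the
        // "Error on training data" section of the posted output.
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, train);
        System.out.println(eval.toMatrixString("=== Confusion Matrix (training) ==="));

        // Treat "yes" (class index 0) as the positive class. The rows
        // of the matrix are the actual classes, the columns the
        // predicted classes.
        System.out.println("False positives (actual no, predicted yes): "
                + eval.numFalsePositives(0));
        System.out.println("False negatives (actual yes, predicted no): "
                + eval.numFalseNegatives(0));

        // The same on the test set ("Error on test data").
        Evaluation testEval = new Evaluation(train);
        testEval.evaluateModel(tree, test);
        System.out.println(testEval.toMatrixString("=== Confusion Matrix (test) ==="));
    }
}

With yes (class index 0) treated as the positive class, the training matrix in your post reads: 6 true positives, 1 false negative (the one yes instance classified as no), 2 false positives (the two no instances classified as yes), and 2 true negatives. That is exactly where the 2 false positives and 1 false negative you mention come from.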

Cheers,
Mark.