huajie.lee
12-06-2007, 12:05 AM
Hi,
I am quite new to Weka, and I have been stuck on the concept of the confusion matrix for a couple of days. I would like to know the exact meaning of false positive and false negative in the context of a confusion matrix.
Let's say we are given this sample of weather.arff from Weka:
train.arff
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,95,TRUE,yes
test.arff
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Results from Weka:
J48 pruned tree
------------------
temperature <= 75: yes (8.0/2.0)
temperature > 75: no (3.0/1.0)
Number of Leaves : 2
Size of the tree : 3
Time taken to build model: 0.05 seconds
Time taken to test model on training data: 0 seconds
=== Error on training data ===
Correctly Classified Instances 8 72.7273 %
Incorrectly Classified Instances 3 27.2727 %
Kappa statistic 0.3774
Mean absolute error 0.3939
Root mean squared error 0.4438
Relative absolute error 84.0796 %
Root relative squared error 92.1724 %
Total Number of Instances 11
=== Confusion Matrix ===
a b <-- classified as
6 1 | a = yes
2 2 | b = no
=== Error on test data ===
Correctly Classified Instances 1 33.3333 %
Incorrectly Classified Instances 2 66.6667 %
Kappa statistic -0.5
Mean absolute error 0.5556
Root mean squared error 0.5971
Relative absolute error 120.3704 %
Root relative squared error 125.9128 %
Total Number of Instances 3
=== Confusion Matrix ===
a b <-- classified as
1 1 | a = yes
1 0 | b = no
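As a cross-check, here is a minimal Python sketch (not Weka's own code) that hard-codes the pruned tree above as a hypothetical `predict` function and applies it to the two @data sections. In each matrix, rows are the actual class (the last column of each data line) and columns are the class the tree predicts, matching Weka's `<-- classified as` header. The kappa shown is the standard Cohen's kappa computed from the matrix; all helper names are illustrative assumptions.

```python
# Minimal sketch (not Weka's code): apply the printed J48 tree to the
# ARFF @data rows and rebuild the leaf counts, confusion matrices,
# accuracy and kappa reported in the output.

CLASSES = ["yes", "no"]  # a = yes, b = no (order of @attribute play)

TRAIN = """sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,95,TRUE,yes"""

TEST = """overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no"""


def parse(data):
    """Turn @data lines into (temperature, actual_class) pairs."""
    rows = []
    for line in data.splitlines():
        _, temp, _, _, play = line.split(",")
        rows.append((float(temp), play))
    return rows


def predict(temperature):
    """Hypothetical helper for the pruned tree: <= 75 -> yes, else no."""
    return "yes" if temperature <= 75 else "no"


def leaf_counts(rows):
    """[instances reaching leaf, misclassified], as in '(8.0/2.0)'."""
    counts = {"<= 75": [0, 0], "> 75": [0, 0]}
    for temp, actual in rows:
        leaf = counts["<= 75" if temp <= 75 else "> 75"]
        leaf[0] += 1
        leaf[1] += int(predict(temp) != actual)
    return counts


def confusion_matrix(rows):
    """m[actual][predicted]: rows = actual, columns = classified as."""
    m = [[0, 0], [0, 0]]
    for temp, actual in rows:
        m[CLASSES.index(actual)][CLASSES.index(predict(temp))] += 1
    return m


def accuracy_and_kappa(m):
    n = sum(map(sum, m))
    observed = sum(m[i][i] for i in range(2)) / n
    # Agreement expected by chance, from the row and column totals.
    expected = sum(sum(m[i]) * (m[0][i] + m[1][i]) for i in range(2)) / n**2
    return 100 * observed, (observed - expected) / (1 - expected)


train, test = parse(TRAIN), parse(TEST)
print(leaf_counts(train))  # {'<= 75': [8, 2], '> 75': [3, 1]}
for name, rows in (("train", train), ("test", test)):
    m = confusion_matrix(rows)
    acc, kappa = accuracy_and_kappa(m)
    print(name, m, round(acc, 4), round(kappa, 4))
# train [[6, 1], [2, 2]] 72.7273 0.3774
# test [[1, 1], [1, 0]] 33.3333 -0.5
```

The "(8.0/2.0)" on a leaf then reads as "8 training instances reach this leaf, 2 of them are misclassified", which is consistent with the 2 actual-no instances in the bottom-left cell of the training matrix.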
What I am really confused about is this:
=== Confusion Matrix ===
a b <-- classified as
6 1 | a = yes
2 2 | b = no
especially the false positive and false negative part.
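One way to read this matrix, assuming yes (class a) is taken as the positive class: each row is an actual class and each column is a predicted class, so the four cells map to TP, FN, FP and TN. A small sketch with a hypothetical `cells` helper:

```python
# Minimal sketch: name the four cells of a 2x2 confusion matrix laid out
# like Weka's (rows = actual class, columns = predicted "classified as").
# Assumption: index 0 (yes) is the positive class unless told otherwise.

matrix = [
    [6, 1],  # actual yes: 6 predicted yes, 1 predicted no
    [2, 2],  # actual no:  2 predicted yes, 2 predicted no
]


def cells(m, positive=0):
    """Return TP, FN, FP, TN for the class at index `positive`."""
    negative = 1 - positive
    return {
        "TP": m[positive][positive],  # actual positive, predicted positive
        "FN": m[positive][negative],  # actual positive, predicted negative
        "FP": m[negative][positive],  # actual negative, predicted positive
        "TN": m[negative][negative],  # actual negative, predicted negative
    }


print(cells(matrix))              # {'TP': 6, 'FN': 1, 'FP': 2, 'TN': 2}
print(cells(matrix, positive=1))  # {'TP': 2, 'FN': 2, 'FP': 1, 'TN': 6}
```

Which cell counts as a false positive depends only on which class is called positive: with no as the positive class, FP and FN swap roles.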
I am confused about whether, in this case, the classes given by the tree
temperature <= 75: yes
temperature > 75: no
are the actual or the predicted classes. And is the data set the actual or the predicted one?
I especially don't get the interpretation of the false positives being 2 and the false negatives being 1.
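With yes as the positive class, the 2 counts actual-no instances that the tree predicted as yes, and the 1 counts the actual-yes instance it predicted as no. Rates per class follow from the same cells; Weka can also print a "Detailed Accuracy By Class" table with rates like these. A minimal sketch, treating each class in turn as positive (the `rates` helper is hypothetical):

```python
# Minimal sketch: per-class rates from the training confusion matrix.
# Each class is treated in turn as "positive", the other as "negative".
# Layout as in Weka: rows = actual class, columns = predicted class.

LABELS = ["yes", "no"]
MATRIX = [[6, 1], [2, 2]]


def rates(m, positive):
    negative = 1 - positive
    tp, fn = m[positive][positive], m[positive][negative]
    fp, tn = m[negative][positive], m[negative][negative]
    return {
        "TP rate": tp / (tp + fn),    # share of actual positives found (recall)
        "FP rate": fp / (fp + tn),    # share of actual negatives wrongly flagged
        "precision": tp / (tp + fp),  # share of positive predictions that are right
    }


for i, label in enumerate(LABELS):
    print(label, {k: round(v, 3) for k, v in rates(MATRIX, i).items()})
# yes {'TP rate': 0.857, 'FP rate': 0.5, 'precision': 0.75}
# no {'TP rate': 0.5, 'FP rate': 0.143, 'precision': 0.667}
```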
For example, take this instance from the training data:
overcast,83,86,FALSE,yes
This instance does not satisfy the rule, because temperature <= 75 leads to yes, but its temperature is 83.
Which part is the predicted data? Does it refer to temperature <= 75?
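To separate actual from predicted for this one instance, the sketch below (hypothetical `predict` and `trace` helpers) takes the class in the last field of the data line as the actual value and the leaf that the temperature falls into as the predicted value:

```python
# Minimal sketch: trace one training instance through the printed tree.
# The last field of the data line is the ACTUAL class; the tree's leaf
# gives the PREDICTED class. (actual, predicted) picks the matrix cell.

CLASSES = ["yes", "no"]  # a = yes, b = no


def predict(temperature):
    """Pruned J48 tree from the output: <= 75 -> yes, otherwise no."""
    return "yes" if temperature <= 75 else "no"


def trace(line):
    fields = line.split(",")
    temperature, actual = float(fields[1]), fields[-1]
    predicted = predict(temperature)
    row, col = CLASSES.index(actual), CLASSES.index(predicted)
    return actual, predicted, (row, col)


print(trace("overcast,83,86,FALSE,yes"))  # ('yes', 'no', (0, 1))
```

Row 0 (a = yes) and column 1 (b) is the top-right cell of the training matrix. This instance is the only one in that cell, so with yes as the positive class it is the 1 false negative.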
Correct me if I'm wrong. Can anyone explain this to me in more detail?
Thanks, I really appreciate it.
-Jason