View Full Version : Advice needed on evaluation

02-11-2009, 01:00 AM

Could I please have some advice on the evaluation of a model?

I have read through the formulas for the following error measures and am trying to understand how to interpret them:
Mean absolute error
Root mean squared error
Relative absolute error
Root relative squared error

Is there a rule of thumb for what number indicates that the model is useful? Is there a boundary or something like that? Say, for an arbitrary instance, something like: if mean absolute error < 0.5 the model is good, otherwise bad? I don't really know how to interpret these two errors:
Mean absolute error
Root mean squared error

I know that the error measures with the word 'relative' compare the differences between the actual and predicted values against the differences between the actual values and the mean of the actual values. So if they are >= 1, the model's predictive ability is no better than just predicting the mean. Am I right? If my understanding is correct, then a value < 1 means the model predicts better than the mean, but how much better is good enough? Can I use the model if the relative errors are 0.8? How about 0.7, and so on?

Another question, if I may: the error measures make sense to me when the class is numeric, but why are they used for nominal classes? (i.e., what is the difference between yes and no??) The following is an example from the weather data bundled with WEKA. Or did I apply them to the wrong learning scheme?

I know it's a long question. Thank you so much!


=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: weather
Instances: 14
Attributes: 5
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
outlook = sunny
| humidity <= 75: yes (2.0)
| humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
| windy = TRUE: no (2.0)
| windy = FALSE: yes (3.0)
Number of Leaves : 5
Size of the tree : 8

Time taken to build model: 0.05 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 9 64.2857 %
Incorrectly Classified Instances 5 35.7143 %
Kappa statistic 0.186
Mean absolute error 0.2857
Root mean squared error 0.4818
Relative absolute error 60 %
Root relative squared error 97.6586 %
Total Number of Instances 14
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure Class
0.778 0.6 0.7 0.778 0.737 yes
0.4 0.222 0.5 0.4 0.444 no
=== Confusion Matrix ===
a b <-- classified as
7 2 | a = yes
3 2 | b = no

02-11-2009, 01:52 AM
Hi Wen,

Most of the time it really depends on the application as to how good is "good enough", so unfortunately there is no universal threshold applicable to all domains. You are correct about the relative error measures - they are useful for determining whether your model is doing better than just predicting the mean (which is not directly obvious from the RMSE, MAE, etc.).
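To make the relative measures concrete, here is a minimal sketch in plain Python (not Weka source code; the toy numbers are made up) of how the relative absolute error and root relative squared error compare a model's errors against those of simply predicting the mean:

```python
# Toy illustration of the relative error measures (not Weka's implementation).
# RAE  = sum|pred - actual| / sum|mean(actual) - actual|
# RRSE = sqrt( sum(pred - actual)^2 / sum(mean(actual) - actual)^2 )
# Values below 1 mean the model beats always predicting the mean.

def relative_errors(actual, predicted):
    mean_a = sum(actual) / len(actual)
    abs_model = sum(abs(p - a) for p, a in zip(predicted, actual))
    abs_mean = sum(abs(mean_a - a) for a in actual)
    sq_model = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    sq_mean = sum((mean_a - a) ** 2 for a in actual)
    rae = abs_model / abs_mean
    rrse = (sq_model / sq_mean) ** 0.5
    return rae, rrse

actual = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up targets
predicted = [1.2, 1.9, 3.3, 3.8, 5.1]  # made-up predictions
rae, rrse = relative_errors(actual, predicted)
print(rae, rrse)  # RAE = 0.15, RRSE ~ 0.138: much better than the mean predictor
```

A model that always predicted the mean (3.0 here) would score exactly 1.0 on both measures, which is why >= 1 signals a model with no predictive value.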

The RMSE, MAE, etc. can be computed for, and do make sense for, nominal classes. In this case these measures are computed from the probability distributions output by the classifier and the vector of probabilities that represents the actual class. E.g., say you have an instance from a domain with a binary (yes, no) target, and its class is yes. The vector representing the true probabilities for this instance is [1, 0]. If your classifier outputs a probability distribution of [0.8, 0.2], then you would add (1.0 - 0.8)^2 + (0 - 0.2)^2 to the sum for computing the mean squared error. Note, this is discussed in the Witten and Frank book (I don't have a copy on hand at the moment to give you a page number).
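The yes/no example above can be written out directly. This is a plain-Python sketch of the per-instance squared-error contribution described there (the exact normalisation applied afterwards, e.g. whether the sum is also divided by the number of classes, is not shown here):

```python
# Squared-error contribution of one instance with a nominal class:
# compare the classifier's predicted probability distribution against
# the 0/1 vector encoding the true class.

def squared_error_contribution(true_dist, predicted_dist):
    return sum((t - p) ** 2 for t, p in zip(true_dist, predicted_dist))

# True class is "yes" in a (yes, no) domain -> true vector [1, 0].
true_dist = [1.0, 0.0]
predicted_dist = [0.8, 0.2]  # classifier's output distribution
contribution = squared_error_contribution(true_dist, predicted_dist)
print(contribution)  # (1.0 - 0.8)^2 + (0 - 0.2)^2, i.e. about 0.08
```

A perfectly confident, correct prediction ([1, 0]) contributes 0, while a confidently wrong one ([0, 1]) contributes 2, so the measure rewards well-calibrated probabilities rather than just correct labels.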

Hope this helps.


02-14-2009, 02:27 AM
Hi Mark:

Thank you so much for the explanation! It is all clear to me now.

If I may, I have a question regarding the evaluation of the success rate in the following excerpt. My question is whether we need to calculate the range of the success rate for a given confidence level ourselves, or whether WEKA can do this. If WEKA does the calculation, where can I find it? (i.e., which button, tab... do I need to click on?)

thank you very much again.


****************excerpt from Chapter 5 of
Data Mining by Ian Witten and Eibe Frank***************
The answer to this question is usually expressed as a confidence interval; that is, p lies within a certain specified interval with a certain specified confidence. For example, if S = 750 successes are observed out of N = 1000 trials, this indicates that the true success rate must be around 75%. But how close to 75%? It turns out that with 80% confidence, the true success rate p lies between 73.2% and 76.7%. If S = 75 successes are observed out of N = 100 trials, this also indicates that the true success rate must be around 75%. But the experiment is smaller, and the 80% confidence interval for p is wider, stretching from 69.1% to 80.1%.

02-14-2009, 05:42 PM
Hi Wen,

No, Weka does not have an option to compute this confidence interval (you would have to make the computation manually). Repeated cross-validation gives you a good idea of how much variance there is in the accuracy of a classifier. The Experimenter can perform repeated cross-validation for comparing multiple learning schemes. Furthermore, it implements the paired corrected t-test for detecting significant differences in accuracy between learning schemes.
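For what it's worth, the manual computation is short. The sketch below uses the Wilson score interval (the formula the confidence-interval discussion in Witten and Frank is based on), with z = 1.28 as the approximate two-sided 80% normal quantile; it reproduces the 73.2%-76.7% and 69.1%-80.1% figures from the excerpt:

```python
import math

def wilson_interval(successes, trials, z=1.28):
    """Wilson score confidence interval for a success rate.

    z = 1.28 is (approximately) the two-sided 80% normal quantile;
    use e.g. z = 1.96 for a 95% interval.
    """
    f = successes / trials          # observed success rate
    n = trials
    centre = f + z * z / (2 * n)
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom

lo, hi = wilson_interval(750, 1000)
print(lo, hi)    # roughly 0.732 and 0.767, matching the excerpt

lo2, hi2 = wilson_interval(75, 100)
print(lo2, hi2)  # roughly 0.691 and 0.801: smaller sample, wider interval
```

Note how the interval for N = 100 is several times wider than for N = 1000 at the same observed rate, which is exactly the point the excerpt is making.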


02-15-2009, 06:05 PM
Great, thank you Mark. I haven't touched the Experimenter yet, but I will check it out soon. I really appreciate your quick reply.