View Full Version : correlation coefficient



riverculture
02-11-2009, 01:51 AM
Hello All,

In WEKA, the correlation coefficient is between 0 and 1, but according to Wikipedia (http://en.wikipedia.org/wiki/Correlation_coefficient) it's between -1 and 1, while the coefficient of determination, R2 (http://en.wikipedia.org/wiki/R_squared), is between 0 and 1.

Can anyone confirm for me please, which one is right and which is wrong? I know I can compare the formulas, but I am not very confident at maths :D

Thank you very much

Wen

Mark
02-11-2009, 02:10 AM
Hi Wen,

Weka does report the correlation coefficient. Normally you do get a value that lies from approximately 0 to 1, but this is because it takes a "confused" classifier to get a significantly negative correlation coefficient :-) I.e. the model would have to systematically predict opposite to the true target in order to achieve a negative correlation coefficient. In the case of a nominal binary target, better results could be achieved by flipping the prediction!

Note, you can just take the square of the correlation coefficient to get the coefficient of determination.
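
To illustrate that note, here is a minimal Python sketch (not Weka code). The data are made up, and `pearson` is a helper written just for this example. It assumes an ordinary least-squares line with an intercept, evaluated on the same data it was fitted to; in that setting the squared correlation between predicted and actual values matches 1 - SS_res/SS_tot. On cross-validated predictions the two numbers can differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up, roughly linear data (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit an ordinary least-squares line with an intercept.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
predicted = [slope * x + intercept for x in xs]

# Correlation between predicted and actual target values.
r = pearson(predicted, ys)

# Coefficient of determination from the residuals.
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1.0 - ss_res / ss_tot

# The two printed values agree up to floating-point rounding.
print(r ** 2, r_squared)
```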

Cheers,
Mark.

riverculture
02-14-2009, 02:04 AM
Hi Mark:

Thank you for the reply, but I am still confused about what a 'confused' classifier is.

I made up an example in Excel, shown below, in which Value2 = -2*Value1 exactly, and I don't understand why the correlation coefficient is 1.

Thank you very much

value1    value2
1         -2
54        -108
21        -42
14        -28
0.21      -0.42
3         -6
4         -8
41        -82
0.04      -0.08
45        -90
327       -654
2         -4
23        -46

=== Run information ===
Scheme: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
Relation: negCoefficient
Instances: 13
Attributes: 2
value1
value2
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===

Linear Regression Model
value2 =
-2 * value1 +
0
Time taken to build model: 0.08 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 13

Mark
02-14-2009, 05:36 PM
Hi Wen,

Can you perhaps attach your entire arff file that produced this result? You've given me 7 of the 13 instances and linear regression produces a totally different model on just these 7.

Cheers,
Mark.

riverculture
02-15-2009, 06:20 PM
Hi Mark:

Here you go, it's an excel file.

cheers,

Wen

Mark
02-15-2009, 11:08 PM
Hi Wen,

Oh, since the Wiki removed all the spaces between the numbers in your first message, I got the first seven instances completely wrong anyway :-)

So the target really is -2 * value1. The correlation is computed between the *predicted* and actual target values. In the case of the linear model learned, these are identical (it is a perfect model for the data :-)), so the correlation coefficient is 1.0.
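
A small Python sketch of the difference (not Weka code). The `value1` numbers are read off Wen's post on the assumption that value2 = -2 * value1 throughout, `pearson` is a helper written for this example, and the predictions simply apply the reported model rather than reproducing Weka's cross-validation folds.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# The 13 instances, assuming value2 = -2 * value1 for every row.
value1 = [1, 54, 21, 14, 0.21, 3, 4, 41, 0.04, 45, 327, 2, 23]
value2 = [-2 * v for v in value1]

# Predictions from the reported model: value2 = -2 * value1 + 0.
predicted = [-2 * v + 0 for v in value1]

# Attribute vs. target: perfectly negatively correlated (close to -1).
attribute_vs_target = pearson(value1, value2)

# Predicted vs. actual target: perfectly positively correlated (close to 1).
predicted_vs_actual = pearson(predicted, value2)

print(attribute_vs_target, predicted_vs_actual)
```

The first number is what Wen expected to see; the second corresponds to what the summary output reports.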

Cheers,
Mark.

riverculture
02-16-2009, 02:37 AM
Hi Mark:

OIC! I thought it was the correlation coefficient between value1 and value2. So really, it is almost impossible to get a negative correlation coefficient?

Thank you a lot

Wen

Mark
02-16-2009, 03:36 AM
Yes, basically. A classifier that can't fit the concept at all will probably get essentially zero correlation (it might be slightly negative due to chance effects). What I meant earlier by the "confused" classifier would basically be a model that went out of its way to predict in the opposite direction from the actual class. This would give you a significantly negative correlation coefficient.
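
As a toy illustration in Python (not Weka code; the binary labels and the `pearson` helper are invented for this sketch): predictions unrelated to the target give a correlation of about zero, predictions that are systematically inverted give a strongly negative one, and flipping those inverted predictions recovers a positive one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented binary target, coded as 0/1.
actual = [0, 0, 1, 1, 0, 0, 1, 1]

# Predictions that carry no information about the target.
unrelated = [0, 1, 0, 1, 0, 1, 0, 1]

# A "confused" model: always predicts the opposite class.
confused = [1 - a for a in actual]

# Flipping the confused model's output turns it into a perfect one.
flipped = [1 - p for p in confused]

r_unrelated = pearson(unrelated, actual)  # about 0
r_confused = pearson(confused, actual)    # about -1
r_flipped = pearson(flipped, actual)      # about +1

print(r_unrelated, r_confused, r_flipped)
```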

Cheers,
Mark.

riverculture
02-16-2009, 06:53 PM
Thank you, Mark, I appreciate your reply.

Wen