Hitachi Vantara Pentaho Community Forums

Thread: correlation coefficient

  1. #1
    Join Date
    Jan 2009
    Posts
    21

    correlation coefficient

    Hello All,

    In WEKA, the correlation coefficient is between 0 and 1, but according to Wikipedia (http://en.wikipedia.org/wiki/Correlation_coefficient) it lies between -1 and 1, while the coefficient of determination, R² (http://en.wikipedia.org/wiki/R_squared), lies between 0 and 1.

    Can anyone please confirm which one is right and which is wrong? I know I could compare the formulas, but I am not very confident at maths.

    Thank you very much

    Wen

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Hi Wen,

    Weka does report the correlation coefficient. Normally you get a value between approximately 0 and 1, but that is because it takes a "confused" classifier to get a significantly negative correlation coefficient :-) That is, the model would have to systematically predict the opposite of the true target in order to achieve a negative correlation coefficient. In the case of a nominal binary target, better results could be achieved by flipping the prediction!

    Note that, for a least-squares linear model, you can just square the correlation coefficient to get the coefficient of determination.
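
    As a minimal sketch of that squaring relationship (this is not Weka's own code): the snippet below fits a least-squares line to a small made-up data set, then compares the squared correlation between predicted and actual values with R² computed as 1 - SS_res/SS_tot. The data and the helper names `pearson`, `fit_line`, and `r_squared` are illustrative assumptions. The equality holds for a least-squares fit with an intercept, evaluated on its own training data; for cross-validated predictions the two numbers can differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical, made-up data (not from this thread).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predicted = [slope * x + intercept for x in xs]

r = pearson(predicted, ys)      # correlation between predicted and actual
r2 = r_squared(ys, predicted)   # coefficient of determination

print("r         =", r)
print("r squared =", r * r)
print("R^2       =", r2)
```

    On this toy data the last two printed values should agree up to floating-point rounding.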

    Cheers,
    Mark.

  3. #3
    Join Date
    Jan 2009
    Posts
    21

    Hi Mark:

    Thank you for the reply, but I am still confused about what a "confused" classifier is.

    I made up the following example in Excel as a test, and I don't understand why the correlation coefficient is 1 when value2 = -2 * value1 exactly.

    Thank you very much

    value1    value2
    1         -2
    54        -108
    21        -42
    14        -28
    0.21      -0.42
    3         -6
    4         -8
    41        -82
    0.04      -0.08
    45        -90
    327       -654
    2         -4
    23        -46

    === Run information ===

    Scheme:       weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
    Relation:     negCoefficient
    Instances:    13
    Attributes:   2
                  value1
                  value2
    Test mode:    10-fold cross-validation

    === Classifier model (full training set) ===

    Linear Regression Model

    value2 =

         -2 * value1 +
          0

    Time taken to build model: 0.08 seconds

    === Cross-validation ===
    === Summary ===

    Correlation coefficient                  1
    Mean absolute error                      0
    Root mean squared error                  0
    Relative absolute error                  0 %
    Root relative squared error              0 %
    Total Number of Instances               13

  4. #4
    Join Date
    Aug 2006
    Posts
    1,741

    Hi Wen,

    Can you perhaps attach your entire arff file that produced this result? You've given me 7 of the 13 instances and linear regression produces a totally different model on just these 7.

    Cheers,
    Mark.

  5. #5
    Join Date
    Jan 2009
    Posts
    21

    Hi Mark:

    Here you go, it's an Excel file.

    Cheers,

    Wen

  6. #6
    Join Date
    Aug 2006
    Posts
    1,741

    Hi Wen,

    Oh, since the Wiki removed all the spaces between the numbers in your first message, I got the first seven instances completely wrong anyway :-)

    So the target really is -2 * value1. The correlation is computed between the *predicted* and actual target values. In the case of the linear model learned, these are identical (it is a perfect model for the data :-)), so the correlation coefficient is 1.0.
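
    A rough sketch of that distinction, not a reproduction of Weka's evaluation: it uses the 13 value1 entries as read from the example in this thread, assumes value2 = -2 * value1 as stated, and plugs in the reported model (value2 = -2 * value1 + 0) directly instead of running 10-fold cross-validation. The helper `pearson` is an illustrative assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# The 13 instances from the thread's example, with value2 = -2 * value1.
value1 = [1, 54, 21, 14, 0.21, 3, 4, 41, 0.04, 45, 327, 2, 23]
value2 = [-2 * v for v in value1]

# Predictions from the model reported in the thread: value2 = -2 * value1 + 0.
predicted = [-2 * v + 0 for v in value1]

# Correlation between the input attribute and the target: perfectly negative.
r_attributes = pearson(value1, value2)

# Correlation between predicted and actual target: perfectly positive,
# because the predictions coincide with the actual values.
r_model = pearson(predicted, value2)

print("value1 vs value2:     ", r_attributes)
print("predicted vs actual:  ", r_model)
```

    The first figure is the one Wen expected to see; the second corresponds to what the evaluation summary reports.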

    Cheers,
    Mark.

  7. #7
    Join Date
    Jan 2009
    Posts
    21

    Hi Mark:

    Oh, I see! I thought it was the correlation coefficient between value1 and value2. So really, it is almost impossible to get a negative correlation coefficient?

    Thanks a lot

    Wen

  8. #8
    Join Date
    Aug 2006
    Posts
    1,741

    Originally posted by riverculture:
    > Hi Mark:
    >
    > Oh, I see! I thought it was the correlation coefficient between value1 and value2. So really, it is almost impossible to get a negative correlation coefficient?
    >
    > Wen

    Yes, basically. A classifier that can't fit the concept at all will probably get essentially zero correlation (it might be slightly negative due to chance effects). What I meant earlier by the "confused" classifier was a model that goes out of its way to predict in the opposite direction from the actual class. That would give you a significantly negative correlation coefficient.
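
    To make the "confused" classifier concrete, here is a small sketch on a made-up binary target encoded as 0/1. The labels, the predictions, and the helper names `pearson` and `accuracy` are all hypothetical. The confused predictions are wrong on most instances, so their correlation with the actual class is negative, and flipping every prediction turns both the correlation and the accuracy around.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def accuracy(actual, predicted):
    """Fraction of instances where the prediction matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical binary target and a "confused" model that is wrong on 8 of 10.
actual   = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
confused = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]

# Flipping each binary prediction (0 <-> 1) inverts the model's output.
flipped = [1 - p for p in confused]

r_confused = pearson(actual, confused)
r_flipped = pearson(actual, flipped)

print("confused: r =", r_confused, "accuracy =", accuracy(actual, confused))
print("flipped:  r =", r_flipped, "accuracy =", accuracy(actual, flipped))
```

    Because flipping is a linear transformation with a negative slope, the flipped correlation has the same magnitude as the confused one and the opposite sign.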

    Cheers,
    Mark.

  9. #9
    Join Date
    Jan 2009
    Posts
    21

    Thank you, Mark. I appreciate your reply.

    Wen


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.