Logistic regression



boemans
11-04-2007, 09:08 AM
Hello all,

I have a dataset with one class attribute that takes two values: true and false.
Now, I would like to perform a logistic regression on the data.

That part works, but I can't manage to turn the output generated by Weka into a single function.
It just says:

Coefficients...
Variable Coeff.
1 -0.3986
2 0.1538
3 -0.8386
4 -0.1759
5 0.7428
...

Can someone explain how to interpret this?
How can I tell which attribute is variable 1, and how do I build a function from this output?

Mark
11-04-2007, 05:11 PM
Hi,

The coefficients are output in the same order in which the attributes are declared in the data. Things only get tricky when there are nominal attributes with more than two possible values: Logistic binarizes these, so an attribute with three distinct values is converted into three derived attributes (one per value).
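To make that ordering rule concrete, here is a minimal sketch that attaches names to the numbered coefficients. The attribute list, the `derived_names`/`label_coefficients` helpers and the placeholder coefficients are all hypothetical; it assumes numeric and two-valued nominal attributes keep one column each, while a nominal attribute with more than two values expands to one column per value, in declaration order. The class attribute itself gets no coefficient and is left out of the list.

```python
# Hypothetical non-class attributes, in the order they are declared in the data.
# (name, None) marks a numeric attribute; (name, [values]) marks a nominal one.
ATTRIBUTES = [
    ("age", None),
    ("smoker", ["yes", "no"]),
    ("region", ["north", "south", "east"]),
    ("income", None),
]


def derived_names(attributes):
    """Names of the columns that receive coefficients, in output order.

    Assumption: a nominal attribute with more than two values becomes one 0/1
    column per value; numeric and two-valued nominal attributes stay one column.
    """
    names = []
    for name, values in attributes:
        if values is not None and len(values) > 2:
            names.extend(f"{name}={value}" for value in values)
        else:
            names.append(name)
    return names


def label_coefficients(attributes, coefficients):
    """Pair each numbered coefficient (1, 2, 3, ...) with its (derived) attribute."""
    names = derived_names(attributes)
    if len(names) != len(coefficients):
        raise ValueError(f"expected {len(names)} coefficients, got {len(coefficients)}")
    return dict(zip(names, coefficients))


if __name__ == "__main__":
    # Placeholder coefficients, not taken from any real model.
    coeffs = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6]
    for name, c in label_coefficients(ATTRIBUTES, coeffs).items():
        print(f"{name:>14}  {c:+.4f}")
```

If the numbers of names and coefficients disagree, the attribute list probably does not match the data (for example, the class attribute was included by mistake).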

Here is the "more info" on Logistic, which you can access from the Explorer:

NAME
weka.classifiers.functions.Logistic

SYNOPSIS
Class for building and using a multinomial logistic regression model with a ridge estimator.

There are some modifications, however, compared to the paper of le Cessie and van Houwelingen (1992):

If there are k classes for n instances with m attributes, the parameter matrix B to be calculated will be an m*(k-1) matrix.

The probability for class j with the exception of the last class is

Pj(Xi) = exp(Xi*Bj) / ((sum[l=1..(k-1)] exp(Xi*Bl)) + 1)
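Read as code, the formula above is a softmax in which the last class contributes a fixed score of exp(0) = 1 to the denominator. A minimal sketch in plain Python, assuming B is stored as a list of the k-1 coefficient columns; the function names are illustrative, not Weka's.

```python
import math


def dot(x, b):
    """Xi*Bj: inner product of one instance with one coefficient column."""
    return sum(xv * bv for xv, bv in zip(x, b))


def prob_class(x, B, j):
    """Pj(x) for a class j other than the last one (0-based, so 0 <= j < k-1).

    B holds the k-1 coefficient columns B1..B(k-1), each as long as x.
    """
    if not 0 <= j < len(B):
        raise IndexError("the last class has no coefficient column; use 1 - sum of the others")
    # The "+ 1" is the last class's term: exp(0) with no coefficients.
    denom = sum(math.exp(dot(x, b_l)) for b_l in B) + 1.0
    return math.exp(dot(x, B[j])) / denom
```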

The last class has probability

1-(sum[j=1..(k-1)]Pj(Xi))
= 1/((sum[j=1..(k-1)]exp(Xi*Bj))+1)
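For a two-class problem like the true/false data in the question, k = 2, so there is a single coefficient column b and the two formulas above reduce to the familiar logistic function: P(first class | x) = exp(x*b) / (exp(x*b) + 1) = 1 / (1 + exp(-x*b)). That is the "one function" built from the listed coefficients, where "first class" is whichever class value is declared first. A minimal sketch; the `intercept` parameter is an assumption (the formulas above show none), to be used only if your output lists an intercept term.

```python
import math


def prob_first_class(x, b, intercept=0.0):
    """P(first class | x) when there are only two classes (k = 2).

    With one coefficient column b, exp(x*b) / (exp(x*b) + 1) equals the logistic
    function 1 / (1 + exp(-x*b)). `intercept` is an assumption: use it only if
    your output lists an intercept term.
    """
    z = intercept + sum(xv * bv for xv, bv in zip(x, b))
    # Two algebraically equal forms, chosen so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_last_class(x, b, intercept=0.0):
    """P(last class | x) = 1 / (exp(x*b) + 1) = 1 - P(first class | x)."""
    return 1.0 - prob_first_class(x, b, intercept)
```

Here x must list the attribute values in the same (binarized) order as the coefficients.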

The (negative) multinomial log-likelihood is thus:

L = -sum[i=1..n]{
        sum[j=1..(k-1)](Yij * ln(Pj(Xi)))
        + (1 - (sum[j=1..(k-1)]Yij)) * ln(1 - sum[j=1..(k-1)]Pj(Xi))
    } + ridge * (B^2)
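The objective can be transcribed almost term by term. A minimal sketch in plain Python, assuming Y is given as 0/1 indicator rows of length k-1 (an all-zero row meaning the last class) and reading B^2 as the sum of the squared entries of B; whether Weka excludes any term (such as an intercept) from the penalty is not stated above, so treat that as an assumption.

```python
import math


def neg_log_likelihood(X, Y, B, ridge):
    """L = -sum_i { sum_j Yij ln Pj(Xi) + (1 - sum_j Yij) ln(1 - sum_j Pj(Xi)) } + ridge * B^2.

    X: n instances, each a list of m numbers.
    Y: n indicator rows of length k-1; Y[i][j] = 1 if instance i is in class j,
       and an all-zero row means the last class.
    B: k-1 coefficient columns, each of length m.
    Assumption: B^2 is read as the sum of the squared entries of B.
    """
    total = 0.0
    for x, y in zip(X, Y):
        scores = [math.exp(sum(xv * bv for xv, bv in zip(x, b_j))) for b_j in B]
        denom = sum(scores) + 1.0
        total += sum(y_j * math.log(s / denom) for y_j, s in zip(y, scores))
        # 1 - sum_j Pj(x) equals 1 / denom, so its log is -log(denom).
        total += (1 - sum(y)) * -math.log(denom)
    penalty = ridge * sum(b * b for b_j in B for b in b_j)
    return -total + penalty
```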

To find the matrix B that minimises L, a quasi-Newton method is used to search for the optimal values of the m*(k-1) variables. Note that before running the optimization procedure, we 'squeeze' the matrix B into an m*(k-1) vector. For details of the optimization procedure, see the weka.core.Optimization class.
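The "squeeze" step exists because a generic quasi-Newton routine works on a flat vector of variables, not on a matrix. A minimal sketch of the flatten/unflatten pair in plain Python; the column-by-column order is an assumption, and Weka's internal layout may differ.

```python
def squeeze(B):
    """Flatten k-1 coefficient columns (each of length m) into one vector of m*(k-1) values."""
    return [b for column in B for b in column]


def unsqueeze(v, m):
    """Inverse of squeeze: cut a flat vector back into columns of length m."""
    if m <= 0 or len(v) % m:
        raise ValueError("m must be positive and divide the vector length")
    return [list(v[i:i + m]) for i in range(0, len(v), m)]
```

An optimizer then only sees functions of the flat vector, e.g. an objective v -> L(unsqueeze(v, m)) for some function L computing the log-likelihood from B. Outside Weka, one could hand such an objective to a quasi-Newton routine like SciPy's `scipy.optimize.minimize(objective, v0, method="BFGS")`.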

Although the original logistic regression does not deal with instance weights, we modify the algorithm slightly to handle them.
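The text does not say exactly how the weights enter. A common approach, and the one assumed in this sketch, is to multiply each instance's log-likelihood term by its weight, so that an integer weight behaves like repeating the instance. The sketch covers only the two-class case to stay short; the function names are hypothetical.

```python
import math


def log_sigmoid(z):
    """ln(1 / (1 + exp(-z))), computed without overflow."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def weighted_nll_binary(X, y, w, b, ridge):
    """Weighted negative log-likelihood for two classes (k = 2).

    y[i] is 1 for the first class and 0 for the last class; w[i] is the instance
    weight. Assumption: each instance's log-likelihood term is multiplied by its
    weight, so a weight of 2 counts the same as listing the instance twice.
    """
    total = 0.0
    for x_i, y_i, w_i in zip(X, y, w):
        z = sum(xv * bv for xv, bv in zip(x_i, b))
        # ln P(first class) = log_sigmoid(z); ln P(last class) = log_sigmoid(-z)
        total += w_i * (y_i * log_sigmoid(z) + (1 - y_i) * log_sigmoid(-z))
    return -total + ridge * sum(bv * bv for bv in b)
```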

For more information see:

le Cessie, S., van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics. 41(1):191-201.

Note: Missing values are replaced using a ReplaceMissingValuesFilter, and nominal attributes are transformed into numeric attributes using a NominalToBinaryFilter.
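As far as I know, ReplaceMissingValues fills a missing numeric value with the attribute's mean and a missing nominal value with its mode, both computed from the training data. The sketch below imitates that behavior in plain Python; it is not Weka's code, and the tie-breaking rule for the mode is an assumption.

```python
from collections import Counter


def replace_missing(rows, is_nominal):
    """Fill missing values (None) with the column mean (numeric) or mode (nominal).

    rows: list of instances, each a list of values.
    is_nominal: one boolean per column.
    Assumption: ties for the mode go to the value seen first; Weka may break ties differently.
    """
    fills = []
    for c, nominal in enumerate(is_nominal):
        present = [r[c] for r in rows if r[c] is not None]
        if not present:
            fills.append(None)  # nothing to estimate from; leave the gap
        elif nominal:
            fills.append(Counter(present).most_common(1)[0][0])
        else:
            fills.append(sum(present) / len(present))
    # Build new rows so the input is left unchanged.
    return [[fills[c] if v is None else v for c, v in enumerate(r)] for r in rows]
```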



Cheers,
Mark.