the threshold curve



Sabrina
12-17-2008, 04:21 AM
I want to know whether it is possible to obtain in Weka a table in which each threshold value of a ROC curve is associated with each instance. I will try to explain better.
After generating the decision tree, I visualized the threshold curve and saved the result. The result was a new ARFF file containing the Instance_number, true positives, true negatives, recall, the threshold, etc. The total number of instances in this file was lower than the total number of instances in the original dataset. What happened? Is there another method that lets me associate each instance with its corresponding threshold value?

Regards
Sabrina

Mark
12-17-2008, 05:41 AM
Hi Sabrina,

Decision trees tend to produce quite discrete probability estimates. So what you end up with is a lot of ties when ranking instances by the probability assigned to the positive class. This is why there are a lot fewer instances in the threshold curve data than there are in your original data set. If you use a scheme that produces smoother probability estimates (e.g. logistic regression, bagged decision trees, naive Bayes etc.) you will find a lot more instances in their threshold curves.
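To make the tie effect concrete, here is a minimal sketch in plain Python (not Weka's API). The probability lists are made-up values for illustration, and counting one point per distinct probability is a simplifying assumption about how tied instances collapse in the curve data.

```python
# Hypothetical positive-class probabilities for 10 instances.
# A small decision tree assigns one probability per leaf, so values repeat.
tree_probs = [0.9, 0.9, 0.9, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1]

# A scheme with smoother estimates tends to give each instance its own value.
smooth_probs = [0.93, 0.88, 0.81, 0.67, 0.62, 0.58, 0.49, 0.17, 0.12, 0.05]


def distinct_thresholds(probs):
    """Return the distinct probability values, highest first.

    Assumption: instances tied on probability share a single curve point.
    """
    return sorted(set(probs), reverse=True)


tree_points = distinct_thresholds(tree_probs)
smooth_points = distinct_thresholds(smooth_probs)

print(len(tree_probs), "instances,", len(tree_points), "distinct values (tree)")
print(len(smooth_probs), "instances,", len(smooth_points), "distinct values (smooth)")
```

With these made-up numbers the tree-like scores collapse to three distinct values, while the smoother scores keep one value per instance.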

Cheers,
Mark.

Sabrina
12-17-2008, 10:15 AM
Hi Mark,
Thanks a lot.
I have a problem. I have some customers, and I want to decide, on the basis of the threshold curve results, which customers to contact. In particular, I want to contact the customers that have the highest threshold value. Therefore, for each customer_id I want to know the corresponding threshold value. I want a result similar to what we get after clustering, when the corresponding cluster is added for each customer_id.

Regards,
Sabrina

Mark
12-17-2008, 11:05 PM
Hi Sabrina,

There isn't a filter in Weka to do what you want. One possibility would be to output the predictions from your classifier, using either the -p option on the command line or by selecting "Output predictions" under "More options" in the Classifier panel of the Explorer. You would then need to sort the output according to the probability assigned to the positive class.
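The sorting step could look roughly like this in plain Python. This is only a sketch: it assumes the predictions have already been extracted into (customer_id, probability) pairs, and the IDs, probabilities, and the top_n cutoff are all hypothetical. Parsing Weka's actual prediction output is not shown.

```python
# Hypothetical (customer_id, positive-class probability) pairs, assumed to
# have been extracted from the classifier's prediction output beforehand.
predictions = [
    ("C001", 0.35),
    ("C002", 0.92),
    ("C003", 0.10),
    ("C004", 0.78),
    ("C005", 0.61),
]

# Rank customers from most to least likely to belong to the positive class.
ranked = sorted(predictions, key=lambda row: row[1], reverse=True)

# Hypothetical cutoff: contact only the top_n highest-ranked customers.
top_n = 2
to_contact = [customer_id for customer_id, _ in ranked[:top_n]]

for customer_id, prob in ranked:
    print(customer_id, prob)
print("Contact:", to_contact)
```

Each customer keeps its own probability in the ranked list, so customers tied on probability still appear as separate rows, unlike in the threshold curve data.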

Cheers,
Mark.