
about normalization



berko3000
12-05-2008, 05:53 PM
hello all,

I have a training dataset with 8 attributes and 1 binary class (0/1). When I run a Naive Bayes classifier on this set, I get an accuracy of around 76%. I want to improve its performance but don't know how. I heard that normalization might help. So, which attributes should I select for normalization, and based on what criterion? If anyone can tell me these details, I'll be glad.

Thanks

Mark
12-05-2008, 07:53 PM
Hi,

If your attributes are numeric, then discretizing them almost always results in better performance for naive Bayes (this has been shown in the literature). You can use either unsupervised (e.g. equal width or equal frequency) or supervised (Fayyad and Irani's MDL-based method) discretization - Weka has both.

Cheers,
Mark.
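
To make the two unsupervised options mentioned above concrete, here is a minimal plain-Python sketch of equal-width and equal-frequency binning. It is not Weka's implementation; the function names and the sample values are made up for illustration, and the supervised MDL-based method is not shown.

```python
from bisect import bisect_right


def equal_width_edges(values, bins):
    """Interior cut points that split [min, max] into `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def equal_frequency_edges(values, bins):
    """Interior cut points chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def discretize(values, edges):
    """Map each value to its bin index; a value equal to a cut point goes to the upper bin."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical numeric attribute (illustrative values, not from the thread's dataset).
attribute = [61.0, 64.0, 70.0, 72.0, 85.0, 90.0, 118.0, 120.0]

width_bins = discretize(attribute, equal_width_edges(attribute, 3))
freq_bins = discretize(attribute, equal_frequency_edges(attribute, 3))

print(width_bins)  # [0, 0, 0, 0, 1, 1, 2, 2]
print(freq_bins)   # [0, 0, 1, 1, 1, 2, 2, 2]
```

With skewed data, equal-width bins can end up nearly empty at one end, while equal-frequency bins keep the counts balanced. The resulting bin indices would then be treated as nominal values by naive Bayes.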

berko3000
12-06-2008, 03:41 AM
thanks for the info.
I want to remove redundant attributes manually. Is there a way to do it (other than using Weka's preprocessing filters)? I mean by looking at each attribute's distribution bar graph and trying to figure out which ones are needed. I see that some of them have a very low mean, around 0.42, while others are in the 60-120 range. How can we use this information to select attributes? Any other ideas?

Regards

Mark
12-07-2008, 10:26 PM
This filter might be what you're looking for:

http://wiki.pentaho.com/display/DATAMINING/SubsetByExpression

Cheers,
Mark.
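
For the manual inspection asked about earlier in the thread, one common heuristic (not something proposed by either poster) is to look at pairwise correlation between attributes rather than at their means: a different scale by itself says little about whether an attribute is needed. The sketch below assumes plain Python lists per attribute; the attribute names, values, and the 0.95 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def redundant_pairs(columns, threshold=0.95):
    """List attribute pairs whose absolute correlation reaches the threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical attributes: a2 is a rescaled copy of a1, a3 is unrelated.
columns = {
    "a1": [0.40, 0.42, 0.44, 0.41, 0.43],
    "a2": [60.0, 80.0, 100.0, 70.0, 90.0],
    "a3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

pairs = redundant_pairs(columns)
for a, b, r in pairs:
    print(f"{a} and {b} look redundant (r = {r:.3f})")
```

Here a1 and a2 are flagged even though their means differ by two orders of magnitude, because one is a linear rescaling of the other. A high correlation only suggests that one attribute of the pair may be dropped; whether accuracy improves still has to be checked by retraining.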