Help in Clustering

06-05-2009, 10:47 AM
Hi everyone,

How can we use clustering to develop a predictive model? I ran the k-means algorithm on a dataset with the number of clusters set to 2, and I got cluster1 and cluster2. But the output reports "sum of squared errors: 1037.0". Does this have any effect on the output? Is my clustering wrong, or is it performing poorly?

Cluster1 is shown in red and cluster2 in blue in the "visualize errors clusters" view. In that view, some groups contain a mix of cluster1 and cluster2 points. How should I interpret that?

How would you interpret the clustering algorithm with respect to the aim of developing a predictive model?

Please help me out. It's getting interesting to work in WEKA.

Please, I would like some quick replies. HELP!!

Thanking you

06-09-2009, 07:27 PM
Hi Naveen,

A clustering model can be used to make predictions on new data. The prediction it makes for a new instance is the cluster that the instance most likely belongs to.
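To make that concrete, here is a minimal Python sketch (not WEKA code) of how a k-means model can assign a new instance to a cluster: it picks the nearest centroid by Euclidean distance. The centroid values, the new instance, and the function name `nearest_cluster` are made up for illustration.

```python
import math

def nearest_cluster(instance, centroids):
    """Return the index of the centroid closest to `instance` (Euclidean distance)."""
    distances = [math.dist(instance, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids, standing in for what a 2-cluster k-means run might produce
centroids = [(1.0, 1.0), (8.0, 9.0)]

# A new, previously unseen instance
new_instance = (7.5, 8.0)

predicted = nearest_cluster(new_instance, centroids)
print("Predicted cluster index:", predicted)
```

The "prediction" here is just a cluster index; what that cluster means still has to be worked out from the cluster descriptions.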

Evaluating and comparing the results of clustering is more difficult than it is for classification. With K-means, the sum of squared errors tells you how "tight" the clustering results are with respect to your data. That is, the closer the instances in a cluster are to its centroid, the lower the sum of squared errors will be. This allows you to compare runs of K-means using different numbers of requested clusters.
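As a rough illustration of the sum of squared errors, the Python sketch below runs a bare-bones k-means on a tiny made-up dataset for 1, 2 and 3 clusters and prints the resulting value for each. The data, the starting centroids, and the helper names are all assumptions for the example; WEKA's own implementation may differ in details such as normalization and initialization, so its numbers will not match.

```python
import math

def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, initial_centroids, iterations=20):
    """Very small k-means: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        assignments = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assign(points, centroids)

def sum_squared_errors(points, centroids, assignments):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))

# Made-up 2-D data with two visible groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

# Hand-picked starting centroids for each requested number of clusters
starts = {1: [(1, 1)], 2: [(1, 1), (8, 8)], 3: [(1, 1), (8, 8), (9, 8)]}

sse_by_k = {}
for k, initial in starts.items():
    centroids, assignments = kmeans(points, initial)
    sse_by_k[k] = sum_squared_errors(points, centroids, assignments)
    print(f"k={k}: sum of squared errors = {sse_by_k[k]:.3f}")
```

Note that the value usually keeps dropping as you request more clusters, so a single number on its own does not say whether a clustering is good or bad. It is the size of the drop between runs that is informative.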

The output of clustering (i.e. the description of the clusters) can be hard to interpret. For descriptive data mining, some people learn a decision tree or set of rules from the results of clustering. That is, they use the cluster assignments as the class label and then build a classification model. In the case of trees and rules, this can give you an idea of which attributes are most important in determining the clustering.
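The idea of learning rules from cluster assignments can be sketched in Python as follows. Instead of a full decision tree learner, this uses a one-level tree (a "decision stump") that searches for the single attribute and threshold that best reproduce the cluster labels. The rows, the attribute meanings, and the cluster labels are invented for the example, and `best_single_split` is a hypothetical helper, not part of any library.

```python
def best_single_split(rows, labels):
    """Find the rule `rows[i][attr] <= threshold` that best separates the labels.

    Returns (errors, attr, threshold), where `errors` counts the instances that
    are not in the majority label of their side of the split.
    """
    best = None
    for attr in range(len(rows[0])):
        values = sorted({r[attr] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[attr] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[attr] > threshold]
            errors = sum(len(side) - max(side.count(c) for c in set(side))
                         for side in (left, right))
            if best is None or errors < best[0]:
                best = (errors, attr, threshold)
    return best

# Made-up instances: attribute 0 could be an age, attribute 1 some score
rows = [(25, 1.0), (47, 1.5), (33, 0.5), (51, 5.0), (29, 4.5), (40, 5.5)]

# Cluster assignments from a (hypothetical) clustering run, used as class labels
cluster_labels = [0, 0, 0, 1, 1, 1]

errors, attr, threshold = best_single_split(rows, cluster_labels)
print(f"Best rule: attribute {attr} <= {threshold} ({errors} misassigned)")
```

In this toy data the rule found uses only attribute 1, which suggests that attribute 0 played little part in forming the clusters. A real tree or rule learner applied to the cluster labels gives the same kind of insight across many attributes.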

Hope this helps.