Finding Similarities in Data using Weka

03-03-2008, 04:17 PM
I am looking for commonalities in a numeric dataset and would like to find association rules. I got the book Data Mining: Practical Machine Learning Tools and Techniques, which another post on this forum suggested as an introduction to Weka, but association rules seem to be used primarily for non-numeric datasets, and my data is almost exclusively numeric. I've also noticed that a number of examples in the first chapter of the book predict an outcome from certain attributes, but my dataset consists only of members of one group; that is, the outcome would be the same for every item. How should I best proceed to analyse this data? I can't imagine that no one has used association rules on numeric data before. What topics should I read more about, and can I do this with Weka?

Many thanks!

03-04-2008, 10:28 PM

One simple approach to finding association rules in numeric data is to discretize the data first using unsupervised discretization, which does not need a class attribute. Try weka.filters.unsupervised.attribute.Discretize from either the Explorer or the command line. This filter can perform equal-width and equal-frequency discretization of numeric attributes.
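To make the difference between the two binning strategies concrete, here is a minimal sketch in plain Python. It is not Weka's code and is not guaranteed to reproduce the filter's exact cut points; the function names and the sample values are made up for illustration, and the choice to put a value equal to a cut point in the lower bin is an assumption.

```python
from bisect import bisect_left


def equal_width_cuts(values, bins):
    """Cut points that split the range [min, max] into `bins` equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def equal_frequency_cuts(values, bins):
    """Cut points chosen so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, bins):
        k = i * n // bins
        # Place the cut midway between two neighbouring sorted values.
        cut = (ordered[k - 1] + ordered[k]) / 2
        # Skip repeated cut points, which can occur when many values are tied.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def discretize(values, cuts):
    """Map each value to the index of its interval (a value equal to a cut goes low)."""
    return [bisect_left(cuts, v) for v in values]


# Hypothetical numeric attribute with one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

width_bins = discretize(values, equal_width_cuts(values, 3))
freq_bins = discretize(values, equal_frequency_cuts(values, 3))

print(width_bins)
print(freq_bins)
```

With an outlier in the data, equal-width binning pushes almost every value into the first bin, while equal-frequency binning spreads the values evenly. Either way, the result is a nominal attribute that an association rule learner can treat like any other category.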

Discretization is discussed on pages 296-305 of the Witten and Frank book (2nd ed.).