
Guidance needed for starting out WEKA/datamining



riverculture
01-08-2009, 01:41 AM
Hi:

I have played around with WEKA and started reading the book Data Mining by Ian H. Witten and Eibe Frank.

My goal is to build some simple, useful predictive models for my marketing analysis work as soon as possible (i.e. become a good user of WEKA, or be good at 'applied' data mining). I have an IT programming background with limited statistics knowledge. Although the book is very well structured and easy to read, I found the maths/statistics parts a bit challenging. They took me a huge amount of time to understand compared to other parts of the book.

I am wondering if I can skip the maths/stats part. Is it OK if I only understand the inputs and outputs, i.e. know which learning method to use for a particular dataset to achieve a particular goal, know what the outputs mean, and know how to read the evaluation of the outputs? Would that be enough to become a good WEKA user and apply data mining efficiently in my work? (I am not trying to become a data mining researcher or expert at this stage.)

Metaphorically speaking, I don't need to know the mechanics of a car to become a good driver. Is the same true for WEKA?

Thank you very much!!

Wen

Mark
01-08-2009, 03:50 AM
Hi Wen,

To a certain extent you can "drive the car without knowing how the engine works in detail" :-) Especially, as you said, if you are comfortable with inputs and interpretation of outputs.

It is helpful to have at least a rudimentary idea of how the algorithms work and their strengths and weaknesses. This can help you decide which methods to apply to data set x with characteristics y.

Cheers,
Mark.

riverculture
01-08-2009, 08:49 PM
Thank you Mark, you are so helpful.

I think I now need to focus on the methods frequently used for marketing analysis. From reading the book I found there are a lot of algorithms; some are aimed at medical research, some at agriculture, and so on.

Could you or anyone give me a list of the names of the classifiers, associators, clusterers and/or attribute evaluators that are frequently used for marketing analysis, so that I can focus on these first?

I know it might be a long answer, but please reply even if you only know one or two of them.

Thank you very much again

Wen

Mark
01-12-2009, 04:22 PM
Hi Wen,

There is no hard and fast answer to this question. Algorithms that are commonly applied in quite a few application domains include logistic regression, decision trees, rule learners such as RIPPER (JRip in Weka), and Apriori (for market basket analysis). If you get reasonable results with one or more of these types of algorithm, it is common to try to improve on them with ensemble learning (usually boosting- or bagging-type algorithms).
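To make the market basket idea a bit more concrete, here is a minimal Python sketch of Apriori-style frequent itemset mining followed by rule extraction. It is not Weka's implementation (in Weka you would simply pick the Apriori associator); the function names `frequent_itemsets` and `association_rules`, the toy baskets, and the thresholds are all made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (fraction of baskets containing it) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every single item seen in any basket.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that are frequent at this level.
        level = {}
        for c in candidates:
            support = sum(1 for b in baskets if c <= b) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Build (k+1)-item candidates by joining frequent k-itemsets,
        # pruning any whose k-item subsets are not all frequent.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples for rules lhs -> rhs."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is always present in the dictionary.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical toy data: each set is one customer's basket.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

if __name__ == "__main__":
    found = frequent_itemsets(BASKETS, min_support=0.6)
    for lhs, rhs, support, confidence in association_rules(found, 0.7):
        print(sorted(lhs), "->", sorted(rhs),
              f"support={support:.2f} confidence={confidence:.2f}")
```

The two thresholds correspond to the inputs you would tune in practice: minimum support controls how common an item combination must be, and minimum confidence controls how reliable a rule must be before it is reported.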

Cheers,
Mark.

riverculture
01-13-2009, 12:06 AM
Great, thanks a lot. I have actually read through some of those algorithms briefly, so your answer confirms what I had gathered and gives me a broad picture of how to approach them.