Hitachi Vantara Pentaho Community Forums

Thread: Guidance needed for starting out WEKA/datamining

  1. #1
    Join Date
    Jan 2009
    Posts
    21

    Guidance needed for starting out WEKA/datamining

    Hi:

    I have played around with WEKA and started reading the book Data Mining by Ian H. Witten and Eibe Frank.

    My goal is to build some simple, useful predictive models for my marketing analysis work as soon as possible (i.e. to become a good user of WEKA, or good at 'applied' data mining). I have an IT programming background with limited statistics knowledge. Although the book is very well structured and easy to read, I found the maths/statistics parts a bit challenging. They took me a huge amount of time to understand compared with the other parts of the book.

    I am wondering if I can skip the maths/stats parts. As long as I understand the inputs and outputs (i.e. I know which learning method to use for a particular dataset to achieve a particular goal, what the outputs mean, and how to read their evaluation), can I become a good WEKA user and apply data mining efficiently in my work? (I am not trying to become a data mining researcher or expert at this stage.)

    Metaphorically speaking, I don't need to know the mechanics of a car to become a good driver. Is the same true for WEKA?

    Thank you very much!!

    Wen

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Hi Wen,

    To a certain extent you can "drive the car without knowing how the engine works in detail" :-) Especially, as you said, if you are comfortable with inputs and interpretation of outputs.

    It is helpful to have at least a rudimentary idea of how the algorithms work and of their strengths and weaknesses. This can help you decide which methods to apply to data set x with characteristics y.

    Cheers,
    Mark.
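
    As an illustration of what "interpretation of outputs" involves: WEKA's classifier evaluation reports a confusion matrix together with measures such as precision and recall. The sketch below is not WEKA code. It is a minimal standalone example, with a hypothetical class name and made-up counts, showing how those measures follow from the four cells of a two-class confusion matrix.

```java
// Hypothetical helper, not part of WEKA: derives common evaluation
// measures from the four cells of a two-class confusion matrix.
class BinaryMetrics {

    // Share of all predictions that were correct.
    static double accuracy(int tp, int fp, int fn, int tn) {
        int total = tp + fp + fn + tn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    // Of the instances predicted positive, the share that really are positive.
    static double precision(int tp, int fp) {
        int predictedPositive = tp + fp;
        return predictedPositive == 0 ? 0.0 : (double) tp / predictedPositive;
    }

    // Of the instances that really are positive, the share that were found.
    static double recall(int tp, int fn) {
        int actualPositive = tp + fn;
        return actualPositive == 0 ? 0.0 : (double) tp / actualPositive;
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only, e.g. "responded to a campaign".
        int tp = 40, fp = 10, fn = 20, tn = 30;
        System.out.println("accuracy  = " + accuracy(tp, fp, fn, tn));
        System.out.println("precision = " + precision(tp, fp));
        System.out.println("recall    = " + recall(tp, fn));
    }
}
```

    In a marketing setting the two measures answer different questions: precision is about how many of the customers you targeted were worth targeting, while recall is about how many worthwhile customers you missed.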

  3. #3
    Join Date
    Jan 2009
    Posts
    21

    Predictive Modelling for Marketing

    Thank you Mark, you are so helpful.

    I think I now need to focus on methods frequently used for marketing analysis. As I found from reading the book, there are a lot of algorithms; some are used specifically for medical research, some for agriculture, and so on.

    Could you or anyone else give me a list of the classifiers, associators, clusterers and/or attribute evaluators that are frequently used for marketing analysis, so that I can focus on those first?

    I know it might be a long answer, but please reply even if you only know one or two of them.

    Thank you very much again

    Wen

  4. #4
    Join Date
    Aug 2006
    Posts
    1,741

    Hi Wen,

    There is no hard and fast answer to this question. Algorithms that are commonly applied in quite a few application domains include logistic regression, decision trees, rule learners such as RIPPER (JRip in Weka), and Apriori (for market basket analysis). If you get reasonable results with one or more of these types of algorithms, it is common to experiment with improving upon the results by using ensemble learning (usually boosting or bagging type algorithms).

    Cheers,
    Mark.
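
    To make the market basket idea more concrete, here is a minimal sketch of the two quantities that association rule learners such as Apriori work with: the support of an itemset and the confidence of a rule. This is not WEKA's implementation and it does not perform Apriori's candidate search. The class name, method names, and basket data are all hypothetical, chosen only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of WEKA: computes support and confidence
// over a small list of shopping baskets.
class BasketStats {

    // support(X): fraction of baskets that contain every item in X.
    static double support(List<Set<String>> baskets, Set<String> items) {
        if (baskets.isEmpty()) {
            return 0.0;
        }
        long hits = baskets.stream().filter(b -> b.containsAll(items)).count();
        return (double) hits / baskets.size();
    }

    // confidence(A => B) = support(A and B together) / support(A).
    static double confidence(List<Set<String>> baskets,
                             Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        double lhsSupport = support(baskets, lhs);
        return lhsSupport == 0.0 ? 0.0 : support(baskets, both) / lhsSupport;
    }

    public static void main(String[] args) {
        // Made-up transactions for illustration only.
        List<Set<String>> baskets = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"),
                Set.of("milk"),
                Set.of("bread", "milk", "eggs"));

        System.out.println("support({bread, milk})   = "
                + support(baskets, Set.of("bread", "milk")));
        System.out.println("confidence(bread => milk) = "
                + confidence(baskets, Set.of("bread"), Set.of("milk")));
    }
}
```

    A rule is usually considered interesting only when both numbers are high enough: support says the pattern is common, and confidence says the right-hand side reliably accompanies the left-hand side.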

  5. #5
    Join Date
    Jan 2009
    Posts
    21

    Great, thanks a lot. I have actually read through some of those algorithms briefly, so your answer gave me confirmation and a broad picture of how to approach them.
