Thread: How to analyze feature selection methods in Weka? Please help

  1. #1
    Join Date: Aug 2012
    Posts: 4

    Hi,

    I'm doing document classification. I'd like to analyze feature selection methods using either the Explorer or the Experimenter. To avoid overly optimistic results, I've manually divided my dataset into training and test sets. I used a meta classifier to apply StringToWordVector to both sets without any incompatibility issues. However, I don't know how to apply the feature selection (Ranker + InfoGain/ChiSquared) to the training set ONLY.
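    The principle behind "feature selection on the training set only" can be sketched without Weka: score every term against the class using the training documents alone, keep the top k, and then apply that same list of columns to the test documents. The sketch below is a from-scratch illustration under simplifying assumptions (binary term presence, two classes, a plain 2x2 chi-squared statistic); it is not Weka's ChiSquaredAttributeEval, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

class TrainOnlySelection {

    // Chi-squared statistic of the 2x2 table "term present/absent" vs "class pos/neg".
    static double chiSquared(boolean[][] present, boolean[] isPos, int term) {
        long a = 0, b = 0, c = 0, d = 0;
        for (int doc = 0; doc < present.length; doc++) {
            boolean t = present[doc][term];
            if (t && isPos[doc]) a++;       // term present, positive
            else if (t) b++;                // term present, negative
            else if (isPos[doc]) c++;       // term absent, positive
            else d++;                       // term absent, negative
        }
        double n = a + b + c + d;
        double denom = (double) (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0) return 0.0;         // degenerate table: no evidence either way
        double diff = (double) a * d - (double) b * c;
        return n * diff * diff / denom;
    }

    // Rank terms using TRAINING documents only and return the top-k term indices.
    static int[] selectTopK(boolean[][] trainPresent, boolean[] trainIsPos, int k) {
        int numTerms = trainPresent[0].length;
        double[] scores = new double[numTerms];
        for (int t = 0; t < numTerms; t++) scores[t] = chiSquared(trainPresent, trainIsPos, t);
        return IntStream.range(0, numTerms)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer t) -> scores[t]).reversed())
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    // Keep only the selected columns; used unchanged for training AND test documents.
    static boolean[][] project(boolean[][] docs, int[] selected) {
        boolean[][] out = new boolean[docs.length][selected.length];
        for (int i = 0; i < docs.length; i++)
            for (int j = 0; j < selected.length; j++)
                out[i][j] = docs[i][selected[j]];
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: 4 training documents x 3 terms.
        boolean[][] train = {
            {true, false, true},
            {true, false, false},
            {false, true, true},
            {false, true, false}
        };
        boolean[] trainIsPos = {true, true, false, false};
        boolean[][] test = {{true, true, false}};

        int[] selected = selectTopK(train, trainIsPos, 2);   // the test set is never looked at here
        System.out.println("Selected term indices: " + Arrays.toString(selected));
        System.out.println("Projected test doc: " + Arrays.toString(project(test, selected)[0]));
    }
}
```

    On the toy data, terms 0 and 1 each separate the two training classes perfectly and are kept, while term 2 (present once in each class) scores 0 and is dropped. The test document is only ever projected onto columns chosen from the training data, which is the property a FilteredClassifier gives you in Weka.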

    It would be great if anyone could tell me where or how to run multiple feature selection methods, each with a different number of selected attributes, so that I can produce a graph.
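    One way to get several feature selection methods and sizes onto one graph is to set up one FilteredClassifier configuration per (evaluator, number of selected attributes) pair, for example as separate entries in the Experimenter's algorithm list. A minimal sketch that only generates those scheme strings, mirroring the escaping of the "Scheme used" line in post #2. Assumptions: StringToWordVector is left at its defaults (copy your own options in if needed), the N values are arbitrary examples, and the class name SchemeSweep is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SchemeSweep {

    // Builds one FilteredClassifier scheme string with a MultiFilter of
    // StringToWordVector followed by AttributeSelection (evaluator + Ranker -N).
    // Quote nesting follows the scheme line shown in post #2: " then \" then \\\".
    static String scheme(String evaluator, int numToSelect) {
        return "weka.classifiers.meta.FilteredClassifier"
                + " -F \"weka.filters.MultiFilter"
                + " -F \\\"weka.filters.unsupervised.attribute.StringToWordVector\\\""
                + " -F \\\"weka.filters.supervised.attribute.AttributeSelection"
                + " -E \\\\\\\"weka.attributeSelection." + evaluator + "\\\\\\\""
                + " -S \\\\\\\"weka.attributeSelection.Ranker -N " + numToSelect + "\\\\\\\"\\\"\""
                + " -W weka.classifiers.bayes.NaiveBayesMultinomial";
    }

    // One configuration per (evaluator, N) pair.
    static List<String> sweep(List<String> evaluators, int[] sizes) {
        List<String> schemes = new ArrayList<>();
        for (String e : evaluators)
            for (int n : sizes)
                schemes.add(scheme(e, n));
        return schemes;
    }

    public static void main(String[] args) {
        List<String> evaluators = List.of("ChiSquaredAttributeEval", "InfoGainAttributeEval");
        int[] sizes = {50, 100, 200, 500};   // example values only; pick your own
        for (String s : sweep(evaluators, sizes)) System.out.println(s);
    }
}
```

    Each printed line corresponds to one run; the resulting accuracies, paired with their N values, are what you would plot.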

    Another thing: what is the difference between NaiveBayesMultinomial and NaiveBayesMultinomialUpdateable?

    Thanks in advance.
    Last edited by nma; 08-20-2012 at 07:51 AM. Reason: more Q?

  2. #2
    Join Date: Aug 2012
    Posts: 4

    I just tried the following:
    • A FilteredClassifier with NaiveBayesMultinomial as the base classifier and a MultiFilter of:
      • StringToWordVector
      • AttributeSelection (ChiSquaredAttributeEval + Ranker)


    I'd like to know whether these results are overly optimistic:

    Correctly Classified Instances      32      91.4286 %
    Incorrectly Classified Instances     3       8.5714 %
    Kappa statistic                      0.8235
    Mean absolute error                  0.0866
    Root mean squared error              0.266
    Relative absolute error             17.9613 %
    Root relative squared error         53.4543 %
    Total Number of Instances           35

    === Detailed Accuracy By Class ===

                     TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                     0.95      0.133     0.905       0.95     0.927       0.972      neg
                     0.867     0.05      0.929       0.867    0.897       0.972      pos
    Weighted Avg.    0.914     0.098     0.915       0.914    0.914       0.972


    === Confusion Matrix ===

      a  b   <-- classified as
     19  1 |  a = neg
      2 13 |  b = pos
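    As a sanity check, the summary figures above can be recomputed from the confusion matrix alone (rows are actual classes, columns are predicted classes). A minimal sketch using the standard definitions of accuracy, Cohen's kappa, precision, recall and F-measure; the class name ConfusionStats is hypothetical.

```java
class ConfusionStats {

    // m[actual][predicted], same layout as Weka's "classified as" matrix.
    static double total(int[][] m) {
        double n = 0;
        for (int[] row : m) for (int v : row) n += v;
        return n;
    }

    static double rowSum(int[][] m, int r) {   // number of instances actually in class r
        double s = 0;
        for (int v : m[r]) s += v;
        return s;
    }

    static double colSum(int[][] m, int c) {   // number of instances predicted as class c
        double s = 0;
        for (int[] row : m) s += row[c];
        return s;
    }

    static double accuracy(int[][] m) {
        double correct = 0;
        for (int i = 0; i < m.length; i++) correct += m[i][i];
        return correct / total(m);
    }

    // Cohen's kappa: observed agreement corrected for agreement expected by chance.
    static double kappa(int[][] m) {
        double n = total(m);
        double expected = 0;
        for (int i = 0; i < m.length; i++) expected += rowSum(m, i) * colSum(m, i) / (n * n);
        return (accuracy(m) - expected) / (1 - expected);
    }

    static double precision(int[][] m, int c) { return m[c][c] / colSum(m, c); }

    static double recall(int[][] m, int c) { return m[c][c] / rowSum(m, c); }

    static double fMeasure(int[][] m, int c) {
        double p = precision(m, c), r = recall(m, c);
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int[][] m = {{19, 1}, {2, 13}};   // confusion matrix from post #2
        String[] names = {"neg", "pos"};
        System.out.printf("Accuracy: %.4f %%%n", 100 * accuracy(m));
        System.out.printf("Kappa:    %.4f%n", kappa(m));
        for (int c = 0; c < m.length; c++)
            System.out.printf("%s: precision %.3f, recall %.3f, F %.3f%n",
                    names[c], precision(m, c), recall(m, c), fMeasure(m, c));
    }
}
```

    Run as-is, it reports 91.4286 % accuracy, kappa 0.8235, and precision/recall/F-measure of 0.905/0.950/0.927 for neg and 0.929/0.867/0.897 for pos, matching the Weka output.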


    The scheme used is:
    weka.classifiers.meta.FilteredClassifier -F "weka.filters.MultiFilter -F \"weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000000 -prune-rate -1.0 -C -T -I -N 1 -L -S -stemmer weka.core.stemmers.LovinsStemmer -M 1 -tokenizer \\\"weka.core.tokenizers.WordTokenizer -delimiters \\\\\\\" \\\\\\\\r \\\\\\\\t.,;:\\\\\\\\\\\\\\\'\\\\\\\\\\\\\\\"()?!$*-&[]+/\\\\\\\\\\\\\\\\|\\\\\\\"\\\"\" -F \"weka.filters.supervised.attribute.AttributeSelection -E \\\"weka.attributeSelection.ChiSquaredAttributeEval \\\" -S \\\"weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N 100\\\"\"" -W weka.classifiers.bayes.NaiveBayesMultinomial

  3. #3
    Join Date: Aug 2006
    Posts: 1,089

    Hi,

    Your filtered classifier setup is exactly the right way to do things in order to avoid cheating and overly optimistic results. How does the training error compare to the test error? It is usually difficult to overfit with naive Bayes models, so you should be OK. You could also try running this setup with cross-validation on all your data.
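    Cross-validation keeps this setup honest because, in every fold, the whole FilteredClassifier (filters included) is rebuilt on the training folds only, so the attribute ranking never sees the held-out fold; the Explorer does this for you when you choose Cross-validation. The sketch below only illustrates the fold bookkeeping, using simple round-robin stratification without the randomization Weka applies (a simplification). The class name StratifiedFolds and the toy labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StratifiedFolds {

    // Deal each class's instances round-robin into k folds, so every fold
    // keeps roughly the class proportions of the full dataset.
    static int[] assignFolds(String[] labels, int k) {
        int[] fold = new int[labels.length];
        Map<String, Integer> nextFold = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            int f = nextFold.getOrDefault(labels[i], 0);
            fold[i] = f;
            nextFold.put(labels[i], (f + 1) % k);
        }
        return fold;
    }

    public static void main(String[] args) {
        // Hypothetical labels: 20 neg followed by 15 pos.
        String[] labels = new String[35];
        for (int i = 0; i < labels.length; i++) labels[i] = i < 20 ? "neg" : "pos";

        int k = 5;
        int[] fold = assignFolds(labels, k);
        for (int f = 0; f < k; f++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> test = new ArrayList<>();
            for (int i = 0; i < labels.length; i++) (fold[i] == f ? test : train).add(i);
            // Here the whole pipeline (StringToWordVector + AttributeSelection
            // + NaiveBayesMultinomial) would be built on `train` only, then evaluated on `test`.
            System.out.printf("fold %d: %d train, %d test%n", f, train.size(), test.size());
        }
    }
}
```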

    The only difference between NaiveBayesMultinomial and NaiveBayesMultinomialUpdateable is that the latter implements UpdateableClassifier and exposes the updateClassifier() method (for incremental learning). In fact NaiveBayesMultinomialUpdateable is a subclass of NaiveBayesMultinomial.
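    The incremental variant is natural here because multinomial naive Bayes is driven by per-class counts: training on a batch amounts to adding up counts one document at a time, so folding in a new document later is cheap. The stdlib sketch below illustrates only that idea, with hypothetical names (CountModel, update, train); it is not Weka's implementation and omits smoothing and prediction.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountModel {
    // Sufficient statistics for multinomial naive Bayes:
    // documents per class and word occurrences per class.
    final Map<String, Integer> docsPerClass = new HashMap<>();
    final Map<String, Map<String, Integer>> wordsPerClass = new HashMap<>();

    // Incremental step, similar in spirit to updateClassifier(): fold in one document.
    void update(String label, List<String> words) {
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordsPerClass.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : words) counts.merge(w, 1, Integer::sum);
    }

    // Batch training is the same update applied to every document in turn.
    static CountModel train(List<String> labels, List<List<String>> docs) {
        CountModel model = new CountModel();
        for (int i = 0; i < docs.size(); i++) model.update(labels.get(i), docs.get(i));
        return model;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("pos", "neg");
        List<List<String>> docs = List.of(List.of("good", "good", "plot"), List.of("bad", "plot"));

        CountModel batch = train(labels, docs);
        CountModel incremental = new CountModel();
        incremental.update("pos", List.of("good", "good", "plot"));
        incremental.update("neg", List.of("bad", "plot"));

        System.out.println(batch.wordsPerClass.equals(incremental.wordsPerClass)); // true
    }
}
```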

    Cheers,
    Mark.

  4. #4
    Join Date: Aug 2012
    Posts: 4

    Thank you, Mark.

    Is there a way to plot or visualize the output of a Weka experiment, which is available in any of the following formats:

    • CSV
    • GNUPlot
    • Plain text
    • HTML
    • LaTeX
    • Significance only


    I have installed GNUPlot on Windows, but whenever I feed it the experiment output in GNUPlot format, it displays the error "Invalid commands".
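    One workaround is to bypass the Experimenter's GNUPlot output and write a plain data file plus a small gnuplot script yourself. A minimal sketch, assuming you have collected one percent-correct figure per number of selected attributes; only the N = 100 point from post #2 is filled in, and the class name GnuplotExport and the file names are hypothetical. The script uses only basic gnuplot commands (set xlabel, set ylabel, plot ... using 1:2 with linespoints).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

class GnuplotExport {

    // Two whitespace-separated columns: number of selected attributes, percent correct.
    // Lines starting with '#' are treated as comments by gnuplot.
    static String dataFile(Map<Integer, Double> percentCorrectByN) {
        StringBuilder sb = new StringBuilder("# numAttributes percentCorrect\n");
        for (Map.Entry<Integer, Double> e : percentCorrectByN.entrySet())
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        return sb.toString();
    }

    // A minimal gnuplot script that draws column 2 against column 1.
    static String script(String dataFileName, String title) {
        return "set xlabel \"Number of selected attributes\"\n"
                + "set ylabel \"Percent correct\"\n"
                + "plot \"" + dataFileName + "\" using 1:2 with linespoints title \"" + title + "\"\n";
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Double> results = new LinkedHashMap<>();
        results.put(100, 91.4286);   // from post #2 (Ranker -N 100)
        // results.put(N, percentCorrect);   // add one entry per run with a different -N
        Files.writeString(Path.of("chi2.dat"), dataFile(results));
        Files.writeString(Path.of("chi2.gp"), script("chi2.dat", "ChiSquared + Ranker"));
    }
}
```

    With gnuplot on your PATH, `gnuplot -persist chi2.gp` should draw the curve; from the interactive gnuplot prompt, `load "chi2.gp"` does the same.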
