Hitachi Vantara Pentaho Community Forums

Thread: WEKA Experimenter: cross-validation results inconsistent

  1. #1
    Join Date
    Jan 2017
    Posts
    1

    WEKA Experimenter: cross-validation results inconsistent

    I've performed various feature selection techniques to obtain several feature subset candidates. I am now using the WEKA Experimenter to perform 10 x 10-fold CV to rank the feature subsets (using the same classifier).

    I have noticed that MAE and RAE do not always agree, i.e., the "best" feature subset according to MAE is X whereas the "best" feature subset according to RAE is Y. I would expect the two metrics to always produce the same ranking, since RAE is just the MAE normalized by a measure of spread (the "variance") of the training data. Does WEKA not use consistent train/test splits when comparing two datasets?

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741


    The relative metrics are normalized by the same base metric, computed by using the mean (or mode, for a nominal class) of the class values in the training data as the predicted value. I'm not sure exactly what you are doing, but if the datasets being compared differ only in the features they contain (i.e. they are the same dataset, with the same instances in the same starting order, etc.), then the results will be directly comparable, because the cross-validation splits created will be identical for both.
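    For concreteness, this is the standard definition of the relative absolute error, which is my understanding of what WEKA computes (here a_i are the actual class values, p_i the predictions, and \bar{a} the mean of the class values in the training data):

    \[ \mathrm{RAE} = \frac{\sum_{i=1}^{n} \lvert p_i - a_i \rvert}{\sum_{i=1}^{n} \lvert \bar{a} - a_i \rvert} \times 100\% \]

    The numerator is just the unnormalized sum of absolute errors, so MAE and RAE can only rank two runs differently if the denominators differ, i.e. if the training-fold means differ between the runs. With identical splits on the same class values, that cannot happen.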

    I would normally suggest wrapping the feature selection process in a FilteredClassifier, so that the test data is never seen by the feature selection step and the error estimates are not overly optimistic (see the sketch below); however, if your goal is simply ranking the subsets, then performing global feature selection before cross-validation should be OK.
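    For reference, here is a minimal sketch of that wrapping, assuming a numeric class; CfsSubsetEval, BestFirst, and LinearRegression are placeholders for whatever evaluator, search method, and base learner you are actually using:

        import weka.attributeSelection.BestFirst;
        import weka.attributeSelection.CfsSubsetEval;
        import weka.classifiers.functions.LinearRegression;
        import weka.classifiers.meta.FilteredClassifier;
        import weka.filters.supervised.attribute.AttributeSelection;

        public class WrappedSelection {
            public static FilteredClassifier build() {
                // Attribute selection expressed as a filter: an evaluator
                // plus a search strategy over feature subsets.
                AttributeSelection selection = new AttributeSelection();
                selection.setEvaluator(new CfsSubsetEval());
                selection.setSearch(new BestFirst());

                // Wrapping the filter in a FilteredClassifier means the
                // selection is re-run on each training fold only, so the
                // test fold never influences which attributes are kept.
                FilteredClassifier wrapped = new FilteredClassifier();
                wrapped.setFilter(selection);
                wrapped.setClassifier(new LinearRegression());
                return wrapped;
            }
        }

    In the Experimenter you would add the FilteredClassifier as the scheme and configure the filter and the base classifier in its options, just like any other classifier.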

    Cheers,
    Mark.
