Hitachi Vantara Pentaho Community Forums

Thread: How to handle skewed data?

  1. #1

    How to handle skewed data?

    I have a binary classification problem with 9,000 records in total: 1,200 belong to class 'Yes' and the remaining 7,800 to class 'No'. I want to do feature selection. Do I need to use SMOTE to balance the classes, and if so, should I apply SMOTE before or after feature selection? Could someone please explain how to handle this type of data?

  2. #2

    There are a couple of options you could use for this. The FilteredSubsetEval and FilteredAttributeEval methods allow an instance filter (such as resampling, SMOTE, etc.) to be wrapped into the attribute selection process. Alternatively, the CostSensitiveSubsetEval and CostSensitiveAttributeEval methods allow a cost matrix to be specified, and the training data is then reweighted or resampled according to the costs. Both approaches are in Weka 3.6, and are available for Weka 3.7 via the built-in package management system.
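
    To give a feel for the cost-based reweighting idea, here is a minimal sketch in plain Java. It does not call Weka and is not how Weka implements it; the class name CostWeightSketch and the method balancingWeights are hypothetical, made up for illustration. It assumes the simplest scheme: the majority class keeps weight 1 and every other class gets weight (majority count / its own count), so that each class carries the same total weight. For the 1,200 'Yes' / 7,800 'No' split in the question, that gives each 'Yes' record a weight of 6.5.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CostWeightSketch {

    // Hypothetical helper (not part of Weka): returns a per-instance weight for
    // each class so that every class ends up with the same total weight.
    // The largest class gets weight 1.0; a smaller class gets max / its count.
    public static Map<String, Double> balancingWeights(Map<String, Integer> classCounts) {
        int max = 0;
        for (int n : classCounts.values()) {
            if (n <= 0) {
                throw new IllegalArgumentException("class counts must be positive");
            }
            max = Math.max(max, n);
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            weights.put(e.getKey(), (double) max / e.getValue());
        }
        return weights;
    }

    public static void main(String[] args) {
        // Class counts taken from the question: 9000 records, 1200 of them 'Yes'.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("Yes", 1200);
        counts.put("No", 9000 - 1200);

        Map<String, Double> weights = balancingWeights(counts);
        for (String cls : counts.keySet()) {
            double total = counts.get(cls) * weights.get(cls);
            System.out.printf("%s: count=%d, weight=%.2f, total weight=%.1f%n",
                    cls, counts.get(cls), weights.get(cls), total);
        }
    }
}
```

    A weight ratio like this could serve as a starting point for the off-diagonal entries of a cost matrix (misclassifying a 'Yes' costing about 6.5 times as much as misclassifying a 'No'), though the right costs depend on the application rather than on the class counts alone.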

    Cheers,
    Mark.
