Hitachi Vantara Pentaho Community Forums

Thread: Accuracy issue when logically splitting dataset

  1. #1
    Join Date
    Jan 2016

    Accuracy issue when logically splitting dataset

    Hello Friends,
    I am running into an issue where I am finding it hard to explain a phenomenon. Any help is much appreciated.

    Basically, I have a dataset with 25 attributes which is logically separated into three categories (A, B, C). When I run the test with Random Forest with all attributes combined, I get an accuracy of 93% (93/100 instances are correctly predicted). But when I run the test on each logical group of features separately and combine the results, it ends up correctly predicting more than 93 instances. To be clearer, I am attaching a picture.

    As shown in the picture, the accuracy with all attributes combined (93) is lower than when splitting. My best guess is that it happens because of the correlation between the features, but I am not satisfied with that rationale myself.
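
Since the picture is not available, how the per-category results were combined is unknown. One possible reading, offered only as an assumption, is that an instance was counted as correct whenever at least one of the three per-category models got it right. The sketch below uses made-up sets of correctly classified instance indices (not the poster's data) to show that such a union count can exceed the single combined model's 93 even when a majority vote of the same three models would not. The function names `oracle_union` and `majority_correct` are hypothetical.

```python
from collections import Counter

def oracle_union(*correct_sets):
    """Instances that at least one model classifies correctly."""
    return set().union(*correct_sets)

def majority_correct(*correct_sets):
    """Instances that more than half of the models classify correctly.
    For a two-class problem this matches majority-vote correctness."""
    counts = Counter(i for s in correct_sets for i in s)
    needed = len(correct_sets) // 2 + 1
    return {i for i, c in counts.items() if c >= needed}

# Hypothetical test set of 100 instances, indexed 0..99.
all_attributes = set(range(93))      # combined model: 93 correct
category_a = set(range(0, 85))       # model on category A only: 85 correct
category_b = set(range(10, 95))      # model on category B only: 85 correct
category_c = set(range(20, 100))     # model on category C only: 80 correct

union_count = len(oracle_union(category_a, category_b, category_c))
vote_count = len(majority_correct(category_a, category_b, category_c))

print("all attributes:", len(all_attributes))  # 93
print("union of A, B, C:", union_count)        # 100
print("majority vote of A, B, C:", vote_count) # 85
```

The union count needs the true labels to decide which model to trust for each instance, so under this assumption it is an upper bound rather than an accuracy that a deployed classifier could reach.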

    Attached Images

  2. #2
    Join Date
    Aug 2006


    Try increasing the number of trees learned from the default of 100. Try 500 or 1000. You can also fiddle with the -K option - this controls the number of attributes randomly considered for splitting at a node in a given tree.
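
To illustrate why more trees can help, here is a minimal sketch under a strong simplifying assumption: a two-class problem where every tree votes independently and is correct with the same probability. Real trees in a forest are correlated, so the gain levels off much sooner than this idealized calculation suggests, and the number of attributes considered per split (the -K option) shifts the balance between how strong each tree is and how correlated the trees are. The function name `majority_vote_accuracy` and the value 0.55 are illustrative choices, not taken from the thread.

```python
from math import comb

def majority_vote_accuracy(n_trees, p_correct):
    """Probability that more than half of n_trees independent voters are
    correct, when each is correct with probability p_correct.
    Use an odd n_trees so that ties cannot occur."""
    needed = n_trees // 2 + 1
    q = 1.0 - p_correct
    return sum(
        comb(n_trees, k) * p_correct**k * q**(n_trees - k)
        for k in range(needed, n_trees + 1)
    )

# Odd ensemble sizes close to the 100, 500 and 1000 mentioned above.
for n in (1, 101, 501, 1001):
    print(n, round(majority_vote_accuracy(n, 0.55), 4))
```

Under these assumptions the ensemble accuracy rises with the number of trees, which is one reason a larger forest can give a more stable estimate on a test set of only 100 instances.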


