Hitachi Vantara Pentaho Community Forums

Thread: What does J48 split on?

  1. #1

    As far as I understand it, the J48 tree uses information gain to decide which attribute to branch on. However, for some datasets, InfoGainAttributeEval ranks an attribute other than the one at the root node as having the highest information gain. Shouldn't the attribute with the highest information gain always be the first split in the tree (the root)?

    For example, here is the unpruned J48 tree for the Iris dataset:

    J48 unpruned tree
    ------------------

    petalwidth <= 0.6: Iris-setosa (50.0)
    petalwidth > 0.6
    | petalwidth <= 1.7
    | | petallength <= 4.9: Iris-versicolor (48.0/1.0)
    | | petallength > 4.9
    | | | petalwidth <= 1.5: Iris-virginica (3.0)
    | | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
    | petalwidth > 1.7: Iris-virginica (46.0/1.0)

    And here is the ranking of information gain:

    === Attribute Selection on all input data ===

    Search Method:
    Attribute ranking.

    Attribute Evaluator (supervised, Class (nominal): 5 class):
    Information Gain Ranking Filter

    Ranked attributes:
    1.418 3 petallength
    1.378 4 petalwidth
    0.698 1 sepallength
    0.376 2 sepalwidth

    So the question is: why isn't petallength the root of the tree?

  2. #2

    J48 is an implementation of C4.5 release 8. C4.5 uses the gain ratio, rather than raw information gain, as its splitting criterion. Take a look at:

    http://en.wikipedia.org/wiki/Information_gain_ratio
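
To make the difference concrete, here is a minimal Python sketch of the textbook information gain and gain ratio formulas, applied to the root split from the tree above (petalwidth <= 0.6). The left branch count (50 Iris-setosa) comes from the J48 output; the right branch counts (50 Iris-versicolor, 50 Iris-virginica) assume the standard 150-instance Iris dataset. The function names are made up for illustration, and this is not Weka's code: J48 applies further C4.5 heuristics that are not reproduced here.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a list of counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def gain_and_ratio(parent_counts, branch_counts):
    """Return (information gain, gain ratio) for one candidate split.

    parent_counts: class counts before the split.
    branch_counts: one list of class counts per branch of the split.
    """
    n = sum(parent_counts)
    sizes = [sum(b) for b in branch_counts]
    # Expected class entropy after the split, weighted by branch size.
    remainder = sum(s / n * entropy(b) for s, b in zip(sizes, branch_counts))
    gain = entropy(parent_counts) - remainder
    # Split information: entropy of the branch sizes themselves.
    split_info = entropy(sizes)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Class order: setosa, versicolor, virginica (assumed 50 instances each).
parent = [50, 50, 50]
# petalwidth <= 0.6 -> 50 setosa; petalwidth > 0.6 -> the remaining 100.
branches = [[50, 0, 0], [0, 50, 50]]

gain, ratio = gain_and_ratio(parent, branches)
print(f"information gain = {gain:.3f}, gain ratio = {ratio:.3f}")
# prints: information gain = 0.918, gain ratio = 1.000
```

The gain ratio divides the information gain by the split information, so a split is judged relative to how it partitions the instances, and two attributes can rank differently under the two measures.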

    Cheers,
    Mark.

  3. #3

    Thanks! In Weka, does GainRatioAttributeEval then compute the same quantity that J48 uses?
