Thread: suggestion: how about adding an ID field in WEKA's Instance class?

  1. #1
    Join Date
    Jun 2008
    Posts
    6

    Suggestion: how about adding an ID field in WEKA's Instance class?

    Hi there,

    A WEKA experiment can be configured to output the Prediction/Target together with IDs. However, the ID field can only be set to one of the attributes in the dataset. This is not useful, because the ID attribute is usually removed from the dataset before it is used to train a classifier; leaving an ID attribute in the training data would most likely affect the classifier's performance. I think it would be better to have an ID field in the Instance class, like:

    protected String m_ID;

    This way, the ID is kept separate from the fields that are used in the training/testing process, and the output from a WEKA experiment (Prediction/Target/ID) makes more sense. What's your opinion?

    Thanks for your effort to maintain this very nice forum.

    --Haiyong Xu

  2. #2
    Join Date
    Aug 2006
    Posts
    1,740

    Hi there,

    Have you taken a look at the section in the Weka wiki on how to use ID attributes?

    http://weka.sourceforge.net/wekadoc/...ng#Instance_ID

    Essentially, you can keep them in your data and use a FilteredClassifier to remove them only for the purposes of learning the model (they remain in the data for outputting along with predictions or for visualization).

    Cheers,
    Mark.

  3. #3
    Join Date
    Jun 2008
    Posts
    6

    The problem is in "ClassifierSplitEvaluator"

    Thanks Mark.

    The "FilteredClassifier" can solve the ID problem in general, but I don't think it works for my question.

    In "weka.experiment.ClassifierSplitEvaluator", the private member "m_attID" indicates which attribute is output as the ID field in a cross-validation experiment. I haven't figured out how to bypass this internal mechanism so that the results contain the Prediction/Target together with the ID field. Could you give me a hint? Thanks.

    --Haiyong

  4. #4
    Join Date
    Aug 2006
    Posts
    1,740

    I'm not sure I understand what you are trying to do. The Experimenter always produces summary results (not predictions for individual test instances). The instances that are created by the Experimenter are the summary results computed for test folds, hold-out sets etc. When you select the options for outputting targets, predictions and an ID field, it creates String attributes in the resulting instances that contain a list of IDs or predictions/targets of each instance in the test fold/hold-out set, with each element separated by a "|" character.

    Why do you need to "bypass" this mechanism of generating IDs, targets and predictions etc?
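
    To make the "|"-separated format concrete, here is a minimal sketch of how those String values could be unpacked and lined up again, one row per test instance. It is plain Java and does not use any WEKA class. The class name SplitResultLists, the method names and the sample values are made up for illustration, and it assumes that the IDs, targets and predictions appear in the same order in all three strings.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitResultLists {

    /** Splits one "|"-separated result string into its elements. */
    static List<String> splitList(String packed) {
        List<String> out = new ArrayList<>();
        if (packed == null || packed.isEmpty()) {
            return out;
        }
        // "|" is a regex metacharacter, so it is escaped; -1 keeps trailing empty elements.
        for (String element : packed.split("\\|", -1)) {
            out.add(element.trim());
        }
        return out;
    }

    /** Lines up IDs, targets and predictions by position (assumed to correspond). */
    static List<String[]> align(String ids, String targets, String predictions) {
        List<String> idList = splitList(ids);
        List<String> targetList = splitList(targets);
        List<String> predictionList = splitList(predictions);
        if (idList.size() != targetList.size() || idList.size() != predictionList.size()) {
            throw new IllegalArgumentException("the three lists differ in length");
        }
        List<String[]> rows = new ArrayList<>();
        for (int k = 0; k < idList.size(); k++) {
            rows.add(new String[] {idList.get(k), targetList.get(k), predictionList.get(k)});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for one test fold.
        for (String[] row : align("a17|a42|a03", "yes|no|yes", "yes|yes|yes")) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

    Each row then holds the ID, target and prediction of one test instance, which is the per-instance view the summary strings encode.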

  5. #5
    Join Date
    Jun 2008
    Posts
    6

    If you look at the source code of "ClassifierSplitEvaluator", you will find that the ID field has to be one of the attributes in the dataset (an Instances object). If we remove the ID attribute before feeding the dataset to a classifier, then "ClassifierSplitEvaluator" cannot get the ID of each instance in the test fold, which makes the predictions meaningless.

  6. #6
    Join Date
    Aug 2006
    Posts
    1,740

    Originally posted by haiyeong:
    If we remove the ID field before feeding the dataset to a classifier, then "ClassifierSplitEvaluator" cannot get the ID information for each instance in test-fold, which makes the Prediction meaningless.
    Arghh! I feel like I'm starting to go mad :-) One last time - use the FilteredClassifier in your experiments. The FilteredClassifier was created so that a COPY of the original training data can be modified before being passed to the base classifier. If you use a FilteredClassifier with the filter set to weka.filters.unsupervised.attribute.Remove (to remove the ID attribute), then the copy of the data passed to the classifier will have the ID removed. HOWEVER, the original data will still have the ID in it - this is the data from which the ClassifierSplitEvaluator will create the String attribute that contains the list of IDs for the current test fold.

    Does this handle your situation, or am I still missing something?

    Cheers,
    Mark
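
    The copy-then-filter idea can be sketched in a few lines of plain Java. This is only an illustration of the pattern, not WEKA code: FilteredWrapperSketch, RecordingLearner, removeColumn and trainFiltered are hypothetical names, and the rows are made-up values. In WEKA itself the corresponding pieces would be weka.classifiers.meta.FilteredClassifier and weka.filters.unsupervised.attribute.Remove.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilteredWrapperSketch {

    /** Hypothetical stand-in for a base learner; it only records the rows it was given. */
    static class RecordingLearner {
        List<String[]> seen = new ArrayList<>();

        void train(List<String[]> rows) {
            seen = rows;
        }
    }

    /** Returns a copy of the rows with one column dropped; the input is left untouched. */
    static List<String[]> removeColumn(List<String[]> rows, int col) {
        List<String[]> copy = new ArrayList<>();
        for (String[] row : rows) {
            String[] kept = new String[row.length - 1];
            for (int i = 0, j = 0; i < row.length; i++) {
                if (i != col) {
                    kept[j++] = row[i];
                }
            }
            copy.add(kept);
        }
        return copy;
    }

    /** Trains the base learner on a filtered copy, mimicking the wrapper idea. */
    static void trainFiltered(RecordingLearner base, List<String[]> rows, int idCol) {
        base.train(removeColumn(rows, idCol));
    }

    public static void main(String[] args) {
        // Column 0 is the ID; the remaining columns are ordinary attributes.
        List<String[]> data = Arrays.asList(
                new String[] {"a17", "sunny", "yes"},
                new String[] {"a42", "rainy", "no"});
        RecordingLearner learner = new RecordingLearner();
        trainFiltered(learner, data, 0);
        System.out.println("learner saw:   " + Arrays.toString(learner.seen.get(0)));
        System.out.println("original row:  " + Arrays.toString(data.get(0)));
    }
}
```

    The learner never sees the ID column, while the caller's data still carries it and can report it alongside each prediction.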
