
Identify the misclassified files



hexstatic
11-28-2007, 06:50 AM
Hello

I'm conducting data mining on a sound database. I use the SMO classifier with cross-validation. The problem is that the confusion matrix I obtain does not let me identify which specific files were misclassified. Is there any way to identify the files individually?

Thank you very much!

-hexstatic-

Mark
11-28-2007, 05:53 PM
Yes, there is. In the Weka Explorer's Preprocess panel, add an ID attribute to your dataset by applying the AddID filter (weka.filters.unsupervised.attribute.AddID). By default, this inserts a numeric ID attribute as the first attribute in your data.

In the Classify panel, click the "More options" button and select the "Output predictions" check box; also, in the "Output additional attributes" text field, enter "first" (without the quotes).

Now you need to configure your classifier so that it does not learn from the new ID attribute. To do this you can use the FilteredClassifier (in the "meta" package). This classifier takes another classifier as a base learner, plus a filter that it applies to the data before passing it to that classifier. Here is an example configuration using J48 that I copied and pasted from the Classify panel in the Explorer:

weka.classifiers.meta.FilteredClassifier -F "weka.filters.unsupervised.attribute.Remove -R first" -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
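
In case it helps to see the idea spelled out, here is a minimal plain-Java sketch of what this setup achieves: every row carries an ID that is used only for reporting, the learner never sees it, and cross-validation collects the IDs of the rows it gets wrong. This is not Weka code. The names MisclassifiedIds, Row, sampleData and predict are made up for illustration, the nearest-mean rule is only a toy stand-in for SMO or J48, and the fold assignment is a simple round-robin rather than Weka's stratified folds.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only: no Weka classes are used and all names are hypothetical.
class MisclassifiedIds {

    // One row: an ID kept for reporting, one numeric feature, and a class label (0 or 1).
    static final class Row {
        final int id;
        final double feature;
        final int label;

        Row(int id, double feature, int label) {
            this.id = id;
            this.feature = feature;
            this.label = label;
        }
    }

    // Six toy rows; the row with ID 6 is labelled 1 but looks like class 0.
    static List<Row> sampleData() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1, 1.0, 0));
        rows.add(new Row(2, 1.2, 0));
        rows.add(new Row(3, 0.9, 0));
        rows.add(new Row(4, 5.0, 1));
        rows.add(new Row(5, 5.1, 1));
        rows.add(new Row(6, 1.1, 1));
        return rows;
    }

    // Toy learner: predicts the class whose training mean is nearest to the feature.
    // It reads only feature and label, never the ID.
    static int predict(List<Row> train, double feature) {
        double[] sum = new double[2];
        int[] count = new int[2];
        for (Row r : train) {
            sum[r.label] += r.feature;
            count[r.label]++;
        }
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < 2; c++) {
            if (count[c] == 0) {
                continue; // class absent from this training fold
            }
            double distance = Math.abs(feature - sum[c] / count[c]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    // Round-robin k-fold cross-validation; returns the IDs of the held-out rows
    // whose predicted class differs from their actual class.
    static List<Integer> misclassifiedIds(List<Row> data, int folds) {
        List<Integer> wrong = new ArrayList<>();
        for (int fold = 0; fold < folds; fold++) {
            List<Row> train = new ArrayList<>();
            List<Row> test = new ArrayList<>();
            for (int i = 0; i < data.size(); i++) {
                if (i % folds == fold) {
                    test.add(data.get(i));
                } else {
                    train.add(data.get(i));
                }
            }
            for (Row r : test) {
                if (predict(train, r.feature) != r.label) {
                    wrong.add(r.id);
                }
            }
        }
        return wrong;
    }

    public static void main(String[] args) {
        System.out.println("Misclassified IDs: " + misclassifiedIds(sampleData(), 3));
    }
}
```

On the six toy rows with 3 folds this prints "Misclassified IDs: [6]". In Weka itself, the FilteredClassifier with the Remove filter plays the role of "the learner never sees the ID", and the prediction output with the extra attribute plays the role of the returned list.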

Hope this helps.

Mark.

hexstatic
12-04-2007, 06:18 AM
Hi,

Thank you for your reply.
I was not able to fix the problem. First, I could not find the "Output additional attributes" field. Perhaps it is a version issue; I am using 3.5.6. Second, I don't know whether it makes a difference, but I would prefer to use the SMO classifier.
Finally, I don't know whether it is the same problem, but I also run clustering to obtain a tree representation. Once again, I obtain leaves and nodes, but I can't identify the specific files.
Thank you very much,

-hexstatic-

Mark
12-04-2007, 05:13 PM
My apologies - the "Output additional attributes" field was added after 3.5.6 and will be in the next release (soon). You can get a nightly snapshot of Weka that contains this feature from:

http://www.cs.waikato.ac.nz/~ml/weka/snapshots/

You can use SMO (or any classifier) instead of J48.

There is a similar facility (printing out predictions) for clustering, but it is only available (at present) from the command line. To get help on command line options for clustering try:

java weka.clusterers.<some clusterer> -h

This prints the general options plus the options specific to <some clusterer>.

There is also a FilteredClusterer that can be used in the same way as the FilteredClassifier (for example, to remove ID attributes).
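
As a rough picture of what printing predictions with an ID gives you for clustering, here is a small plain-Java sketch: the ID is kept out of the distance computation and is only used to label each row's assigned cluster. This is not Weka code. ClusterIds and cluster are made-up names, and the tiny one-dimensional two-cluster k-means is only a stand-in for whichever clusterer you actually run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only: no Weka classes are used and all names are hypothetical.
class ClusterIds {

    // Two-cluster k-means on one numeric feature. ids[i] labels features[i] and is
    // never used when computing distances. Returns a map from ID to cluster index (0 or 1).
    static Map<Integer, Integer> cluster(int[] ids, double[] features) {
        // Deterministic start: centroids at the smallest and largest feature values.
        double c0 = features[0];
        double c1 = features[0];
        for (double x : features) {
            c0 = Math.min(c0, x);
            c1 = Math.max(c1, x);
        }

        int[] assignment = new int[features.length];
        boolean changed = true;
        while (changed) {
            changed = false;
            // Assign each row to the nearer centroid (ties go to cluster 0).
            for (int i = 0; i < features.length; i++) {
                int nearest = Math.abs(features[i] - c0) <= Math.abs(features[i] - c1) ? 0 : 1;
                if (nearest != assignment[i]) {
                    assignment[i] = nearest;
                    changed = true;
                }
            }
            // Move each centroid to the mean of its rows.
            double sum0 = 0, sum1 = 0;
            int n0 = 0, n1 = 0;
            for (int i = 0; i < features.length; i++) {
                if (assignment[i] == 0) {
                    sum0 += features[i];
                    n0++;
                } else {
                    sum1 += features[i];
                    n1++;
                }
            }
            if (n0 > 0) c0 = sum0 / n0;
            if (n1 > 0) c1 = sum1 / n1;
        }

        // Report the result keyed by ID, in the original row order.
        Map<Integer, Integer> byId = new LinkedHashMap<>();
        for (int i = 0; i < ids.length; i++) {
            byId.put(ids[i], assignment[i]);
        }
        return byId;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5};
        double[] features = {1.0, 1.5, 9.0, 8.5, 1.2};
        System.out.println(cluster(ids, features));
    }
}
```

With an ID-to-cluster listing like this, each ID can be traced back to the file that produced that row, provided you know the order in which the files were written into the dataset.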

Hope this helps.

Mark.

hexstatic
12-17-2007, 06:39 AM
I apologize too, for being so slow to reply (I was very busy last week!).
Well, I was not able to add the "Output additional attributes" feature (in fact, I could not work out how).
I also don't know how to change the command line for clustering.
I'm sorry, but I'm a beginner with Weka.
Thank you
-hexstatic-