Getting AddClassification to add a classification



greg.soulsby
05-17-2009, 04:25 PM
I have a couple of models I am happy with, and saved them as a .model. Now I have a dataset I want to make predictions on.

I have loaded the dataset in the KnowledgeFlow, added an AddClassification component and pointed it at that model, setting outputDistribution to true. Then I have the dataSet connection going out to a TextViewer. When I run the flow, the text viewer shows me the attributes in ARFF form, including a field for the output distribution, but the data section is empty.

What could be wrong? How can I debug?

Greg

Mark
05-17-2009, 06:15 PM
Hi Greg,

You need to make sure that you have a ClassAssigner and a TrainingSetMaker component (in that order) following your ArffLoader. The connection has to be trainingSet in this case (even though you are using a saved model) due to the nature of the filter architecture.
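Once the flow is rewired, one way to confirm that rows are really coming through (rather than only the header) is to put the ARFF text into a file and count the lines after @data. Below is a minimal sketch in plain Java, with no Weka classes involved. The class name ArffRowCount and the method countDataRows are made up for illustration, and the sketch assumes one instance per line, which is how Weka normally writes ARFF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Weka: counts instances in ARFF text.
final class ArffRowCount {

    // Counts non-blank, non-comment lines that follow the @data marker.
    // Returns -1 if the text has no @data line at all.
    public static int countDataRows(List<String> lines) {
        boolean inData = false;
        int rows = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            if (!inData) {
                // Header keywords in ARFF are case-insensitive.
                if (line.equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            rows++;
        }
        return inData ? rows : -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ArffRowCount <file.arff>");
            return;
        }
        int rows = countDataRows(Files.readAllLines(Paths.get(args[0])));
        System.out.println(rows < 0 ? "no @data section found" : rows + " data rows");
    }
}
```

A result of 0 means the header arrived but no instances did, which matches the symptom described above.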

Cheers,
Mark.

greg.soulsby
05-17-2009, 07:19 PM
Ah, so there are dataSets and trainingSets, that makes sense.

But I have not got it to work, so before hounding you with more questions, can you point me to documentation? Or working examples? I have seen the Java documentation, but I have no hope of understanding it in terms of usage in situations like these.

Also, is there a way of getting something out by way of debug? I see a debug output option in lots of places but have not come across where to access it.

Many thanks

Greg

Mark
05-17-2009, 07:51 PM
There is a chapter on the Knowledge Flow starting on page 89 of the WekaManual.pdf that comes with the Weka distribution (3.6.0). This gives an overview of the capabilities of the Knowledge Flow and presents some examples.

I've attached a flow file (xml) that demonstrates how to use AddClassification. I've had to change the extension on this file from .kfml to .ktr because the Wiki hasn't been configured to accept .kfml as a legal attachment type yet (just rename to example.kfml once you've downloaded it). You'll need to configure the ArffLoader and AddClassification component for your source data and model file respectively. Don't forget to change either "outputClassification" or "outputDistribution" from false to true in AddClassification.

$HOME/weka.log contains all output (stdout, stderr and stuff that gets written to graphical log panels) for a Weka session.
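If you only want the most recent entries from that log, something like the following would do. It is a minimal sketch in plain Java, not anything shipped with Weka: the class name WekaLogTail and the method lastLines are made up for illustration, and it assumes that Java's user.home property points at the same directory as $HOME.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of Weka: prints the end of weka.log.
final class WekaLogTail {

    // Returns at most maxLines lines from the end of the file, oldest first.
    public static List<String> lastLines(Path file, int maxLines) throws IOException {
        Deque<String> window = new ArrayDeque<>();
        if (maxLines <= 0) {
            return new ArrayList<>(window);
        }
        // InputStreamReader substitutes undecodable bytes instead of failing,
        // which is convenient for a log with mixed content.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), Charset.defaultCharset()))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (window.size() == maxLines) {
                    window.removeFirst(); // drop the oldest line in the window
                }
                window.addLast(line);
            }
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: user.home corresponds to $HOME on this machine.
        Path log = Paths.get(System.getProperty("user.home"), "weka.log");
        for (String line : lastLines(log, 50)) {
            System.out.println(line);
        }
    }
}
```

Running it straight after a failed flow shows the last 50 lines, which is usually where a stack trace from the session would end up.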

Cheers,
Mark.

greg.soulsby
05-19-2009, 12:20 PM
Mark, that's extremely helpful. I would like to leave you something in my will but I started out with nothing and I still have most of it left, so there is not much point.

To push the friendship too far, would you have an example of an Experiment we could kick start with?

Greg

PS: A suite of example files / case studies would be very helpful to us newbies. We would be happy to contribute if you had a suggested way of putting them on the site.

Mark
05-21-2009, 10:46 PM
Hi Greg,

I can put together an example xml Experiment configuration file for you - what sort of experiment do you want to run? Basically, experiments are typically a set of learning algorithms applied to a set of data sets. You can run either classification experiments (discrete target) or regression experiments (numeric target). In advanced mode, it is also possible to run experiments using clustering algorithms.

I think a set of example files/case studies from the new user's standpoint is an excellent idea. I've been so close to Weka for such a long time now that it is often not easy for me to see things from the newbie's standpoint :-) I can make a space in the data mining section of Pentaho's Wiki if you have stuff to contribute.

Cheers,
Mark.