Convert Weka decision trees into XML files

11-17-2007, 12:29 AM

Just a short message to announce that I have just released Wekatext2Xml (http://www.lucsorel.com/index.php?page=downloads#wekatext2xml), a lightweight Java application that converts decision trees generated by Weka classifiers into editable, parsable XML files. For the moment, Wekatext2Xml (http://www.lucsorel.com/index.php?page=downloads#wekatext2xml) only works on J48 decision trees (Weka's implementation of Ross Quinlan's C4.5 algorithm), which have a syntax like this:

outlook = sunny
| humidity <= 70: yes (2.0)
| humidity > 70: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
| windy = TRUE: no (2.0)
| windy = FALSE: yes (3.0)

Documentation, samples, and screenshots are available on the Wekatext2Xml (http://www.lucsorel.com/index.php?page=downloads#wekatext2xml) webpage. Do not hesitate to use it and to send comments.
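
To make the tree syntax above concrete, here is a minimal sketch of how such text could be turned into nested XML. It is not the Wekatext2Xml implementation: the class name, the element name `test`, and the attribute names (`attribute`, `operator`, `value`, `class`, `count`) are all assumptions made up for this illustration, and the parser assumes one node per line, `|` markers for depth, and attribute values and class labels without colons or spaces.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class J48TextToXmlSketch {

    // One tree line once the "|" depth markers are removed:
    //   <attribute> <operator> <value>[: <class> (<counts>)]
    private static final Pattern NODE = Pattern.compile(
            "(\\S+)\\s*(<=|>=|!=|=|<|>)\\s*([^:]+?)\\s*(?::\\s*(\\S+)\\s*\\(([^)]*)\\))?");

    /** Converts a J48-style textual tree into a hypothetical nested XML layout. */
    public static String convert(String treeText) {
        StringBuilder xml = new StringBuilder("<tree>\n");
        Deque<Integer> open = new ArrayDeque<>(); // depths of <test> elements still open
        for (String raw : treeText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Each leading "|" marks one level of nesting.
            int depth = 0;
            while (line.startsWith("|")) {
                depth++;
                line = line.substring(1).trim();
            }
            // Close every element that is at the same depth or deeper.
            while (!open.isEmpty() && open.peek() >= depth) {
                indent(xml, open.pop() + 1).append("</test>\n");
            }
            Matcher m = NODE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unrecognised tree line: " + raw);
            }
            indent(xml, depth + 1)
                    .append("<test attribute=\"").append(escape(m.group(1)))
                    .append("\" operator=\"").append(escape(m.group(2)))
                    .append("\" value=\"").append(escape(m.group(3))).append('"');
            if (m.group(4) != null) {
                // Leaf: the line carries a class label and instance counts.
                xml.append(" class=\"").append(escape(m.group(4)))
                   .append("\" count=\"").append(escape(m.group(5))).append("\"/>\n");
            } else {
                // Inner node: children follow on deeper lines.
                xml.append(">\n");
                open.push(depth);
            }
        }
        while (!open.isEmpty()) {
            indent(xml, open.pop() + 1).append("</test>\n");
        }
        return xml.append("</tree>\n").toString();
    }

    private static StringBuilder indent(StringBuilder sb, int level) {
        for (int i = 0; i < level; i++) {
            sb.append("  ");
        }
        return sb;
    }

    // Escapes the characters that are not allowed inside an XML attribute value.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String tree = String.join("\n",
                "outlook = sunny",
                "| humidity <= 70: yes (2.0)",
                "| humidity > 70: no (3.0)",
                "outlook = overcast: yes (4.0)",
                "outlook = rainy",
                "| windy = TRUE: no (2.0)",
                "| windy = FALSE: yes (3.0)");
        System.out.print(convert(tree));
    }
}
```

Running `main` prints one `", -1).length - 1 != 1) throw new AssertionError(xml);
if (!xml.startsWith("<tree>\n") || !xml.endsWith("</tree>\n")) throw new AssertionError(xml);
</test>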

Cordially, Luc

04-11-2010, 05:47 AM

I just wanted to let the Weka user community know that I recently upgraded a small Java application, WekaTextToXML, which converts decision trees produced by Weka (J48 algorithm) into XML files. The upgraded application and the online documentation are available at http://www.lucsorel.com/index.php?page=downloads#wekatext2xml. The upgrades are:

Wekatext2Xml can export decision trees as mind maps that can be browsed and edited with the open-source program FreeMind, which I found convenient for exploring decision trees dynamically and for exporting drawings for presentations and papers
Wekatext2Xml can also be run from the Java command line to batch-export your decision trees to XML and mind map files (examples of commands and arguments are given online)

I hope you will find these upgrades useful. If you need any explanations, feel free to contact me at http://www.lucsorel.com/index.php?page=contact

Best regards,
Luc Sorel, PhD

04-13-2010, 03:20 AM
Hi Luc,

Sounds cool, especially the Freemind export stuff.

Just as some general information to readers, I'll point out that the textual tree that is printed (and an XML representation thereof) is not the full story in terms of prediction for C4.5 (and J48) when there are missing values in test instances. When there are missing values for the tests at one or more nodes (during training as well as prediction), C4.5 uses a strategy of splitting the instance into fractional parts proportional to the number of instances that have been seen in each subtree below the node for the test in question. Predictions are collected from all the subtrees and a weighted average is taken.
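
As a rough illustration of the fractional strategy described above, here is a minimal Java sketch. It is not Weka's J48 code: the `Node` class, its methods, and the example tree are hypothetical, only nominal tests are handled, and the `sunny` branch of the weather tree is collapsed into a single leaf (2 yes, 3 no) to keep the example short. When the tested attribute is missing, the instance is sent down every branch with a weight proportional to the training instances seen in that branch, and the leaf class distributions are summed with those weights.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class FractionalPredictionSketch {

    /** Hypothetical tree node: a leaf holding class counts, or a test on one nominal attribute. */
    public static final class Node {
        final String attribute; // null for a leaf
        final Map<String, Node> children = new LinkedHashMap<>();      // branch value -> subtree
        final Map<String, Double> classCounts = new LinkedHashMap<>(); // used by leaves only

        private Node(String attribute) {
            this.attribute = attribute;
        }

        public static Node leaf() {
            return new Node(null);
        }

        public static Node test(String attribute) {
            return new Node(attribute);
        }

        public Node count(String cls, double n) {
            classCounts.merge(cls, n, Double::sum);
            return this;
        }

        public Node branch(String value, Node child) {
            children.put(value, child);
            return this;
        }

        /** Number of training instances seen at or below this node. */
        double weight() {
            double sum = 0.0;
            if (attribute == null) {
                for (double c : classCounts.values()) sum += c;
            } else {
                for (Node child : children.values()) sum += child.weight();
            }
            return sum;
        }
    }

    /** Returns class probabilities for an instance; attributes absent from the map count as missing. */
    public static Map<String, Double> distribution(Node root, Map<String, String> instance) {
        Map<String, Double> result = new LinkedHashMap<>();
        accumulate(root, instance, 1.0, result);
        return result;
    }

    private static void accumulate(Node node, Map<String, String> instance,
                                   double fraction, Map<String, Double> out) {
        if (node.attribute == null) {
            // Leaf: contribute its class proportions, scaled by the fraction that arrived here.
            double total = node.weight();
            for (Map.Entry<String, Double> e : node.classCounts.entrySet()) {
                out.merge(e.getKey(), fraction * e.getValue() / total, Double::sum);
            }
            return;
        }
        String value = instance.get(node.attribute);
        if (value != null) {
            Node child = node.children.get(value);
            if (child == null) {
                throw new IllegalArgumentException("No branch for " + node.attribute + " = " + value);
            }
            accumulate(child, instance, fraction, out);
        } else {
            // Missing value: split the instance across all branches, proportionally to their weight.
            double total = node.weight();
            for (Node child : node.children.values()) {
                accumulate(child, instance, fraction * child.weight() / total, out);
            }
        }
    }

    /** Simplified weather tree (assumption: the sunny subtree is collapsed into one leaf). */
    public static Node exampleTree() {
        return Node.test("outlook")
                .branch("sunny", Node.leaf().count("yes", 2.0).count("no", 3.0))
                .branch("overcast", Node.leaf().count("yes", 4.0))
                .branch("rainy", Node.test("windy")
                        .branch("TRUE", Node.leaf().count("no", 2.0))
                        .branch("FALSE", Node.leaf().count("yes", 3.0)));
    }

    public static void main(String[] args) {
        Map<String, String> instance = new HashMap<>();
        instance.put("windy", "TRUE"); // outlook is missing
        System.out.println(distribution(exampleTree(), instance));
    }
}
```

With `outlook` missing and `windy = TRUE`, the instance is split 5/14, 4/14, and 5/14 across the three branches, which gives roughly 6/14 for `yes` and 8/14 for `no`. A textual or XML dump of the tree only supports this kind of prediction if the per-branch counts are kept.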

Support for exporting PMML (Predictive Model Markup Language), an XML-based standard for data mining models, is on the roadmap for Weka. PMML's tree model supports various strategies for handling missing values (including the C4.5 approach).