.xml transposition tools for .xrff format



douglegge
10-03-2008, 04:34 AM
I’m looking to import data into Weka to gain insight into classification and prediction. The original data comes from software with two export options. The first is easily exported to .csv and then manually transposed to .arff :), but it contains only summary results, so it is of little use because the @data section ends up with very little content :(.

The second output format is .xml :). However, .xrff has the following format:
<dataset name="iris" version="3.5.3">
<header>
<attributes>
<attribute name="sepallength" type="numeric"/>
<attribute name="sepalwidth" type="numeric"/>
<attribute name="petallength" type="numeric"/>

My original data, on the other hand, uses different element and attribute names :confused:, for example:
<?xml version="1.0"?>
<pdml version="0" creator="xxx/1.0.3">
<stuff>
<context name="geninfo" pos="0" showname="General information" size="62">
<field name="num" pos="0" show="2" />
<field name="len" pos="0" show="62" />
….

Whilst I could do a text-based find/replace, does anyone know of existing XML tools designed specifically for transposing data without introducing errors or breaking the original “well-formed” document? I’m not from an XML background and have just started reading the Nutshell book, but I want to avoid wasting time reinventing the wheel for something that has probably already been done.
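For illustration, here is a minimal sketch of the parser-based alternative to a text find/replace, in Python using only the standard library's `xml.etree.ElementTree`. It assumes, based only on the excerpt above, that each `<stuff>` element is one instance and each `<field>` contributes one attribute whose value sits in its `show` attribute. Every attribute is declared numeric for brevity, and the function name `pdml_to_xrff` is hypothetical. Because the XRFF tree is built and serialised by the XML library rather than by string editing, the output is well-formed by construction.

```python
import xml.etree.ElementTree as ET


def pdml_to_xrff(pdml_text, relation="pdml"):
    """Hypothetical helper: turn PDML-style <field> elements into Weka XRFF.

    Assumptions (read off the forum excerpt only): each <stuff> element is
    one instance, and each <field> holds one value in its "show" attribute.
    Every attribute is declared numeric for brevity.
    """
    root = ET.fromstring(pdml_text)
    records = root.findall("stuff")

    # Collect field names in first-seen order; these become XRFF attributes.
    names = []
    for record in records:
        for field in record.iter("field"):
            if field.get("name") not in names:
                names.append(field.get("name"))

    # <dataset><header><attributes>...</attributes></header>, as in the excerpt.
    dataset = ET.Element("dataset", name=relation, version="3.5.3")
    attributes = ET.SubElement(ET.SubElement(dataset, "header"), "attributes")
    for name in names:
        ET.SubElement(attributes, "attribute", name=name, type="numeric")

    # <body><instances> holds one <instance> per record, one <value> per attribute.
    instances = ET.SubElement(ET.SubElement(dataset, "body"), "instances")
    for record in records:
        values = {f.get("name"): f.get("show", "?") for f in record.iter("field")}
        instance = ET.SubElement(instances, "instance")
        for name in names:
            # Assumption: "?" marks a missing value, following the ARFF convention.
            ET.SubElement(instance, "value").text = values.get(name, "?")

    # ElementTree escapes and nests everything, so the result stays well-formed.
    return ET.tostring(dataset, encoding="unicode")
```

The returned string can be saved with a `.xrff` extension; non-numeric fields would need `type="nominal"` (with labels) or `type="string"` instead.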

Thanks

Doug

Mark
10-03-2008, 05:24 AM
Hi Doug,

You could take a look at Pentaho PDI (Kettle). It has several steps for reading data from XML files:

http://wiki.pentaho.com/display/EAI/XML+Input
http://wiki.pentaho.com/display/EAI/Get+Data+From+XML

You could then write out to CSV using the TextFileOutput step:

http://wiki.pentaho.com/display/EAI/Text+File+Output

or directly to ARFF using the ARFFOutput plugin for Kettle:

http://wiki.pentaho.com/display/DATAMINING/Using+the+ARFF+Output+Plugin

Download for ARFFOutput:

http://wiki.pentaho.com/display/DATAMINING/Home
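The Kettle route above (an XML input step feeding Text File Output or the ARFFOutput plugin) is configured in Kettle rather than coded. Purely for comparison, here is a minimal plain-Python sketch of the same XML-to-ARFF idea; it is not Kettle or its plugin. It assumes, from the excerpt in the first post only, that each `<stuff>` element is one instance and each `<field>` carries its value in `show`. The names `pdml_to_arff`, `arff_quote` and `is_number` are hypothetical, and treating a column as numeric only when every value parses as a number is a simplification.

```python
import xml.etree.ElementTree as ET


def arff_quote(text):
    # ARFF needs quotes around tokens containing spaces, commas, braces, etc.
    if any(c in text for c in " ,{}'\"%\t"):
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def pdml_to_arff(pdml_text, relation="pdml"):
    """Hypothetical helper: PDML-style <field> elements -> ARFF text.

    Assumes one <stuff> element per instance and each <field>'s value in "show".
    """
    root = ET.fromstring(pdml_text)
    rows = [{f.get("name"): f.get("show", "?") for f in rec.iter("field")}
            for rec in root.findall("stuff")]

    # Attribute names in first-seen order across all rows.
    names = list(dict.fromkeys(name for row in rows for name in row))

    lines = ["@relation " + arff_quote(relation), ""]
    for name in names:
        present = [row[name] for row in rows if name in row and row[name] != "?"]
        if all(is_number(v) for v in present):
            lines.append("@attribute %s numeric" % arff_quote(name))
        else:
            # Nominal: list the distinct observed values in first-seen order.
            labels = ",".join(arff_quote(v) for v in dict.fromkeys(present))
            lines.append("@attribute %s {%s}" % (arff_quote(name), labels))

    lines += ["", "@data"]
    for row in rows:
        # "?" is ARFF's missing-value marker.
        lines.append(",".join(arff_quote(row.get(name, "?")) for name in names))
    return "\n".join(lines) + "\n"
```

Real exports may also need date or string attributes, which this sketch does not attempt.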

Cheers,
Mark.

douglegge
10-03-2008, 05:39 AM
Mark,

Thanks for that. I'll have a look now.

Is this an area you are directly involved in?

Doug

Mark
10-03-2008, 04:11 PM
Hi Doug,

I'm Pentaho's "data mining guy" :-) I'm also one of the three original core developers of Weka (and currently its maintainer). I wrote the plugins for Kettle as well. However, I'm not an expert on the advanced aspects of Kettle (or of ETL, for that matter).

Cheers,
Mark.