Hitachi Vantara Pentaho Community Forums

Thread: .xml transposition tools for .xrff format

  1. #1
    Join Date
    Oct 2008
    Posts
    2

    I’m looking to import data into Weka to gain insight into classification and prediction. The software that produces the original data offers two output formats. The first is easily converted to .csv and then manually transposed to .arff, but it contains only summary results, so it is of little use because of the limited @data content.

    The second output format is .xml, whereas .xrff has the following format:
    <dataset name="iris" version="3.5.3">
    <header>
    <attributes>
    <attribute name="sepallength" type="numeric"/>
    <attribute name="sepalwidth" type="numeric"/>
    <attribute name="petallength" type="numeric"/>

    My original data has different attribute names:
    <?xml version="1.0"?>
    <pdml version="0" creator="xxx/1.0.3">
    <stuff>
    <context name="geninfo" pos="0" showname="General information" size="62">
    <field name="num" pos="0" show="2" />
    <field name="len" pos="0" show="62" />
    ….

    Whilst I could do a text-based find/replace, does anyone know of existing .xml tools designed specifically for transposing data without introducing errors or breaking the original “well-formed” document? I’m not from an .xml background and have just started reading the Nutshell book, but I want to avoid wasting time re-inventing the wheel for something that has probably already been done.
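To make the question concrete, here is a rough Python sketch of the kind of transposition meant, using only the standard library. It builds the .xrff tree with an XML library rather than by find/replace, so the output stays well-formed. Several things are assumptions, not facts about the real export: that each `<stuff>` element is one record, that each `<field>` supplies one value through its `name` and `show` attributes, that every attribute is numeric, and the helper name `pdml_to_xrff` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mirrors the fragment quoted above;
# the real export may nest its elements differently.
PDML = """<?xml version="1.0"?>
<pdml version="0" creator="xxx/1.0.3">
  <stuff>
    <context name="geninfo" pos="0" showname="General information" size="62">
      <field name="num" pos="0" show="2" />
      <field name="len" pos="0" show="62" />
    </context>
  </stuff>
  <stuff>
    <context name="geninfo" pos="0" showname="General information" size="70">
      <field name="num" pos="0" show="3" />
      <field name="len" pos="0" show="70" />
    </context>
  </stuff>
</pdml>"""


def pdml_to_xrff(pdml_text, relation="pdml"):
    """Return an XRFF document (as a string) built from PDML-like input."""
    root = ET.fromstring(pdml_text)

    # Assumption: one record per <stuff>, one value per <field>.
    records = [
        {field.get("name"): field.get("show") for field in stuff.iter("field")}
        for stuff in root.iter("stuff")
    ]

    # Attribute order follows first appearance across all records.
    names = []
    for record in records:
        for name in record:
            if name not in names:
                names.append(name)

    dataset = ET.Element("dataset", {"name": relation, "version": "3.5.3"})
    attributes = ET.SubElement(ET.SubElement(dataset, "header"), "attributes")
    for name in names:
        # Assumption: every field is numeric.
        ET.SubElement(attributes, "attribute", {"name": name, "type": "numeric"})

    instances = ET.SubElement(ET.SubElement(dataset, "body"), "instances")
    for record in records:
        instance = ET.SubElement(instances, "instance")
        for name in names:
            value = ET.SubElement(instance, "value")
            if name in record:
                value.text = record[name]
            else:
                # Field absent from this record: flag the value as missing.
                value.set("missing", "yes")

    return ET.tostring(dataset, encoding="unicode")


xrff = pdml_to_xrff(PDML)
print(xrff)
```

Because the serializer writes the tree, any special characters in values are escaped automatically, which is the main risk with a plain-text find/replace.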

    Thanks

    Doug

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Hi Doug,

    You could take a look at Pentaho PDI (Kettle). It has several steps for reading data from XML files:

    http://wiki.pentaho.com/display/EAI/XML+Input
    http://wiki.pentaho.com/display/EAI/Get+Data+From+XML

    You could then write out to CSV using the TextFileOutput step:

    http://wiki.pentaho.com/display/EAI/Text+File+Output

    or directly to ARFF using the ARFFOutput plugin for Kettle:

    http://wiki.pentaho.com/display/DATA...+Output+Plugin

    Download for ARFFOutput:

    http://wiki.pentaho.com/display/DATAMINING/Home
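For a small file, the same read-XML-then-write-CSV flow can also be sketched outside Kettle in a few lines of Python. This is only an illustration, not how the Kettle steps work internally. It assumes each `<stuff>` element is one row and each `<field>` contributes one column through its `name` and `show` attributes; the sample data and the function name `xml_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the fragment in the first post.
SAMPLE = """<?xml version="1.0"?>
<pdml version="0" creator="xxx/1.0.3">
  <stuff>
    <context name="geninfo" pos="0" showname="General information" size="62">
      <field name="num" pos="0" show="2" />
      <field name="len" pos="0" show="62" />
    </context>
  </stuff>
  <stuff>
    <context name="geninfo" pos="0" showname="General information" size="70">
      <field name="num" pos="0" show="3" />
      <field name="len" pos="0" show="70" />
    </context>
  </stuff>
</pdml>"""


def xml_to_csv(xml_text, out):
    """Write one CSV row per <stuff> element to the file-like object `out`."""
    root = ET.fromstring(xml_text)

    # Assumption: field name -> column, field show -> cell value.
    rows = [
        {field.get("name"): field.get("show") for field in stuff.iter("field")}
        for stuff in root.iter("stuff")
    ]

    # Column order follows first appearance across all rows.
    header = []
    for row in rows:
        for name in row:
            if name not in header:
                header.append(name)

    writer = csv.DictWriter(out, fieldnames=header)
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
xml_to_csv(SAMPLE, buf)
print(buf.getvalue())
```

The resulting CSV can then be opened in Weka or converted to .arff in the usual way.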

    Cheers,
    Mark.

  3. #3
    Join Date
    Oct 2008
    Posts
    2

    Mark,

    Thanks for that. I'll have a look now.

    Is this an area you are directly involved in?

    Doug

  4. #4
    Join Date
    Aug 2006
    Posts
    1,741

    Hi Doug,

    I'm Pentaho's "data mining guy" :-) I'm also one of the three original core developers of Weka (and currently the maintainer). I wrote the plugins for Kettle as well. However, I'm not an expert on the advanced aspects of Kettle (or ETL, for that matter).

    Cheers,
    Mark.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.