Hitachi Vantara Pentaho Community Forums

Thread: Parsing XML files with multiple "documents"

  1. #1
    Join Date
    Jun 2008
    Posts
    2

    Parsing XML files with multiple "documents"

    I need to parse a log file that contains multiple XML "documents", one per line.
    The file is a repeating set of the same element, but there is no "root" element at the top of the file. Something like:

    <element/>
    <element/>
    <element/>
    <element/>

    etc.

    Is it possible with the streaming xml parser to extract this data?

    Thanks
    -bob

  2. #2
    DEinspanjer Guest

    What you describe is not a valid well-formed XML document. It is a file containing XML fragments.

    I don't believe any of the parsing steps can read XML fragments. I think I remember seeing something in a blog or forum post about new functionality in Kettle 3.1 that lets you concatenate XML blobs, so that you could perhaps wrap the fragments in a root node and turn the file into a well-formed document.

    http://www.w3.org/TR/2006/REC-xml-20...ec-well-formed
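Outside of Kettle, the "wrap the fragments in a root node" idea can be sketched in a few lines of Python using the standard library. This is only an illustration, not what any Kettle step does internally: the `parse_fragments` helper, the `root` wrapper tag, and the `id` attributes in the sample are all made up for the example. It also assumes the fragments carry no XML declaration of their own, since a declaration is only allowed at the very start of a document.

```python
import xml.etree.ElementTree as ET


def parse_fragments(text, root_tag="root"):
    """Wrap a run of XML fragments in a synthetic root and parse them.

    `root_tag` is an arbitrary, hypothetical wrapper name; it only has to
    be a valid element name. Returns the list of top-level fragments.
    """
    wrapped = f"<{root_tag}>{text}</{root_tag}>"
    return list(ET.fromstring(wrapped))


# Sample input shaped like the log file in the first post (the id
# attributes are invented so the elements can be told apart).
sample = '<element id="1"/>\n<element id="2"/>\n<element id="3"/>\n'

elements = parse_fragments(sample)
print([e.get("id") for e in elements])
```

For a large log file this reads everything into memory at once, so it is best treated as a quick check rather than a streaming solution.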

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    That would be the "XML Join" step: http://wiki.pentaho.com/display/EAI/XML+Join

  4. #4

    XML Store and Forward

    Don't shoot the downstream developer, but we have a file which has an XML message per line similar to the following:
    Code:
    <?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?><Party><EnterprisePartyId>16226428</EnterprisePartyId><PartyType>I</PartyType><Source>ADES</Source><SourceId>P840294SUINHSIEH</SourceId><PartyXREF><PartyAlias><Source>ADES</Source><SourceId>P840294SUINHSIEH</SourceId></PartyAlias><PartyAlias><Source>ADES</Source><SourceId>P840294SUINHSIEH</SourceId></PartyAlias></PartyXREF></Party>
    <?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?><Party><EnterprisePartyId>16226442</EnterprisePartyId><PartyType>I</PartyType><Source>ADES</Source><SourceId>LV88935LETAGRAYSON</SourceId><PartyXREF><PartyAlias><Source>ADES</Source><SourceId>LV88935LETAGRAYSON</SourceId></PartyAlias><PartyAlias><Source>ADES</Source><SourceId>LV88935PETERGRAYSON</SourceId></PartyAlias><PartyAlias><Source>ADES</Source><SourceId>LV88935PETERGRAYSON</SourceId></PartyAlias><PartyAlias><Source>ADES</Source><SourceId>LV88935LETAGRAYSON</SourceId></PartyAlias><PartyAlias><Source>HUON</Source><SourceId>600076602</SourceId></PartyAlias><PartyAlias><Source>HUON</Source><SourceId>647324601</SourceId></PartyAlias></PartyXREF></Party>
    I have created a Jira entry for an issue I am having, but I am assuming that the approach with PDI is correct. I am using a file input and a Get data from XML step in 3.1. I can't really do this in 3.0 since the XML steps expect a file input. Is this the correct approach?

    I would rather have a port reader, but the upstream system wants to create the file with the messages. I need to parse each message and store it in a database.

    Bill W.
    Last edited by billw; 06-23-2008 at 09:42 PM. Reason: wanted to format the xml

  5. #5
    Join Date
    Dec 2005
    Posts
    531

    Hi,

    You can use a simple 'Text file input' step to read the XML documents into a field, one document per row, and then use the 'Get data from XML' step with the 'XML source is defined in a field' option enabled.

    I have attached a sample that uses your data.

    Regards,
    Ingo
    Attached Files
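For readers who want to see the same two-stage idea outside of PDI (read each line into a field, then parse that field as its own XML document), here is a rough Python sketch. It is not the attached transformation: the `parse_party_lines` helper and the output dictionary layout are invented for illustration, and the sample lines are shortened versions of the messages posted above. Lines are kept as bytes so that the parser can honour the ISO-8859-1 encoding named in each declaration.

```python
import xml.etree.ElementTree as ET


def parse_party_lines(lines):
    """Parse each non-empty line as a separate, complete XML document.

    `lines` is any iterable of bytes, for example a file opened in
    binary mode. The returned dict layout is a hypothetical choice.
    """
    records = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between messages
        party = ET.fromstring(raw)  # one document per line
        records.append({
            "EnterprisePartyId": party.findtext("EnterprisePartyId"),
            "PartyType": party.findtext("PartyType"),
            "aliases": [
                (alias.findtext("Source"), alias.findtext("SourceId"))
                for alias in party.findall("PartyXREF/PartyAlias")
            ],
        })
    return records


# Shortened versions of the two messages from the earlier post.
sample = [
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>'
    b'<Party><EnterprisePartyId>16226428</EnterprisePartyId>'
    b'<PartyType>I</PartyType><PartyXREF><PartyAlias><Source>ADES</Source>'
    b'<SourceId>P840294SUINHSIEH</SourceId></PartyAlias></PartyXREF></Party>\n',
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>'
    b'<Party><EnterprisePartyId>16226442</EnterprisePartyId>'
    b'<PartyType>I</PartyType><PartyXREF><PartyAlias><Source>HUON</Source>'
    b'<SourceId>600076602</SourceId></PartyAlias></PartyXREF></Party>\n',
]

records = parse_party_lines(sample)
for record in records:
    print(record["EnterprisePartyId"], record["aliases"])
```

To run it against a real file, pass a file object opened with `open(path, "rb")` in place of `sample`; each record could then be written to a database row, with the aliases going to a child table.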

  6. #6

    Ingo:

    Thank you. That is basically what I had done in testing, but I guess the drop-down for the field is not working? Did you type in the field name manually? I created a Jira record because I was not seeing the field in the drop-down and I was getting an error when I entered the field name manually. Alas, you have proved that it works as programmed!

    This is the transformation that I have been looking for, so thank you to the development team for getting this created! I see a lot of work in the platform on XML data streams/files, which is great!

    Bill

  7. #7
    Join Date
    Dec 2005
    Posts
    531

    Hi Bill,

    It works fine for me. Which version are you using? I tried various 3.1 milestones and none of them had a problem displaying the available fields in the drop-down.

    Glad that it helped anyway.

    Regards,
    Ingo

  8. #8

    I am using 3.1 M2. I am on MacOS X 10.5.3 (Matt, don't scream) with
    java version "1.5.0_13"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
    Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)

    Not sure whether it is a platform thing or not. I will try it on Windows.

    Bill

  9. #9
    Join Date
    Nov 1999
    Posts
    9,729

    OSX works fine these days.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.