Hitachi Vantara Pentaho Community Forums

Thread: Parsing XML with multiple equally-named items on one level

  1. #1
    Join Date: Aug 2008 | Posts: 3

    Hi there.

    I just started playing around with Kettle. I'm trying to parse an XML file that has multiple elements with the same name (but different attributes) on one level. I've attached an example XML file to illustrate the problem.
    Let's say I want to get the value of the level3-element that has id "2" from every level2-element. So I would expect to get the following values: 1.5, 4.5, 7.5

    Currently I use the following settings for the XML Input step in Kettle:
    On the Content tab:
    - Location #1: levelOne
    - Location #2: levelTwo (to get the repeating element)

    On the Fields tab:
    - #1: Name = Result | Position = E=level3/2,A=value/1 (the first "value" attribute of the SECOND level3-element)

    This works so far because the structure of the XML file is fixed. But what happens if another level3-element is added to every level2-element BEFORE all existing ones? Then position 2 would point to the element that used to be first, and I would silently get the wrong value. Does anyone understand the problem? ;-) I can't address a specific element by its id (an attribute), only by its POSITION.
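    To make the failure mode concrete, here is a small sketch in Python with the standard xml.etree.ElementTree module (a stand-in, not Kettle). The attachment is not available, so the XML below is a hypothetical reconstruction: the level1/level2/level3 names, the id/value attributes, and the id="1" values are assumptions. The helper by_position only mimics what a position-based field like E=level3/2,A=value/1 does.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the attached example.xml (the real file is not available).
ORIGINAL = """<level1>
  <level2><level3 id="1" value="1.0"/><level3 id="2" value="1.5"/></level2>
  <level2><level3 id="1" value="4.0"/><level3 id="2" value="4.5"/></level2>
  <level2><level3 id="1" value="7.0"/><level3 id="2" value="7.5"/></level2>
</level1>"""

# Same data after a new level3 element is inserted BEFORE the existing ones.
SHIFTED = ORIGINAL.replace("<level2>", '<level2><level3 id="0" value="0.0"/>')


def by_position(xml_text, position):
    """Return the value attribute of the N-th (1-based) level3 child of every level2."""
    root = ET.fromstring(xml_text)
    return [level2.findall("level3")[position - 1].get("value")
            for level2 in root.findall("level2")]


print(by_position(ORIGINAL, 2))  # ['1.5', '4.5', '7.5']
print(by_position(SHIFTED, 2))   # ['1.0', '4.0', '7.0'], silently the wrong elements
```

    Nothing fails when the structure shifts; the lookup simply returns different elements, which is exactly the risk described above.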

    Is there any way to address exactly the element that has id "2", instead of going to the second element and hoping that the structure of the file doesn't change?

    Something like XPath would be great, but according to forum entries that is only available starting with version 3.1. Unfortunately I'm stuck with 3.0.4. I also looked for a plugin, but the one that might help is only available for version 2.X :-(
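    For reference, the selection being asked for is a single XPath predicate on the id attribute, roughly /level1/level2/level3[@id='2']/@value (element names assumed from the description above). The Python sketch below uses the standard library's xml.etree.ElementTree, which supports only a limited XPath subset: it accepts the [@id='2'] predicate but cannot select an attribute with /@value, so the value is read with .get(). The sample XML is hypothetical and deliberately has an extra level3 element in front, so the result does not depend on position.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: each level2 has an extra level3 element prepended (id="0").
SAMPLE = """<level1>
  <level2><level3 id="0" value="0.0"/><level3 id="1" value="1.0"/><level3 id="2" value="1.5"/></level2>
  <level2><level3 id="0" value="0.0"/><level3 id="1" value="4.0"/><level3 id="2" value="4.5"/></level2>
  <level2><level3 id="0" value="0.0"/><level3 id="1" value="7.0"/><level3 id="2" value="7.5"/></level2>
</level1>"""


def values_by_id(xml_text, wanted_id):
    """Select level3 elements by their id attribute instead of by position."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports [@attr='value'] predicates.
    return [level3.get("value")
            for level3 in root.findall(f"./level2/level3[@id='{wanted_id}']")]


print(values_by_id(SAMPLE, "2"))  # ['1.5', '4.5', '7.5']
```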

    Is there any other possibility to get what I want? Thanks in advance!

    Greetings,
    --
    André
    Attached Files

  2. #2
    Join Date: Aug 2008 | Posts: 3

    Does nobody have an idea what to do in this case? :-(

  3. #3
    DEinspanjer Guest

    Since you are limited to 3.0.4, it is somewhat tough.

    Just for the heck of it, I tried upgrading the Rhino js.jar Javascript engine that Kettle uses to the latest version and wrote a demonstration of how to get the data you requested out of the Javascript step using E4X scripting. That example is attached below.

    In order to run it, you need to download Rhino 1.7R1 from http://www.mozilla.org/rhino/ and copy js.jar into the libext directory of your Kettle installation.
    Attached Files
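    The E4X script itself is only in the attachment, so it is not reproduced here. As a rough analogue of what such a script step does, the Python sketch below (not E4X, and not DEinspanjer's code) walks every level2 element, looks up the level3 child whose id matches, and emits one row per level2 element; a missing match yields None instead of silently picking another element. The sample XML and the output field names (level2, Result) are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second level2 element has no level3 with id="2",
# and in the third one the id="2" element comes first.
SAMPLE = """<level1>
  <level2><level3 id="1" value="1.0"/><level3 id="2" value="1.5"/></level2>
  <level2><level3 id="1" value="4.0"/></level2>
  <level2><level3 id="2" value="7.5"/><level3 id="1" value="7.0"/></level2>
</level1>"""


def rows_for_id(xml_text, wanted_id):
    """Emit one row per level2 element with the value of its level3[@id=wanted_id]."""
    root = ET.fromstring(xml_text)
    rows = []
    for index, level2 in enumerate(root.findall("level2"), start=1):
        match = level2.find(f"level3[@id='{wanted_id}']")
        rows.append({
            "level2": index,
            "Result": match.get("value") if match is not None else None,
        })
    return rows


for row in rows_for_id(SAMPLE, "2"):
    print(row)
```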

  4. #4
    Join Date: Aug 2008 | Posts: 3

    Hi and thanks for your effort and the reply!
    I did what you wrote: downloaded Rhino, replaced the original js.jar and played around a bit with your transformation. I also adapted the path to the example.xml file in the javascript, but when I click "Get variables", I get:
    Code:
    General error executing script:
    FAILED ASSERTION
    When I try it in the script step's compatibility mode, this is what I get:
    Code:
    General error executing script:
    TypeError: Cannot find function indexOf in object test value test value test value test value test value test value test value test value test value test value . (script#6)
    Do you know what the problem is?

    Thanks and greetings,
    --
    André

  5. #5
    DEinspanjer Guest

    Because of the way some of my javascript code creates new rows, you are going to have trouble with the "Get variables" button. You don't need it, though: you can fill in the fields table directly, based on what the script puts into the row.

    The error about "test value" is just an artifact of clicking the test button. That is why I had the block of code in there that tested to see if the value was "test value" so that it could initialize the value to something reasonable.
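    The guard described above can be sketched as follows, in Python rather than the step's JavaScript and with hypothetical names: when the incoming field holds the placeholder text produced by the test button (the error above shows it as repeated "test value" strings padded with spaces), a usable default such as a local example.xml path is substituted before the value is used.

```python
PLACEHOLDER = "test value"
# Hypothetical default, used only when the field carries no real path.
DEFAULT_XML_PATH = "example.xml"


def resolve_xml_path(field_value, default=DEFAULT_XML_PATH):
    """Return a real file path, replacing test-button placeholder text with a default."""
    if field_value is None:
        return default
    text = field_value.strip()
    # Treat an empty field, or the test button's (possibly repeated)
    # "test value" text, as missing.
    if text == "" or text.startswith(PLACEHOLDER):
        return default
    return field_value


print(resolve_xml_path("test value test value   "))  # example.xml
print(resolve_xml_path("/data/input.xml"))           # /data/input.xml
```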

  6. #6
    DEinspanjer Guest

    Quote (originally posted by DEinspanjer):
    In order to run it, you need to download Rhino 1.7R1 from http://www.mozilla.org/rhino/ and copy js.jar into the libext directory of your Kettle installation.

    Rhino 1.7 should be included in Kettle 3.1-GA unless someone surprises me and finds something terribly wrong with it.

    Attached is another example showing some of the expressive power of creating and modifying XML with it inside Kettle.
    Attached Files
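    The second attachment is E4X-specific and is not reproduced here. For readers without Rhino, a comparable create-and-modify round trip can be sketched with Python's xml.etree.ElementTree; this is an analogue, not the attached transformation, and the element names and the "double every id 2 value" edit are purely illustrative.

```python
import xml.etree.ElementTree as ET


def build_document():
    """Create a level1/level2/level3 tree from scratch (illustrative structure)."""
    root = ET.Element("level1")
    for base in (1.0, 4.0, 7.0):
        level2 = ET.SubElement(root, "level2")
        ET.SubElement(level2, "level3", id="1", value=str(base))
        ET.SubElement(level2, "level3", id="2", value=str(base + 0.5))
    return root


def double_values(root, wanted_id):
    """Modify in place: double the value attribute of every matching level3 element."""
    for level3 in root.findall(f"level2/level3[@id='{wanted_id}']"):
        level3.set("value", str(float(level3.get("value")) * 2))
    return root


doc = double_values(build_document(), "2")
print(ET.tostring(doc, encoding="unicode"))
```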

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.