Hitachi Vantara Pentaho Community Forums

Thread: Clarification on XML Input

  1. #1

    Clarification on XML Input

    Hi,
    I am having a lot of trouble getting XML Input to return the values I need.

    I have a file that looks like this:

    <postcodeList>
      <state name="XYZ">
        <city name="city1" region="South West">
          <postcode>242042</postcode>
          <postcode>242043</postcode>
          <postcode>242044</postcode>
        </city>
        <city name="city2">
          <postcode>242042</postcode>
          <postcode>242043</postcode>
        </city>
        <city name="city3">
          <postcode>242044</postcode>
        </city>
      </state>

      <state name="ABC">
        <city name="city4">
          <postcode>242042</postcode>
        </city>
        <city name="city5" region="North East">
          <postcode>242042</postcode>
          <postcode>242044</postcode>
        </city>
        <city name="city6" region="South West">
          <postcode>242042</postcode>
          <postcode>242043</postcode>
          <postcode>242044</postcode>
          <postcode>242042</postcode>
          <postcode>242043</postcode>
          <postcode>242044</postcode>
        </city>
      </state>
    </postcodeList>


    I cannot work out how to get a list of ALL cities across all states. Using XML Input I can get a list of the first state's cities (by putting postcodeList, state and city in the Location grid).

    I can also get a list of just the states, and with Get Fields I get a very long list of cities on the same row as the state. Neither is what I'm after, which is one row per city for all cities in all states.

    However, I cannot even create a second input object to give me the second state listing, nor can I figure out how to get the XML Input object to process ALL states.

    I've tried using the Streaming XML Input and that is even more opaque. The manual is terse to the point of not being informative.

    I've looked at the examples and manual, but they do not address this type of issue.

    I don't know if I am in the minority here, but I find XPath MUCH simpler than the approach taken in XML Input (in so far as I understand it). I've not yet got Streaming XML to work.

    Is there no way I can execute an XPath query and derive the results from there? Alternatively, if you can provide some examples / links to howto's it would help. I've been trying to figure this out for a day or more! Probably one of those silly things where one more bit of information will do the trick.
    Thanks in advance,
    Peter
    Last edited by pdavie; 07-22-2007 at 10:29 PM.
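
For readers following along, here is a minimal sketch of the XPath-style query asked about above, written in Python rather than in Kettle. It is not how the XML Input step works; it only shows that the desired result is one row per city across every state. The `SAMPLE` string is a trimmed copy of the file in the post, and the function name `list_cities` is made up for illustration. Python's standard `xml.etree.ElementTree` supports only a limited XPath subset (no parent axis), so the state name is picked up by looping over states first.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted file: same states and cities, fewer postcodes.
SAMPLE = """<postcodeList>
  <state name="XYZ">
    <city name="city1" region="South West"><postcode>242042</postcode></city>
    <city name="city2"><postcode>242042</postcode></city>
    <city name="city3"><postcode>242044</postcode></city>
  </state>
  <state name="ABC">
    <city name="city4"><postcode>242042</postcode></city>
    <city name="city5" region="North East"><postcode>242042</postcode></city>
    <city name="city6" region="South West"><postcode>242042</postcode></city>
  </state>
</postcodeList>"""


def list_cities(xml_text):
    """Return one (state, city) tuple per <city>, for every <state>."""
    root = ET.fromstring(xml_text)  # root is <postcodeList>
    rows = []
    # "./state" selects every state child, not just the first one.
    for state in root.findall("./state"):
        for city in state.findall("./city"):
            rows.append((state.get("name"), city.get("name")))
    return rows


cities = list_cities(SAMPLE)
for row in cities:
    print(row)
```

If only the city names are needed, `root.findall("./state/city")` returns all six city elements in document order in a single call.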

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    I have not been playing with the XML steps, so no big help there from me.

    What I do know is that the output of the XML input steps has to be rows, meaning that you can only have 1 fixed format per row (e.g. you can't have multiple cities in 1 row). This is probably the reason you can only get at the first city.

    I also know both the input and output XML steps have a number of restrictions on which kinds of XML they can and can't process. All of the ETL tools I know have XML processing, but all of them have some big restrictions on the kind of files they work with.

    Also have a look at the samples\transformations directory in your kettle installation. There's one example using XML input.

    Regards,
    Sven

  3. #3

    re: Clarification on XML Input

    Thanks for the reply Sven,
    I found a partial answer in:

    http://forums.pentaho.org/showthread...ight=XML+Input

    I am trying to get a "standard" list of rows out of the XML, but I find the syntax of XML Input pretty arcane. As I mentioned, I find XPath simpler. There was a post with a contribution to XML Input (XPath):

    http://forums.pentaho.org/showthread...ight=XML+Input

    It looked very promising, do you know if it progressed further? There are no follow-ups either in this forum or on the Google side.

    Lastly, you mention there are restrictions on the XML that can be processed, is this documented anywhere?

    Anyway, thanks for responding,
    Peter

  4. #4

    > It looked very promising, do you know if it progressed further? There are no follow-ups either in this forum or on the Google side.
    No idea... Samatar can probably shed some light on it.

    > Lastly, you mention there are restrictions on the XML that can be processed, is this documented anywhere?
    Not explicitly, I think. But it's kind of logical... internally PDI works with rows, and all rows have to be of the same format. So if you have (or want to produce) XML files with lots of levels, repeating elements and/or optional items, it becomes problematic.

    Regards,
    Sven
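
To make the "all rows have the same format" point concrete, here is a small Python sketch (not PDI code, and not a description of PDI internals) that flattens a nested file like Peter's into fixed-format rows. The field list `FIELDS` and the function name `flatten` are assumptions chosen for illustration. Repeating elements become one row each, parent values are repeated on every row, and the optional `region` attribute is filled with `None` when it is missing.

```python
import xml.etree.ElementTree as ET

# Every output row carries exactly these fields, in this order.
FIELDS = ("state", "city", "region", "postcode")

# Small sample in the same shape as the posted file.
SAMPLE = """<postcodeList>
  <state name="XYZ">
    <city name="city1" region="South West">
      <postcode>242042</postcode>
      <postcode>242043</postcode>
    </city>
    <city name="city2">
      <postcode>242042</postcode>
    </city>
  </state>
</postcodeList>"""


def flatten(xml_text):
    """Return one dict per <postcode>, repeating the parent state and city."""
    root = ET.fromstring(xml_text)
    rows = []
    for state in root.findall("state"):
        for city in state.findall("city"):
            for postcode in city.findall("postcode"):
                rows.append({
                    "state": state.get("name"),
                    "city": city.get("name"),
                    "region": city.get("region"),  # None when the attribute is absent
                    "postcode": postcode.text,
                })
    return rows


rows = flatten(SAMPLE)
for row in rows:
    print(row)
```

The nesting is expressed by repeating values rather than by structure, which is why deeply nested or irregular XML is awkward to map onto a row stream.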

  5. #5

    Hi Sven, Peter

    The XML Input Path step should be available as a plugin in versions > 2.5.1.
    Peter, please give me your email and I will send you the kettle.jar if you are on 2.50.

    Rgds

    Samatar
