Hitachi Vantara Pentaho Community Forums

Thread: Need help with Get XML Data

  1. #1
    Join Date
    Oct 2007
    Posts
    10

    Need help with Get XML Data

    Hi,
    I'm new to Kettle and cannot accomplish what I want with an XML file as input. The sample XML is attached.

    What I need is to create a table that looks like this:
    StationName,latitude,longitude,observation_date_local_time,...
    Amqui,48.47,-67.43,2010-07-27T01:00:00.000 EST,...
    Aéroport d'Inukjuak,58.47,-78.08,2010-07-27T01:00:00.000 EST,...

    I understand that I need to define the element tag as the loop. I defined the Loop XPath like this:
    /om:ObservationCollection/om:member/om:Observation/om:metadata/set/identification-elements/element

    I defined the fields like this:
    Name, XPath, Element, Type
    name, name, Attribute, String
    value, value, Attribute, String

    But this creates one row per element.
    I would like one row per <identification-elements> and one field per element attribute in the loop.

    Ouf, hope I'm clear....
    How should I manage this?
    thanks
    Steve
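
The reshaping asked for above (one row per <identification-elements>, one field per element name) can be sketched outside Kettle in Python. This is only an illustration of the idea of looping on the parent node instead of on each <element>. The XML sample, the namespace URI, and the helper name `rows_per_station` are all made up for the example; the real file may differ.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like the structure described above; the namespace
# URI is an assumption for illustration only.
SAMPLE = """\
<om:ObservationCollection xmlns:om="http://example.org/om">
  <om:member><om:Observation><om:metadata><set><identification-elements>
    <element name="station_name" value="Amqui"/>
    <element name="latitude" value="48.47"/>
    <element name="longitude" value="-67.43"/>
  </identification-elements></set></om:metadata></om:Observation></om:member>
  <om:member><om:Observation><om:metadata><set><identification-elements>
    <element name="station_name" value="Inukjuak"/>
    <element name="latitude" value="58.47"/>
    <element name="longitude" value="-78.08"/>
  </identification-elements></set></om:metadata></om:Observation></om:member>
</om:ObservationCollection>
"""

def rows_per_station(xml_text):
    """Return one dict per <identification-elements>, keyed by element name."""
    root = ET.fromstring(xml_text)
    rows = []
    # Loop on the parent node instead of on each <element>.
    for ident in root.iter("identification-elements"):
        rows.append({e.get("name"): e.get("value") for e in ident.findall("element")})
    return rows

rows = rows_per_station(SAMPLE)
for row in rows:
    print(row)
```

Each dict corresponds to one output row, and each key to one field, which is the table layout requested in the post.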

  2. #2
    Join Date
    Dec 2009
    Posts
    609

    Hi Steve,

    It seems the attached XML is broken.
    In my "Get data from XML" step I get the following error message when clicking "Get XPath nodes":
    Error on line 1 of document file:///C:/Programme/Pentaho/data-integration/UTF-8 : The prefix "om" for element "om:ObservationCollection" is not bound. Nested exception: The prefix "om" for element "om:ObservationCollection" is not bound.

    Could you please check the attached file?

    Best regards,

    Tom
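
The same class of error can be reproduced with any namespace-aware parser, which may help when checking a trimmed-down file before loading it into Kettle. The sketch below uses Python's standard library; the two XML strings and the `http://example.org/om` URI are placeholders, not the content of the attached file.

```python
import xml.etree.ElementTree as ET

# A prefix used without a matching xmlns:om declaration is not well-formed
# under XML namespaces, so a namespace-aware parser rejects it.
broken = '<om:ObservationCollection><om:member/></om:ObservationCollection>'

# Declaring the prefix on the root fixes it; the URI here is a placeholder.
fixed = ('<om:ObservationCollection xmlns:om="http://example.org/om">'
         '<om:member/></om:ObservationCollection>')

def parses(xml_text):
    """Return True if the text parses, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

broken_ok = parses(broken)
fixed_ok = parses(fixed)
print(broken_ok, fixed_ok)
```

When simplifying an XML file by hand, keeping the xmlns declarations from the original root element avoids this problem.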

  3. #3
    Join Date
    Apr 2008
    Posts
    4,696

    Is there any way to get data into your XML so that you can tie all these rows together?

    Something in the Observation or the set... You can then reference the fields against that, and be able to parse them together.

    Otherwise, you would have to hope that the fields are always in the same order, and use a row flattener.
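
To make the order-dependent fallback concrete, here is a rough Python analogue of flattening by position. It is not Kettle's Row Flattener step itself, only a sketch of the same idea. The field names are assumptions; the values come from the table in the first post.

```python
# Flat (name, value) rows as produced by looping on each <element>.
# Field names are assumed; values are taken from the first post.
flat_rows = [
    ("station_name", "Amqui"), ("latitude", "48.47"), ("longitude", "-67.43"),
    ("station_name", "Aéroport d'Inukjuak"), ("latitude", "58.47"), ("longitude", "-78.08"),
]

def flatten_by_position(rows, fields_per_record):
    """Group every N consecutive rows into one record, trusting the order."""
    if len(rows) % fields_per_record != 0:
        raise ValueError("row count is not a multiple of fields_per_record")
    records = []
    for start in range(0, len(rows), fields_per_record):
        chunk = rows[start:start + fields_per_record]
        records.append({name: value for name, value in chunk})
    return records

records = flatten_by_position(flat_rows, 3)
for record in records:
    print(record)
```

If one station is missing a field, every later chunk shifts and records get mixed, which is why a shared key on each row is the safer option.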

  4. #4
    Join Date
    Oct 2007
    Posts
    10

    I'm sorry, I sent a bad XML file; I had tried to simplify it...
    Here is the original file:
    http://dd.weatheroffice.ec.gc.ca/obs...20100727_f.xml

    This is data for several weather stations. For each station I need:
    - the station name
    - the latitude and longitude
    - observation_date_local_time
    - climate_station_number
    I can do that with this Loop XPath: /om:ObservationCollection/om:member/om:Observation/om:metadata/set/identification-elements/element, defining the fields name and value as attributes.

    But this returns one row per element. I would like one row per station, with each parameter as a field. Is that possible?

    --
    Perhaps I should open another thread for this... Please tell me if I should.
    I also need the temperature data for each station, but it sits under a different Loop XPath (/om:ObservationCollection/om:member/om:Observation/om:result/elements/element). So I thought I could run two steps (one for the station parameters and another for the temperature data), but I don't know how to merge the two results at the end.
    I came up with the attached solution. I read the temperature data and get the station name using the element number. The station name will be the key for the merge once I run another job to get the station parameters.
    Is this a good solution? I don't know much about XML. Is it possible for the order of the nodes to change, so that using the element number does not guarantee reading the right one?
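
One way to avoid both the merge and the dependence on element position is to loop once per om:Observation and pick each value by its name attribute. The Python sketch below illustrates that idea only; the sample XML, the namespace URI, the field names (`station_name`, `air_temperature`), and the helpers `value_of` and `observations` are assumptions, not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Made-up sample; namespace URI, element names and values are assumptions.
SAMPLE = """\
<om:ObservationCollection xmlns:om="http://example.org/om">
  <om:member><om:Observation>
    <om:metadata><set><identification-elements>
      <element name="station_name" value="Amqui"/>
      <element name="latitude" value="48.47"/>
    </identification-elements></set></om:metadata>
    <om:result><elements>
      <element name="air_temperature" value="14.2"/>
    </elements></om:result>
  </om:Observation></om:member>
  <om:member><om:Observation>
    <om:metadata><set><identification-elements>
      <element name="latitude" value="58.47"/>
      <element name="station_name" value="Inukjuak"/>
    </identification-elements></set></om:metadata>
    <om:result><elements>
      <element name="air_temperature" value="6.8"/>
    </elements></om:result>
  </om:Observation></om:member>
</om:ObservationCollection>
"""

NS = {"om": "http://example.org/om"}
IDENT = "om:metadata/set/identification-elements"

def value_of(parent, path, name):
    """Return the value attribute of the <element> with the given name, or None."""
    node = parent.find(f"{path}/element[@name='{name}']", NS)
    return None if node is None else node.get("value")

def observations(xml_text):
    """Yield one dict per om:Observation, combining metadata and result."""
    root = ET.fromstring(xml_text)
    for obs in root.iterfind("om:member/om:Observation", NS):
        yield {
            "station_name": value_of(obs, IDENT, "station_name"),
            "latitude": value_of(obs, IDENT, "latitude"),
            "air_temperature": value_of(obs, "om:result/elements", "air_temperature"),
        }

table = list(observations(SAMPLE))
for row in table:
    print(row)
```

The second observation lists its elements in a different order on purpose: selecting by name still returns the right values, whereas selecting by element number would not.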

    I'm using GeoKettle Spoon 3.2.0.
    Thanks for your help,
    steve

  5. #5
    Join Date
    Sep 2009
    Posts
    810

    Hi there,

    I think it might be possible to do all that with a single input step. Check out the attached sample.

    Cheers

    Slawo

  6. #6
    Join Date
    Oct 2007
    Posts
    10

    Wow!
    Yesterday I started to write a PHP script to handle that.
    With your demonstration I'm convinced I can integrate GeoKettle into our infrastructure. We will also use it to populate our spatial database.

    I will read this several times until I understand ;-))
    om:metadata/*[name()='set']/*[name()='identification-elements']/*[name()='element' and @name='observation_date_local_time']/attribute::value

    Are these "operators" or this syntax documented somewhere?
    /*[name()=
    /*[name()='element' and @name
    /attribute::value

    thank you all for your time and help,
    steve
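
For readers decoding the same expression, the pieces can be tried out in Python. The standard library's ElementTree supports only a subset of XPath and has no name() function, so the sketch below uses the closest equivalents it offers rather than the exact expression. The XML fragment and both namespace URIs are placeholders.

```python
import xml.etree.ElementTree as ET

# Made-up fragment; both namespace URIs are placeholders.
SAMPLE = """\
<om:metadata xmlns:om="http://example.org/om" xmlns="http://example.org/default">
  <set><identification-elements>
    <element name="station_name" value="Amqui"/>
    <element name="observation_date_local_time" value="2010-07-27T01:00:00.000 EST"/>
  </identification-elements></set>
</om:metadata>
"""

metadata = ET.fromstring(SAMPLE)

# XPath 1.0:  *[name()='set']/*[name()='identification-elements']
#             /*[name()='element' and @name='observation_date_local_time']
#             /attribute::value
# ElementTree has no name() function, so {*}tag (Python 3.8+) stands in for
# "element with this local name in any namespace".
node = metadata.find(
    "{*}set/{*}identification-elements"
    "/{*}element[@name='observation_date_local_time']"
)

# attribute::value is the unabbreviated form of @value.
obs_time = node.get("value")
print(obs_time)
```

Read left to right: `*` selects any child element, the predicate in square brackets filters those children by tag name and by the value of their name attribute, and `attribute::value` returns the value attribute of whatever is left.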

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    I'm sure it's standard XPath 1.0 syntax.

  8. #8
    Join Date
    Sep 2009
    Posts
    810

    Yep, standard syntax.
    For some reason Kettle/XPath does not like the tag names used in your file when constructing the path. I'm not really sure why; maybe it has to do with the way the tags are nested (namespace-qualified vs. non-qualified). So I used a workaround when specifying the tag names: /*[name()='something'] is (usually) equivalent to /something.
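
A possible explanation, not confirmed against the original file or Kettle's XPath engine: if the document declares a default namespace, unprefixed tags such as set belong to that namespace, and in XPath 1.0 a bare name in a path only matches elements in no namespace. The Python sketch below shows the effect with the standard library; the fragment and both URIs are made up.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: a prefixed root plus a default namespace that the
# unprefixed children inherit. Both URIs are placeholders.
SAMPLE = """\
<om:metadata xmlns:om="http://example.org/om" xmlns="http://example.org/default">
  <set/>
</om:metadata>
"""

root = ET.fromstring(SAMPLE)
child = root[0]

# The parser stores the expanded name, not the bare tag.
print(child.tag)

# A bare name means "no namespace" here, so it does not match.
plain = root.find("set")

# Binding a prefix of our own to the default namespace URI does match.
NS = {"d": "http://example.org/default"}
qualified = root.find("d:set", NS)

# Matching on the local name only mirrors the *[name()='set'] workaround.
by_local_name = [c for c in root if c.tag.rsplit("}", 1)[-1] == "set"]
print(plain, qualified is child, len(by_local_name))
```

Under that assumption, the name() test works because it compares the tag as written in the file and ignores which namespace it resolves to.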

    Cheers

    Slawo

    Edit: this page might help in understanding the syntax better: http://www.w3schools.com/xpath/xpath_axes.asp
