Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: XPath in Get data from XML tool

  1. #1
    Join Date
    Aug 2011
    Posts
    2

    Default XPath in Get data from XML tool

    I am having the hardest time getting the correct results using XPath. I've tried every variable I can find on the web.

    Loop XPath: /RESPONSE/CONVERSION

    Fields Tab:

    # Name XPath
    1 EXPR ../EXPR
    2 EXCH ../EXCH
    3 DATE DATE
    4 ASK ASK

    Here is the XML:

    - <RESPONSE>
    <EXPR>USD</EXPR>
    <EXCH>GBP</EXCH>
    <AMOUNT>1</AMOUNT>
    <NPRICES>1</NPRICES>
    + <CONVERSION>
    <DATE>Fri, 01 Jun 2001 22:50:00 GMT</DATE>
    <ASK>1.4181</ASK>
    <BID>1.4177</BID>
    </CONVERSION>
    <EXPR>USD</EXPR>
    <EXCH>JPY</EXCH>
    <AMOUNT>1</AMOUNT>
    <NPRICES>1</NPRICES>
    - <CONVERSION>
    <DATE>Fri, 01 Jun 2001 22:50:02 GMT</DATE>
    <ASK>0.008387</ASK>
    <BID>0.008382</BID>
    </CONVERSION>
    <EXPR>USD</EXPR>
    <EXCH>CNY</EXCH>
    <AMOUNT>1</AMOUNT>
    <NPRICES>1</NPRICES>
    - <CONVERSION>
    <DATE>Fri, 01 Jun 2001 09:13:29 GMT</DATE>
    <ASK>0.1208</ASK>
    <BID>0.1208</BID>
    </CONVERSION>
    <EXPR>USD</EXPR>
    <EXCH>QAR</EXCH>
    <AMOUNT>1</AMOUNT>
    <NPRICES>1</NPRICES>
    - <CONVERSION>
    <DATE>Sat, 02 Jun 2001 21:00:00 GMT</DATE>
    <ASK>0.2747</ASK>
    <BID>0.2746</BID>
    </CONVERSION>
    <EXPR>USD</EXPR>
    <EXCH>KRW</EXCH>
    <AMOUNT>1</AMOUNT>
    <NPRICES>1</NPRICES>
    - <CONVERSION>
    <DATE>Fri, 01 Jun 2001 13:11:43 GMT</DATE>
    <ASK>0.0007800</ASK>
    <BID>0.0007776</BID>
    </CONVERSION>
    </RESPONSE>


    My output looks like this:

    EXPR EXCH DATE ASK
    USD GBP 6/1/2001 1.4181
    USD GBP 6/1/2001 0.008387
    USD GBP 6/1/2001 0.1208
    USD GBP 6/2/2001 0.2747
    USD GBP 6/1/2001 0.0007800

    My EXCH column should be changing. I am not an XPath guru, so any help would be greatly appreciated!

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    The XML is hard to parse in the sense that EXCH is not defined by path but by occurrence right before a CONVERSION block.

    Perhaps the easiest thing to do is read the file multiple times and then to join it by the row number...

    positional element in XML.ktr

    I tried to use the position() element as in /RESPONSE/EXPR[position()] but that didn't work for some reason.

    You can also try the new XML streaming StAX parser (recent 4.2.0 build):

    positional-stax.zip

    I think I can even get rid of the Java Script in there but I'm lacking the time right now ;-)

    Enjoy,

    Matt
    Last edited by MattCasters; 08-12-2011 at 11:59 AM.

  3. #3
    Join Date
    Nov 2008
    Posts
    777

    Default

    The positional aspects of the EXPR and EXCH relative to the CONVERSION elements are not being captured by the XPath expressions. Essentially, you are getting ../EXPR[1] and ../EXCH[1] in every row.

    It may work to do the processing in two steps. In the first step, the Loop XPath would be /RESPONSE and the field paths would be EXPR and EXCH. In the second step, the Loop XPath would be /RESPONSE/CONVERSION as you have indicated above and the only field paths needed would be DATE and ASK. Then merge the output of the two steps together based on the row number or a counter.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    XPath can handle repeating element groups lacking an enveloping element, for sure:

    EXPR --> preceding-sibling::*[4]
    EXCH --> preceding-sibling::*[3]
    Attached Files Attached Files
    Last edited by marabu; 12-03-2012 at 01:50 PM. Reason: ktr attached
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Sep 2012
    Posts
    71

    Default Solution using StAX

    Using the StAX parser as Matt suggests below, you can select just those elements that you care about, namely those with a data type of CHARACTERS. Now you have rows that you basically need to denormalize in sets of 4 (EXPR, EXCH, DATE, ASK). To do this you add the row number to the stream (using an Add Sequence step) then use a Calculator to determine a "bucket number" = INT((rownum-1)/4). This will give you a grouping field for a Row Denormaliser step.

    denormalize_flat_xml.ktr

    The attached transformation is a bit lengthy but alot of it is just reworking the data to get it into your desired format. The key steps are the sequence, calculator, and row denormaliser, and the key concept is the bucket for grouping.

    Hope this helps!

    Quote Originally Posted by MattCasters View Post
    The XML is hard to parse in the sense that EXCH is not defined by path but by occurrence right before a CONVERSION block.

    Perhaps the easiest thing to do is read the file multiple times and then to join it by the row number...

    positional element in XML.ktr

    I tried to use the position() element as in /RESPONSE/EXPR[position()] but that didn't work for some reason.

    You can also try the new XML streaming StAX parser (recent 4.2.0 build):

    positional-stax.zip

    I think I can even get rid of the Java Script in there but I'm lacking the time right now ;-)

    Enjoy,

    Matt

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2017 Pentaho Corporation. All Rights Reserved.