Hitachi Vantara Pentaho Community Forums

Thread: filtering XML data for a specific parent node

  1. #1
    Join Date
    Jan 2015
    Posts
    4

    Hello,
    I'm quite a beginner with PDI and am trying to use the Get Data from XML step to retrieve just a few fields from some XML files. Get Fields works fine for simple nodes, but I fail when I need to extract a specific child value only when its parent node matches a condition. For instance, given the XML below, I need to extract the value of CompanyId only where CompanyRole = Custodian.

    I tried, for instance, this XPath:

    AssetDetailsOverviewBasic_Response_1/AssetDetailsOverviewBasicResult/SupportCompany/SupportCompaniesType/[CompanyRole='Custodian']/CompanyType/CompanyId

    and tweaking the syntax, but I always get Kettle exceptions. I'd be grateful if someone could point me to the proper XPath expression.
    Thanks in advance,
    Paolo

    <s:Body>
      <GetAssetDetailsOverviewBasic_Response_1>
        <AssetDetailsOverviewBasicResult>
          <Identity>
            <Language>English</Language>
            <ShortName>Carmignac Securite A USD acc H</ShortName>
            <NickName/>
          </Identity>
          <SupportCompany>
            <SupportCompaniesType>
              <CompanyRole>Administrator</CompanyRole>
              <CompanyType>
                <Language>English</Language>
                <CompanyId>1283818</CompanyId>
              </CompanyType>
            </SupportCompaniesType>
            <SupportCompaniesType>
              <CompanyRole>Custodian</CompanyRole>
              <CompanyType>
                <Language>English</Language>
                <CompanyId>1272670</CompanyId>
              </CompanyType>
            </SupportCompaniesType>
          </SupportCompany>
          <IPOInfo>
            <LaunchDate>2012-06-19</LaunchDate>
          </IPOInfo>
        </AssetDetailsOverviewBasicResult>
      </GetAssetDetailsOverviewBasic_Response_1>
    </s:Body>
    Last edited by ciampa; 03-13-2019 at 07:17 AM.

  2. #2
    Join Date
    Apr 2008
    Posts
    4,671

    This probably has to do with CompanyRole and CompanyType being sibling nodes rather than parent and child.

    You will likely need to chain a few Get Data from XML steps together to get a standard table structure. Then you can filter the rows.

    For example:

    ---> Incoming Stream
    -> Get Data From XML (Loop Path: /GetAssetDetailsOverviewBasic_Response_1/AssetDetailsOverviewBasicResult )
    . Fields:
    Identity -> Single Node
    SupportCompany -> Single Node
    -> Get Data From XML (Loop Path: /SupportCompany/SupportCompaniesType )
    . Run Get Fields

    Now you'll have two rows from the content you posted. You can use a Filter Rows step to get only the data you want.
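
For readers outside PDI, the "one row per SupportCompaniesType, then filter" idea above can be sketched in Python with the standard library. This is only an illustration of the logic, not how the PDI steps are implemented; the helper names `support_company_rows` and `filter_rows` are made up for this sketch, and the sample is a trimmed copy of the posted XML without the `s:Body` wrapper.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the first post (SOAP wrapper omitted).
SAMPLE = """
<GetAssetDetailsOverviewBasic_Response_1>
  <AssetDetailsOverviewBasicResult>
    <SupportCompany>
      <SupportCompaniesType>
        <CompanyRole>Administrator</CompanyRole>
        <CompanyType><CompanyId>1283818</CompanyId></CompanyType>
      </SupportCompaniesType>
      <SupportCompaniesType>
        <CompanyRole>Custodian</CompanyRole>
        <CompanyType><CompanyId>1272670</CompanyId></CompanyType>
      </SupportCompaniesType>
    </SupportCompany>
  </AssetDetailsOverviewBasicResult>
</GetAssetDetailsOverviewBasic_Response_1>
"""


def support_company_rows(xml_text):
    """Return one dict ("row") per SupportCompaniesType element."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("SupportCompaniesType"):
        rows.append({
            "CompanyRole": node.findtext("CompanyRole"),
            "CompanyId": node.findtext("CompanyType/CompanyId"),
        })
    return rows


def filter_rows(rows, role):
    """Keep only the rows whose CompanyRole equals the given role."""
    return [row for row in rows if row["CompanyRole"] == role]


rows = support_company_rows(SAMPLE)
print(filter_rows(rows, "Custodian"))
```

The first function plays the part of the chained Get Data from XML steps (two rows for the posted content), and the second stands in for Filter Rows.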
    Last edited by gutlez; 03-12-2019 at 04:22 PM.

  3. #3
    Join Date
    Jan 2015
    Posts
    4

    Thanks gutlez,
    after more trial and error I was able to get it done in one step using an XPath like //SupportCompaniesType[CompanyRole='Custodian']//CompanyId. It may have drawbacks, but for the small exploration I'm doing it seems to work.
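
The predicate expression above can also be checked outside PDI. The following is a minimal sketch using Python's `xml.etree.ElementTree`, which supports only a subset of XPath and needs a leading `.` for paths evaluated on an element; the function name `company_ids_for_role` is hypothetical, and the sample is a trimmed copy of the posted XML without the `s:Body` wrapper.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the first post (SOAP wrapper omitted).
SAMPLE = """
<GetAssetDetailsOverviewBasic_Response_1>
  <AssetDetailsOverviewBasicResult>
    <SupportCompany>
      <SupportCompaniesType>
        <CompanyRole>Administrator</CompanyRole>
        <CompanyType><CompanyId>1283818</CompanyId></CompanyType>
      </SupportCompaniesType>
      <SupportCompaniesType>
        <CompanyRole>Custodian</CompanyRole>
        <CompanyType><CompanyId>1272670</CompanyId></CompanyType>
      </SupportCompaniesType>
    </SupportCompany>
  </AssetDetailsOverviewBasicResult>
</GetAssetDetailsOverviewBasic_Response_1>
"""


def company_ids_for_role(xml_text, role="Custodian"):
    """Return CompanyId values under SupportCompaniesType nodes with the given role."""
    root = ET.fromstring(xml_text)
    # The predicate [CompanyRole='...'] is attached directly to the
    # SupportCompaniesType step; there is no "/" between the step and "[".
    xpath = f".//SupportCompaniesType[CompanyRole='{role}']//CompanyId"
    return [node.text for node in root.findall(xpath)]


print(company_ids_for_role(SAMPLE))  # → ['1272670']
```

Compared with the first attempt in this thread, the working expression attaches the predicate to the element name instead of writing `/[...]`, which is not valid XPath because a predicate must follow a node test. The first attempt also began with `AssetDetailsOverviewBasic_Response_1`, whereas the element in the posted XML is `GetAssetDetailsOverviewBasic_Response_1`.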
