Hitachi Vantara Pentaho Community Forums

Thread: How to improve speed of writing xml files with several xml joins?

  1. #1
    Join Date
    Nov 2015


    Can anyone tell me how to improve the speed of writing XML files with several XML joins?

    I'm joining 12500 records (AssetID) using 5 XML Join steps in Pentaho PDI 4.4.4 to create XML in this format:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Asset Valid="Y" Sysid="Company" Serialno="000010065" Lastloc="Somewhere" Groupcode="B20M" GroupDescription="BASKET" Description="20mtr" Assetid="000010065">
    <Length unit="mm">19900</Length>
    <Width unit="mm">910</Width>
    <Height unit="mm">650</Height>
    <MaxGrossWeight unit="Kg">13000</MaxGrossWeight>
    <TareWeight unit="Kg">2730</TareWeight>
    <Payload unit="Kg">10270</Payload>
    <ProofLoad unit="Kg">32500</ProofLoad>
    <Hirestatus>On Hire</Hirestatus>

    With each additional XML join, the time to perform the join doubles, and in total the transformation runs for 1 hour, which is far too long for the available time slot.
    Does anyone have an idea of how to significantly reduce the runtime of this transformation? (Attached: slow_xml_join_transformation.ktr)

    The transformation looks like this:

    [Image: xml join slow.jpg]

  2. #2
    Join Date
    Jan 2015


    I ran into the same situation and couldn't find a "proper" XML approach that still performed well. What I did instead was use the Token Replacement plugin from the Marketplace. For each repeating XML element (including children that appear exactly once), I paste the entire XML fragment as the template text and put tokens where the field values are supposed to go. Then I use a few Data Grid steps for the header and closing tags, and Append Streams to put it all in order before writing to file.
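To illustrate the idea outside PDI, here is a minimal Python sketch of the same technique: one text template per repeating element, field values substituted per row, then a header and footer placed around the fragments. It is not the Token Replacement plugin's API. The names `ASSET_TEMPLATE`, `render_asset` and `render_document`, the `<Assets>` wrapper element, and the reduced field list are all assumptions made for the example.

```python
from string import Template
from xml.sax.saxutils import escape, quoteattr

# Hypothetical template: the XML fragment is plain text with tokens
# ($name) where the field values go. Only a few fields are shown.
ASSET_TEMPLATE = Template(
    '<Asset Assetid=$assetid Description=$description>\n'
    '<Length unit="mm">$length</Length>\n'
    '<Hirestatus>$hirestatus</Hirestatus>\n'
    '</Asset>'
)


def render_asset(row):
    """Fill the template for one record, escaping values so the output stays valid XML."""
    return ASSET_TEMPLATE.substitute(
        assetid=quoteattr(str(row["Assetid"])),          # adds the surrounding quotes
        description=quoteattr(str(row["Description"])),
        length=escape(str(row["Length"])),
        hirestatus=escape(str(row["Hirestatus"])),
    )


def render_document(rows):
    """Header lines, one fragment per row, then the closing tag (the wrapper element is assumed)."""
    header = ['<?xml version="1.0" encoding="ISO-8859-1"?>', "<Assets>"]
    footer = ["</Assets>"]
    return "\n".join(header + [render_asset(r) for r in rows] + footer)


rows = [
    {"Assetid": "000010065", "Description": "20mtr", "Length": 19900, "Hirestatus": "On Hire"},
    {"Assetid": "000010066", "Description": "A & B", "Length": 6000, "Hirestatus": "Off Hire"},
]
document = render_document(rows)
```

Each row is rendered independently, so the work grows linearly with the number of records and no XML tree has to be built or searched. The price is that the code itself must take care of escaping and of keeping the fragments well-formed.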

    I've rigged up a very simple example using your data (with only some of the fields replaced, since it's a lot of typing).


    The solution works for multi-level XML too, but then you should include sequence numbers for the child nodes so that you can sort the xml_fragments. It's involved, inflexible and completely devoid of any form of standards, but it's fast.
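The sequence-number idea for multi-level XML can be sketched in the same way. This is a hypothetical Python illustration, not the sample transformation: each fragment carries its parent key and a sequence number, and sorting on that pair puts the opening tag, the children and the closing tag in order. The `CLOSE_SEQ` sentinel and the `assemble` function are names invented for the example.

```python
# Hypothetical sentinel so that the closing tag always sorts after the children.
CLOSE_SEQ = 10**9


def assemble(fragments):
    """Order (parent_id, seq, xml_text) tuples by parent, then by sequence number."""
    ordered = sorted(fragments, key=lambda f: (f[0], f[1]))
    return "\n".join(text for _, _, text in ordered)


# Fragments may arrive in any order, for example from separate streams.
fragments = [
    ("000010065", 2, '<Width unit="mm">910</Width>'),
    ("000010065", CLOSE_SEQ, "</Asset>"),
    ("000010065", 0, '<Asset Assetid="000010065">'),
    ("000010065", 1, '<Length unit="mm">19900</Length>'),
]
xml_text = assemble(fragments)
```

A single sort over all fragments replaces the repeated joins. Deeper nesting would need a longer sort key (one sequence number per level), which is part of why the approach is inflexible.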

    Last edited by Isha Lamboo; 02-10-2017 at 12:59 PM. Reason: add sample transformation

  3. #3
    Join Date
    Nov 2015


    Thank you for the feedback, Isha! I'll check out the Token Replacement plugin.

    I would appreciate an example to fully understand your point. ...

  4. #4
    Join Date
    Jan 2015


    I've added examples to my original post.
