Hitachi Vantara Pentaho Community Forums

Thread: XML/PostgreSQL performance slows down while processing

  1. #1

    XML/PostgreSQL performance slows down while processing

    Hello,


    I am trying to extract data from an XML file and fill a warehouse dimension using Insert/Update. Performance is great at the beginning of the process but slows down dramatically as it runs: from 400 rows/s in XML Input and 750 rows/s in Insert/Update at the start to 33 rows/s and 66 rows/s respectively once 80% of the rows are done...



    The XML file contains about 110,000 elements at the same hierarchy level, with about 5 attributes each...

    The test computer is an Intel Pentium M 1.5 GHz with 1250 MB of RAM. Memory usage is about 500 MB and CPU usage is at 100%...



    I don't understand why.

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    RE: XML/PostgreSQL performance slows down while processing

    It's a known issue: the algorithm uses a DOM tree and needs to access nodes by position, which means cycling through the XML document from the beginning for every request. The further toward the end you get, the longer it takes to find the required position.
    Very large files suffer from this in particular.
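    To make the cost concrete, here is a minimal Java sketch of a positional lookup that restarts from the first child on every call. `subNodeByNr` is a hypothetical stand-in, not Kettle's actual `XMLHandler.getSubNodeByNr()`; it only assumes the standard `org.w3c.dom` API. Reading all N rows this way inspects N(N+1)/2 child nodes, so doubling the file roughly quadruples the work.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NaiveLookup {
    // Number of child nodes inspected so far; makes the cost visible.
    static long visited = 0;

    // Hypothetical stand-in for a "give me the nr-th <tag> child" helper.
    // Every call restarts from the first child, so a single call costs O(nr).
    static Node subNodeByNr(Node parent, String tag, int nr) {
        NodeList children = parent.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            visited++;
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && tag.equals(child.getNodeName())) {
                if (count == nr) {
                    return child;
                }
                count++;
            }
        }
        return null;
    }

    // Builds <rows><row id="0"/>...</rows> and reads every row by position,
    // the way a step would when it loops over all elements.
    static long totalVisits(int n) throws Exception {
        StringBuilder xml = new StringBuilder("<rows>");
        for (int i = 0; i < n; i++) {
            xml.append("<row id=\"").append(i).append("\"/>");
        }
        xml.append("</rows>");
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.toString().getBytes(StandardCharsets.UTF_8)));
        Node root = doc.getDocumentElement();
        visited = 0;
        for (int nr = 0; nr < n; nr++) {
            subNodeByNr(root, "row", nr);
        }
        return visited;
    }

    public static void main(String[] args) throws Exception {
        // Doubling n roughly quadruples the work: quadratic overall.
        System.out.println(totalVisits(1000)); // 500500
        System.out.println(totalVisits(2000)); // 2001000
    }
}
```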

    As usual, we'll find a fix for this.

    There, now you understand ;-)

    Matt

  3. #3

    RE: XML/PostgreSQL performance slows down while processing

    So would an immediate workaround be to split the XML file into smaller ones and process them sequentially?
    That way, processing all the files would be faster than processing the one big file...
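    Assuming each lookup scans from the start of its own file, as described in the reply above, a rough count of node visits for the XML reading part shows why splitting helps. Here C_1 is the cost for one file of N elements and C_k the cost for k files of N/k elements each. The figures plug in the 110,000 elements from the first post and an arbitrary k = 10; they are step counts, not measured timings.

```latex
\[
C_{1} = \sum_{i=1}^{N} i = \frac{N(N+1)}{2} \approx \frac{N^{2}}{2},
\qquad
C_{k} = k \cdot \frac{\frac{N}{k}\left(\frac{N}{k}+1\right)}{2} = \frac{N\left(\frac{N}{k}+1\right)}{2} \approx \frac{N^{2}}{2k}
\]
\[
N = 110\,000,\; k = 10:\qquad C_{1} \approx 6.05 \times 10^{9},\qquad C_{10} \approx 6.05 \times 10^{8}
\]
```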

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    RE: XML/PostgreSQL performance slows down while processing

    Well, a lot of the XML reader code uses the XMLHandler.getSubNodeByNr() method.
    This method is basically at fault here.

    So what I did was add a caching system (XMLHandlerCache, XMLHandlerCacheEntry) that caches the 500 most recent parent nodes. In tight loops, such as reading a large XML document with thousands of elements, it brings performance back to linear.
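    A minimal Java sketch of the idea, assuming the cache remembers the last position looked up under each parent/tag pair so that a sequential loop can resume where it left off instead of starting over. The `CachedLookup`, `Key` and `Position` names are hypothetical and this is not the actual `XMLHandlerCache` code; only the 500-entry limit comes from the post. Eviction uses the standard `LinkedHashMap` access-order plus `removeEldestEntry` pattern for an LRU cache.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

public class CachedLookup {
    static final int MAX_ENTRIES = 500; // limit taken from the post

    // Cache key: the parent node (compared by identity) plus the tag name.
    static final class Key {
        final Node parent;
        final String tag;
        Key(Node parent, String tag) { this.parent = parent; this.tag = tag; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).parent == parent && ((Key) o).tag.equals(tag);
        }
        @Override public int hashCode() {
            return 31 * System.identityHashCode(parent) + tag.hashCode();
        }
    }

    // Cache value: the most recent match and its position among <tag> children.
    static final class Position {
        final int nr;
        final Node node;
        Position(int nr, Node node) { this.nr = nr; this.node = node; }
    }

    // Access-ordered LinkedHashMap that evicts the least recently used key.
    // Not thread-safe; assumes the DOM is not modified between lookups.
    static final Map<Key, Position> cache = new LinkedHashMap<Key, Position>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<Key, Position> eldest) {
            return size() > MAX_ENTRIES;
        }
    };

    static long visited = 0; // child nodes inspected, to make the cost visible

    // Returns the nr-th (0-based) <tag> child of parent, or null if there is none.
    static Node subNodeByNr(Node parent, String tag, int nr) {
        Key key = new Key(parent, tag);
        Position hit = cache.get(key);
        if (hit != null && hit.nr == nr) {
            return hit.node;
        }
        Node current;
        int found; // position of the last matching child seen so far
        if (hit != null && hit.nr < nr) {
            current = hit.node.getNextSibling(); // resume after the cached match
            found = hit.nr;
        } else {
            current = parent.getFirstChild();    // fall back to a full scan
            found = -1;
        }
        for (; current != null; current = current.getNextSibling()) {
            visited++;
            if (current.getNodeType() == Node.ELEMENT_NODE
                    && tag.equals(current.getNodeName())
                    && ++found == nr) {
                cache.put(key, new Position(nr, current));
                return current;
            }
        }
        return null;
    }

    // Builds <rows><row id="0"/>...</rows> and returns the <rows> element.
    static Node buildRows(int n) throws Exception {
        StringBuilder xml = new StringBuilder("<rows>");
        for (int i = 0; i < n; i++) {
            xml.append("<row id=\"").append(i).append("\"/>");
        }
        xml.append("</rows>");
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.toString().getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    // Reads every row in order, as a step looping over all elements would.
    static long totalVisits(int n) throws Exception {
        Node root = buildRows(n);
        cache.clear();
        visited = 0;
        for (int nr = 0; nr < n; nr++) {
            subNodeByNr(root, "row", nr);
        }
        return visited;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(totalVisits(2000)); // 2000
    }
}
```

    Reading 2,000 rows in order visits 2,000 child nodes here, against roughly 2 million for a lookup that always starts from the first child. Out-of-order requests still work; they just fall back to a full scan.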

    I just committed this code, so you can grab a new kettle.jar from the development packages in 5 minutes.

    I still have to hunt down the memory issue. I'm looking for a decent profiler.

    All the best,
    Matt

  5. #5
    Join Date
    Sep 2005
    Posts
    1,403

    RE: XML/PostgreSQL performance slows down while processing

    Hi Matt,

    A profiler I can recommend is JProfiler: http://www.ej-technologies.com/produ.../overview.html
    It is useful for both memory and CPU profiling.

    Regards,

    Wim

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    RE: XML/PostgreSQL performance slows down while processing

    Thanks for the link, Wim, but I don't feel like paying €400 since I don't use it THAT much. :-)
    I've installed EclipseProfiler and after some patching it works fine for me.

    At first glance, the problem seems to be in the DOM tree itself (35 MB for a 6 MB XML document).
    Perhaps requesting children from a DOM tree somehow leaks memory.
    Mmmm, maybe we have to switch to another strategy altogether.
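    One alternative strategy, named here as an assumption rather than what Kettle actually did, is a streaming parser such as StAX (`javax.xml.stream`), which reads elements one at a time instead of building the whole tree. The minimal sketch below hands each element's attributes to a callback; `StreamingRows` and `readRows` are hypothetical names, and a real reader would wrap a file stream rather than a `String`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingRows {
    // Reads every <tag> element and passes its attributes to the sink one at
    // a time; no DOM tree is built, so memory does not grow with the file.
    static int readRows(String xml, String tag, Consumer<Map<String, String>> sink)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int rows = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && tag.equals(reader.getLocalName())) {
                    Map<String, String> attrs = new LinkedHashMap<>();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        attrs.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    sink.accept(attrs);
                    rows++;
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<rows><row id=\"1\" name=\"a\"/><row id=\"2\" name=\"b\"/></rows>";
        List<String> ids = new ArrayList<>();
        int n = readRows(xml, "row", attrs -> ids.add(attrs.get("id")));
        System.out.println(n + " rows, ids=" + ids); // 2 rows, ids=[1, 2]
    }
}
```

    With this approach, memory depends on the size of a single element rather than on the whole document, at the cost of giving up random access by position.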

    I'll investigate some more later.

    Regards,
    Matt

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.