Hitachi Vantara Pentaho Community Forums

Thread: StAX Files processing - GC overhead limit exceeded

  1. #1
    Join Date
    Jan 2018
    Posts
    22

    StAX Files processing - GC overhead limit exceeded

    Hi everybody,
    I've searched through several posts but couldn't find a solution to this problem.

    I've got a Job (Job1) that reads from disk the names of all the XML files in a specific input folder.
    These file names are then passed to another Job (Job2) that:
    1. Reads the file (StAX)
    2. Saves the content to a database if the content of the XML satisfies specific requirements
    3. Deletes the imported file from disk


    Job2 Executes for every input row, so it runs for every input file.
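
    For reference, the per-file flow described above can be sketched outside PDI. This is only an illustration in Python, not the actual Job or Transformation: `iterparse` stands in for the StAX step, the `record` tag and the `valid="true"` check are hypothetical placeholders for the real requirements, and appending to a list stands in for the database insert.

    ```python
    import os
    import xml.etree.ElementTree as ET


    def process_file(path, saved_rows):
        """Stream one XML file, keep matching records, then delete the file."""
        # iterparse is a pull-style parser, loosely comparable to StAX.
        for _event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "record":  # hypothetical element name
                if elem.get("valid") == "true":  # hypothetical requirement
                    saved_rows.append(elem.text)  # stand-in for the database insert
                elem.clear()  # release the parsed element before moving on
        os.remove(path)  # step 3: delete the imported file


    def process_folder(folder, saved_rows):
        """Process every .xml file in the folder, one file at a time."""
        for name in sorted(os.listdir(folder)):
            if name.endswith(".xml"):
                process_file(os.path.join(folder, name), saved_rows)
    ```

    In a loop like this nothing from one file is kept once the next one starts, which is the behaviour expected from "Execute for every input row".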

    The problem is: after several processed files, I receive an out of memory error "GC overhead limit exceeded".

    Given that I would like to process thousands of files, how could I structure my jobs to avoid this memory error?
    I thought that the job option "Execute for every input row" releases the memory after every single processing run, but that doesn't seem to be the case.

    I'm sorry, but I can't post the Job and Transformation files for privacy reasons.

    P.S.
    Increasing the Spoon memory limit is not a solution for me.

    Any idea?

    Thanks in advance!!!

  2. #2

    You might try changing the "Number of rows in rowsets" to a lower value in that specific transformation file, as PDI is likely buffering the XML content between steps, which may take up a lot of the available memory.

    https://wiki.pentaho.com/display/EAI...-Miscellaneous
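
    A rough way to see why the rowset size matters is to estimate the worst case where every hop buffer fills up. This is a back-of-the-envelope sketch only: the hop count and the 8 KB average row size are made-up inputs, and 10,000 is used as the starting rowset size on the assumption that it is the PDI default.

    ```python
    def rowset_buffer_bytes(hops, rowset_size, avg_row_bytes):
        """Upper bound on bytes held in hop buffers if every rowset fills up."""
        return hops * rowset_size * avg_row_bytes


    MIB = 1024 * 1024

    # Assumed inputs: 5 hops, rows of about 8 KB each (e.g. XML content in a field).
    default_estimate = rowset_buffer_bytes(hops=5, rowset_size=10_000, avg_row_bytes=8 * 1024)
    lowered_estimate = rowset_buffer_bytes(hops=5, rowset_size=500, avg_row_bytes=8 * 1024)

    print(f"rowset size 10000: {default_estimate / MIB:.1f} MiB")
    print(f"rowset size 500:   {lowered_estimate / MIB:.1f} MiB")
    ```

    The buffers only reach this size when a downstream step is slower than the step feeding it, so treat the figures as a ceiling rather than a measurement.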

  3. #3
    Join Date
    Jan 2018
    Posts
    22

    Hi Matthew!
    Many thanks for the reply. Processing of the individual files is fine until about file #260; after that, it begins crashing.
    I'm starting to doubt that the cause is the memory taken up by the files, since I've got something like 8000 files, each about 8 KB, occupying about 65 MB overall.
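
    The size estimate can be checked with a quick calculation, taking 8 KB as 8 * 1024 bytes per file (the real files will vary a little):

    ```python
    file_count = 8000
    file_size_bytes = 8 * 1024  # about 8 KB per file

    total_bytes = file_count * file_size_bytes
    total_mib = total_bytes / (1024 * 1024)

    print(total_bytes)               # 65536000
    print(f"{total_mib:.1f} MiB")    # 62.5 MiB
    ```

    So even if every file were held in memory at the same time, the raw XML would account for roughly 65 MB.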

    I'm starting to wonder if it could be something else, like Pentaho logging/metrics/dunno...

    Any other idea please?

    Thanks!


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.