I am using Kettle 3.2.0. I have gone through various articles on how to process large XML files using Kettle, and found the following link very useful:


Although it explains a lot about processing large XML files, I have not found the approach to work consistently across the various types of large XML files I deal with.

The scenario is this: I have a 600 MB XML file that I want to process using Kettle. I defined the loop XPath and the prune XPath as described in the article above. This worked well for repeating XML nodes. However, when I used the same settings to access an element that occurs only once in the entire file (its header), Kettle read that one node and then kept processing until it failed with an out-of-memory error.
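To make the behaviour I expected concrete, here is a minimal sketch written outside Kettle, assuming plain Java StAX (`javax.xml.stream`). The names `HeaderReader`, `readFirst`, and the `header` element are hypothetical and only for illustration, and the sketch assumes the header element contains text only. It is not how the Kettle step is implemented; it only shows a streaming read that returns as soon as the single element is found, so the rest of the file is never scanned or held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Kettle: streams the XML and stops at the first match.
class HeaderReader {

    // Returns the text of the first element with the given local name, or null if none exists.
    // Assumes the matched element holds text only (getElementText fails on child elements).
    static String readFirst(InputStream in, String name) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    // Return immediately: nothing after the header is parsed.
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parse failed", e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful to do if closing the reader fails.
                }
            }
        }
    }

    public static void main(String[] args) {
        // Tiny inline sample standing in for the real 600 MB file.
        String xml = "<root><header>H1</header><item>a</item><item>b</item></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readFirst(in, "header"));
    }
}
```

With a real file, the `ByteArrayInputStream` would be replaced by a `FileInputStream`; memory use should then depend on the position of the header rather than on the total file size.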

After the out-of-memory error was reported, I also noticed that the memory Kettle had acquired was not released.

This is a major concern for me, as I will be running Kettle in a server environment, where issues like these would be impossible to handle.

I would greatly appreciate any suggestions that can help me process large XML files (over 600 MB, which would be the actual scenario) consistently.