Hitachi Vantara Pentaho Community Forums

Thread: Kettle and Spoon Memory Usage

  1. #1

    Kettle and Spoon Memory Usage

    Hi all,

    I'm currently building a star-schema data warehouse using PDI (v5.0.1) on a MySQL database (v5.1). So far I've grouped my source tables into 5 logical business areas (time, location, sales, people, stock), each with its own ETL (see screenshots 1 & 2) that populates a staging-area database. Run separately, the master job for each area (screenshot 1) takes 30 to 60 seconds. However, for scheduling and audit purposes I tried calling the 5 jobs one after the other from a single master job. This is where I ran into problems: the memory available on my machine is gradually consumed by the running Java process.

    I am currently developing on my local Windows 8 box with 16 GB of physical RAM. I have increased the memory available within spoon.bat and kettle.bat to 12 GB
    (if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xmx12288m" "-XX:MaxPermSize=256m"). The problem occurs both when I run the job through Spoon and when I run it as a batch job using Kettle. I have also removed all performance monitoring from the stream. I use only 1 JavaScript node to calculate some date functions, and there are no database merge nodes, as I am aware they can use a lot of memory.
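
    To confirm that the 12 GB setting actually reaches the JVM, a small standalone check can help. The sketch below is not part of PDI; the class name HeapCheck is made up for illustration, and it only uses the standard java.lang.Runtime API to print the heap limits the running JVM received.

```java
// Hypothetical helper, not part of PDI: prints the heap limits
// that the running JVM actually received from its -Xmx setting.
public class HeapCheck {
    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Heap currently in use = committed heap minus the free part of it.
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        System.out.println("committed heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("used heap (MiB):      " + toMiB(used));
    }
}
```

    Compiled with javac and run as "java -Xmx12288m HeapCheck", the reported maximum should be close to 12288 MiB. It can come out somewhat lower depending on the garbage collector in use, so treat it as an approximate check only.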

    Has anyone come across this before?

    I haven't yet tested this on a Linux server, as one is still being built for me. Is what I am experiencing a result of running this on my Windows box? If so, is there a way around it? I would like the end result to behave the same across my dev, test and live environments, as well as on my localhost, as I need to be able to work away from the office.

    I was also wondering whether there is a limit on the number of levels of embedded jobs, and whether the nesting depth would affect memory usage, i.e. Great-Grandparent, Grandparent, Parent, Child.

    Is there a way of flushing the Java memory between the jobs called? If so, this would enable me to schedule a single master job.
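
    One possible workaround, offered as an untested sketch rather than a confirmed fix, is to start each area's master job in its own Kitchen process, so that whatever heap a job used is handed back to the operating system when that process exits. In the sketch below the class name, the install path and the .kjb file names are all placeholders; only the /file: and /level: options are Kitchen's Windows command-line syntax.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher: runs each job in a separate Kitchen process (one JVM per job).
public class SequentialJobs {
    // Build the Windows command line for one job file.
    static List<String> kitchenCommand(String kitchenBat, String jobFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("cmd.exe");
        cmd.add("/c");
        cmd.add(kitchenBat);
        cmd.add("/file:" + jobFile);
        cmd.add("/level:Basic");
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed install location and job file names; adjust to the real ones.
        String kitchen = "C:\\pdi\\data-integration\\Kitchen.bat";
        List<String> jobs = Arrays.asList(
                "C:\\etl\\time_master.kjb",
                "C:\\etl\\location_master.kjb",
                "C:\\etl\\sales_master.kjb",
                "C:\\etl\\people_master.kjb",
                "C:\\etl\\stock_master.kjb");

        for (String job : jobs) {
            // inheritIO() sends Kitchen's log output to this console.
            Process p = new ProcessBuilder(kitchenCommand(kitchen, job)).inheritIO().start();
            int exit = p.waitFor(); // block until this job's JVM has exited
            if (exit != 0) {
                System.err.println("Job failed (exit code " + exit + "): " + job);
                System.exit(exit); // stop the sequence on the first failure
            }
        }
    }
}
```

    The trade-off is that each job pays the JVM and Kettle start-up cost again, and the jobs no longer share a single master job's log and audit trail.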

    Sorry about all the questions, but I have been working on this issue for a while and have exhausted my knowledge of the subject.

    Many thanks in advance; any pointers would be gratefully received.

    Richard

  2. #2


    Hello Richard,


    I know this thread is very old, but the problem is still present. Did you find a solution? I have a similar issue, but I can't tweak my process to make it run well.

    Thanks!
    Last edited by JuanMartinezD; 02-17-2017 at 12:26 PM.
