Hitachi Vantara Pentaho Community Forums

Thread: Transform/Job Speed

  1. #1


    To All:

    I am in the process of upgrading from an old version (3.2) to 4.2 and noticed that many of the transformations/jobs took quite a bit longer to run. In a side-by-side comparison on the same box, one process took 34 seconds on 3.2 and 1.2 minutes on 4.2.

    While the two run different versions of Java (the old install uses 1.5 and 4.2 uses 1.6), I got a large improvement by doing a quick reorg of the repository DB. After that, the 4.2 run time came down to 42 seconds. While still quite a bit slower, it is something I can deal with in most cases.

    Does anyone have any other tricks for getting 4.2 to run a bit quicker from the command line?


  2. #2
    Join Date
    Jul 2008


    I have had the same problem since I started using PDI 4.2.

  3. #3


    Something is off with how the works. While my shell-script knowledge stinks, the help screen should not take 20+ seconds, and the only change I made was switching #!/bin/sh to bash so that the script would work at all.

    Same results on and. The box is neither a slow box nor a heavily worked one.

    bash-3.00# time
    -rep = Repository name
    -user = Repository username
    -pass = Repository password
    -job = The name of the job to launch
    -dir = The directory (dont forget the leading /)
    -file = The filename (Job XML) to launch
    -level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing)
    -logfile = The logging file to write to
    -listdir = List the directories in the repository
    -listjobs = List the jobs in the specified directory
    -listrep = List the available repositories
    -norep = Do not log into the repository
    -version = show the version, revision and build date
    -param = Set a named parameter <NAME>=<VALUE>. For example -param:FOO=bar
    -listparam = List information concerning the defined parameters in the specified job.
    -export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
    -maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
    -maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)

    real 0m32.305s
    user 0m23.011s
    sys 0m2.342s

  4. #4


    OK, root cause found, and in my case this works. A quick DTrace run showed me what was going on:

    dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }' | grep kettle_4.2 > /tmp/files4

    versus

    dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }' | grep kettle_3.2 > /tmp/files3

    Comparing the two showed right off the bat what was going on: all of the plugins/libext files were being loaded.
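    One way to do that comparison is to summarize each trace file. The sketch below is a hypothetical helper, not part of Kettle or DTrace: it assumes only that each line of /tmp/files3 and /tmp/files4 contains the opened path somewhere on it (default dtrace output also prints CPU/probe columns), and it assumes a POSIX grep that accepts multiple -e patterns. It counts the total opens and how many of them fall under a plugins/ or libext/ directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kettle): summarize a dtrace open() log
# such as /tmp/files3 or /tmp/files4. Assumes only that each line of the
# log contains the opened path somewhere on it.
summarize_trace() {
    trace_file=$1
    # Total number of open() calls captured in the log.
    total=$(wc -l < "$trace_file" | tr -d ' ')
    # Lines that mention a plugins/ or libext/ directory.
    # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
    plugin_opens=$(grep -c -e '/plugins/' -e '/libext/' "$trace_file" || true)
    echo "$trace_file: $total opens, $plugin_opens under plugins/ or libext/"
}

# Summarize whichever of the two trace files exist.
for f in /tmp/files3 /tmp/files4; do
    if [ -f "$f" ]; then
        summarize_trace "$f"
    fi
done
```

    A large jump in the plugins/libext count from the 3.2 file to the 4.2 file is the pattern described in this post; the counts are only a rough guide, since one jar may be opened more than once.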

    Since I use a Windows-based system to design the items and a Unix-based box to run the jobs/transformations, I removed the Agile plugin and some of the libext files I will never use. The run time decreased by 50%.

    A word to the wise: never delete anything without backups and without knowing what it will impact! I feel pretty sure I am fine having removed these files (I think).
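    In the spirit of that warning, here is a minimal sketch that moves plugin or libext entries aside instead of deleting them, so they can be put back if a job turns out to need them. The function names park_item and restore_item, and the variables PDI_DIR and BACKUP_ROOT with their default paths, are all hypothetical; point them at your own install.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kettle): "park" a plugin or libext entry
# in a backup directory instead of deleting it, so it can be restored later.
# PDI_DIR and BACKUP_ROOT are assumptions; point them at your own install.
PDI_DIR=${PDI_DIR:-/opt/kettle_4.2}
BACKUP_ROOT=${BACKUP_ROOT:-$HOME/kettle_parked}

park_item() {
    # $1 is a path relative to PDI_DIR, e.g. plugins/some-plugin
    rel=$1
    dest="$BACKUP_ROOT/$(dirname "$rel")"
    mkdir -p "$dest" || return 1
    mv "$PDI_DIR/$rel" "$dest/" || return 1
    echo "parked $rel in $dest/"
}

restore_item() {
    # Move a previously parked entry back to its original location.
    rel=$1
    mv "$BACKUP_ROOT/$rel" "$PDI_DIR/$rel" || return 1
    echo "restored $rel"
}
```

    For example, park_item plugins/some-plugin (a placeholder name) would move that directory under $BACKUP_ROOT/plugins/, and restore_item plugins/some-plugin would move it back.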

  5. #5
    Join Date
    Jul 2008


    I still have the same problem even after deleting a few libext files or plugins.

  6. #6
    Join Date
    Nov 1999


    The repository was a bit slower compared to 3.2.5, but I'm sure we fixed this in 4.2.1.

  7. #7


    Aha... I have 4.2.0 Stable, which does not include that bug fix. I may upgrade, but most of my issues seem to be fixed by the changes I posted earlier.

    Thanks Matt


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.