Hitachi Vantara Pentaho Community Forums

Thread: Spoon (PDI) & DI server - understanding

  1. #1

    Default Spoon (PDI) & DI server - understanding

    Hello everyone,

    Can anyone explain what the DI server does, what it is usually used for, and when it might be needed? Is it required to be installed?
    Can Spoon be used without it? What advantages are lost by not using it? Are there any Spoon performance differences between using it and not using it?
    What about scheduling jobs: is that still possible without the DI server?

    The background of these questions:
    We installed Pentaho (BA server, Spoon, DI server), and Spoon is using the DI repository. Recently we have been having some performance issues (the server has 8 GB RAM; Spoon is set to 4 GB and the BA/DI servers to approximately 2.** GB each).
    So we are thinking of using the "Kettle database repository" option instead of the DI repository and using the default-installed PostgreSQL database to store the ETL metadata. I would like to know the pros and cons.

    Moreover, I found these two lines about the DI server:
    https://help.pentaho.com/Documentation/5.4/0D0/050/020
    " The DI Server runs centrally stored transformations and jobs. The DI Server also hosts the DI repository and processing engine, provides a service layer for security and authentication, and allows scheduling. Manage the DI Server through its related tool, Spoon. "

    And here it is mentioned that increasing the Spoon memory limit requires increasing the DI Server memory as well, which is a bit confusing if it is possible to use Spoon without the DI server (unless they mean the case where Spoon is used with the DI server):
    https://help.pentaho.com/Documentati...H0/070/020/010
    "Note:You need to increase the DI Server memory limit as well."



    Can anyone help?
    Thank you

  2. #2
    Join Date
    Nov 2009
    Posts
    688

    Default

    The DI server is part of the EE edition, not the CE edition. With the DI server you are able to schedule the running of jobs and transformations.

  3. #3

    Default

    Thanks for the answer, johanhammik, but I think it does a bit more than that, since it is installed with Tomcat.
    Is there any other option to schedule jobs and transformations?

  4. #4
    Join Date
    Aug 2011
    Posts
    360

    Default

    Quote Originally Posted by PenBI
    Thanks for the answer, johanhammik, but I think it does a bit more than that, since it is installed with Tomcat.
    Is there any other option to schedule jobs and transformations?
    Here are the different components of the DI solution:
    1. Spoon is the DI designer. Consider it a graphical IDE to develop your ETL processes. It is not meant to execute
    the ETL in production, only for local development and tests on workstations.
    However, if you have a Carte instance somewhere, or the DI server, it can also be used to launch jobs on a remote Carte server, or
    to manage schedules on the DI server.
    2. The DI server is a J2EE web application (inside a web container, Tomcat) that encompasses multiple things:
    - a repository (PDI EE Repository)
    - a security provider (for access to the repository)
    - a scheduler (Quartz)
    - a Carte servlet: a Carte server embedded inside the DI server,
    used to execute jobs and transformations.
    3. Carte is a lightweight web server meant to execute PDI ETL (see the startup sketch after this list).
    4. Kitchen and Pan are command-line batch files meant to execute PDI ETL (jobs and transformations, respectively).

    Note that Spoon, Carte, Kitchen and Pan all come with the PDI installation.
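
    For illustration, here is a minimal sketch of starting a standalone Carte instance from the PDI installation directory. The install path, bind address and port are examples; the default user/password come from the pwd/kettle.pwd file shipped with PDI, and details can differ between versions, so check the documentation for your release.

        # start Carte, listening on all interfaces on port 8081 (example values)
        cd /opt/pentaho/data-integration     # example install path
        ./carte.sh 0.0.0.0 8081

        # quick check that it is up (default credentials are defined in pwd/kettle.pwd)
        curl -u cluster:cluster http://localhost:8081/kettle/status/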

    So you have different things installed on workstations and on the production server:
    On a workstation:
    You use the PDI installation to get Spoon. You do your development and some tests with Spoon.
    You can use Kitchen locally to test complete jobs outside of Spoon (see the sketch below).
    You connect to a database/file dev repository, or to a dev DI server as the repository.
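
    As a sketch of the local Kitchen test mentioned above (the install path, job file name and parameter are made-up examples, not defaults):

        # run a job straight from a .kjb file, no repository, with basic logging
        cd /opt/pentaho/data-integration                    # example install path
        ./kitchen.sh -file=/home/etl/jobs/load_sales.kjb \
                     -level=Basic \
                     -param:INPUT_DIR=/home/etl/incoming    # hypothetical job parameter

        # Kitchen returns a non-zero exit code when the job fails, which is handy in test scripts
        echo "exit code: $?"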

    On the production/test/dev server:
    You have two possibilities for scheduling and executing jobs:
    1. You have a DI server. Then you have everything (repository, scheduler, Carte instance).
    You use a local Spoon to connect to the DI server and schedule jobs, or directly launch jobs on Carte.
    2. You don't have a DI server. Then on the server side you also install the PDI package,
    but you use Kitchen to execute the jobs. Use any scheduler to launch Kitchen (like crontab).
    You can also launch a Carte instance on the server, so that you can use your local Spoon to directly execute
    jobs on the remote Carte instance, or use the REST service of Carte to launch jobs (see the sketch after this list).
    Note that you'll need to install a database/file repository on the server side.
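
    To make option 2 concrete, here is a minimal sketch assuming a Kettle database repository named etl_repo, a repository user admin, and a job /nightly/load_warehouse. Every name, path, port and password here is an example, and the exact Carte URL parameters can differ between versions.

        #!/bin/sh
        # /opt/etl/run_nightly.sh -- wrapper script called by cron
        /opt/pentaho/data-integration/kitchen.sh \
            -rep=etl_repo -user=admin -pass=secret \
            -dir=/nightly -job=load_warehouse \
            -level=Basic -logfile=/var/log/pdi/load_warehouse.log

        # crontab entry (a single line): run the wrapper every night at 02:00
        # 0 2 * * * /opt/etl/run_nightly.sh

        # alternative: ask a Carte instance on the server to run the job from the repository
        curl -u cluster:cluster \
            "http://etl-server:8081/kettle/executeJob/?rep=etl_repo&user=admin&pass=secret&job=/nightly/load_warehouse&level=Basic"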

    Note that from version 7 on there is only one server type, the Pentaho Server, which can be used as a BA server or a DI server.
    Ideally, you should still install it twice, once for BA use and once for DI use, because you won't
    configure them the same way (different plugins, different memory configuration, different lifecycle).

    Regarding the dev-to-prod life cycle, what you do is:
    1. Have a local dev repository, or a shared dev DI server.
    2. Do your development and unit tests in local Spoon.
    3. Once everything seems to work, export the content to your test server.
    4. Do your QA validation: execute the whole DI solution as it will run in production.
    5. If everything is OK, push the content to the production server.
    Last edited by Mathias.CH; 11-26-2016 at 12:17 PM.

  5. #5
    Join Date
    Aug 2011
    Posts
    360

    Default

    Finally, a note on the difference between executing jobs with the DI server/Carte and with
    Kitchen batches:
    With the DI server/Carte, you have only one JVM. It is always running, and jobs share the resources of this JVM.
    So you need to allocate as much memory as you can to this JVM.
    Moreover, one disadvantage of this is that if a job or a bug eats all the resources, or objects stay in memory
    and are never garbage collected, performance can degrade over time. The only cure
    is restarting the server regularly (we do it once a day before the nightly batches).
    If you work with the DI server, you should really have JVM monitoring in place.


    With Kitchen, you create a new JVM each time you launch a job.
    So you should not allow it to use more memory than necessary to run a single job,
    or the other jobs running in parallel might not be launched or might perform poorly.
    The advantage is that you are sure everything is cleaned up at the end of each job,
    since the JVM is destroyed each time. (See the sketch below for where these memory limits are usually set.)
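
    To give an idea of where these JVM limits usually live (file names and variables can differ between versions and between the CE/EE packages, and the -Xmx values below are purely illustrative, not recommendations):

        # Spoon, and also Kitchen/Pan since their scripts go through spoon.sh:
        # adjust the -Xmx value in the PENTAHO_DI_JAVA_OPTIONS line of spoon.sh, e.g.
        export PENTAHO_DI_JAVA_OPTIONS="-Xms512m -Xmx2048m"

        # DI server (Tomcat): the heap is usually set through CATALINA_OPTS or JAVA_OPTS
        # in the server start script or Tomcat's setenv script, e.g.
        export CATALINA_OPTS="-Xms2048m -Xmx2048m"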

  6. #6

    Default

    Thanks for your explanation, Mathias, the concept is now clear.
    How many GB should the Java limit be for the DI server and Spoon? Are there any best practices?

  7. #7
    Join Date
    Sep 2011
    Posts
    3

    Default

    I have the same question, PenBI. If anyone could share more information about best practices, that would be great.
