Hitachi Vantara Pentaho Community Forums

Thread: Few basic questions on pentaho for newbies

  1. #1


    Hi All,

    I am new to the open source ETL community; I have worked extensively with other ETL tools such as Informatica and DataStage.

    For my new job I am doing a feasibility analysis for building a data warehouse for the firm. It's a start-up, so with a tight budget I have been asked to explore open source ETL tools. I have been exploring Pentaho PDI for this and have a couple of questions about it. I'm hoping someone experienced with it can help. The idea is to use PDI alone; reporting is not a priority for now.

    1. Other than support, which features differ between EE and CE? From the comparison, my understanding is that most of the basic PDI features are in both EE and CE. Please point out if my comparison is wrong or if I am missing something.

    2. How do scheduling and monitoring work in PDI? I understand that in CE the scheduler is not integrated with PDI.

    3. How does the deployment process work in PDI?

    4. Do all developers need to install the Pentaho server on their local machines to do development work, or is there also a client available for CE that can simply be installed on a developer's machine and used for development?

    5. What are the advantages and disadvantages of a file-based repository versus a database repository in Pentaho CE?

    6. Last but not least, how much does the most basic Pentaho EE license cost on average?


    Any pointers on the above are appreciated.

  2. #2


    1. Extra in EE: the enterprise repository (including fine-grained security, revisions of jobs & transformations, etc), scheduling and a few extra steps like JMS. Soon also a logging/monitoring dashboard.
    2. Scheduling in EE is done with a Spoon plugin, talking to the data integration server. In CE you can simply use the operating system scheduler. (cron or at mostly) Monitoring in CE can be done simply by building a report on the logging tables.
    3. That depends on your need but with a repository you can use import/export utilities and without a repository you can use a version control system (Subversion, git, ...)
    4. One central server per lifecycle (dev, test, uat, prod) is plenty but there are companies that deploy complete clusters in fail-over.
    5. A file based repository can be used in combination with Subversion/git. A database repository is perhaps more straightforward and provides better locking, user management.
    6. One address: sales@pentaho.com

  3. #3


    Thanks Matt for your prompt response. I have some follow-up questions on your answers.

    1. Extra in EE: the enterprise repository (including fine-grained security, revisions of jobs & transformations, etc), scheduling and a few extra steps like JMS. Soon also a logging/monitoring dashboard.
    2. Scheduling in EE is done with a Spoon plugin, talking to the data integration server. In CE you can simply use the operating system scheduler. (cron or at mostly) Monitoring in CE can be done simply by building a report on the logging tables.
    QSN: Can you please point me to a thread or help doc that shows how to build a report on the logging tables?
    3. That depends on your need but with a repository you can use import/export utilities and without a repository you can use a version control system (Subversion, git, ...)
    4. One central server per lifecycle (dev, test, uat, prod) is plenty but there are companies that deploy complete clusters in fail-over.
    QSN: What I meant was, for each environment (say dev/SIT), do all developers and testers need to install the full Pentaho PDI on their machines to start working with it? In other tools, a client can connect to a single server.
    5. A file based repository can be used in combination with Subversion/git. A database repository is perhaps more straightforward and provides better locking, user management.
    QSN: With a DB repository, how is deployment done? Will exporting the XML and its related jobs take care of it, or is there some other way to go about it? I also did not get what you mean by locking and user management.
    6. One address: sales@pentaho.com
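On the logging-table question under point 2, a rough sketch of what such a "report" could look like: a query that lists runs which reported errors or ran longer than a threshold. It uses an in-memory SQLite database so it runs on its own. The table name `trans_log` is hypothetical, the column names (TRANSNAME, STATUS, ERRORS, STARTDATE, ENDDATE) are assumed to match the default fields of a PDI transformation log table, and the sample rows are invented. Adjust all of these to whatever your logging configuration actually writes.

```python
import sqlite3


def demo_connection():
    """Create an in-memory stand-in for a transformation logging table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trans_log ("
        "TRANSNAME TEXT, STATUS TEXT, ERRORS INTEGER, "
        "STARTDATE TEXT, ENDDATE TEXT)"
    )
    # Invented sample rows, for illustration only.
    rows = [
        ("load_customers", "end", 0, "2019-01-01 02:00:00", "2019-01-01 02:01:30"),
        ("load_orders", "end", 3, "2019-01-01 02:05:00", "2019-01-01 02:05:20"),
        ("load_products", "end", 0, "2019-01-01 02:10:00", "2019-01-01 02:30:00"),
    ]
    conn.executemany("INSERT INTO trans_log VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def failed_or_slow_runs(conn, max_seconds):
    """Return (name, status, errors, seconds) for runs with errors or long runtimes."""
    sql = """
        SELECT TRANSNAME, STATUS, ERRORS, seconds
        FROM (
            SELECT TRANSNAME, STATUS, ERRORS, STARTDATE,
                   CAST(strftime('%s', ENDDATE) AS INTEGER)
                 - CAST(strftime('%s', STARTDATE) AS INTEGER) AS seconds
            FROM trans_log
        )
        WHERE ERRORS > 0 OR seconds > ?
        ORDER BY STARTDATE
    """
    return conn.execute(sql, (max_seconds,)).fetchall()


if __name__ == "__main__":
    # Flag anything that failed or took longer than ten minutes.
    for row in failed_or_slow_runs(demo_connection(), 600):
        print(row)
```

Against a real logging database the same SELECT could be used as the source query of a report or a scheduled alert; only the date arithmetic would need to change for the SQL dialect in use.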






Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.