Hitachi Vantara Pentaho Community Forums
Thread: PDI repository recommendations

  1. #1
    Join Date
    Feb 2011
    Posts
    7

    PDI repository recommendations

    Are there any docs that explain the best repository setup for PDI?

    Dev RDBMS, prod in files? Or does it matter?

    What are your thoughts on the best setup?

  2. #2
    Join Date
    Dec 2009
    Posts
    609

    Hi,

    I think it makes sense to set up the DEV and PROD repositories the same way (i.e., both in an RDBMS or both on the filesystem), because you will sometimes have jobs that start transformations. In those job entries you specify whether the transformation/job to be started is located in a repository folder (RDBMS) or in a directory on disk.
    If DEV and PROD used different storage types, you would have to change these references every time you deploy changed jobs/transformations from DEV to PROD.
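    To make the point about job entries concrete, here is a rough Python sketch that lists how each transformation/job entry in a .kjb file points at its target. It assumes the job XML uses `<entry>` elements with `<name>`, `<type>`, `<filename>` and `<directory>` children; treat those element names, the sample job, and the `entry_references` helper as illustrative assumptions and check them against a job file saved by your own PDI version.

```python
import xml.etree.ElementTree as ET


def entry_references(kjb_xml: str):
    """Return (entry name, entry type, reference) for TRANS/JOB entries.

    Assumption: a file reference is stored in <filename> and a repository
    reference in <directory>; verify against your own .kjb files.
    """
    root = ET.fromstring(kjb_xml)
    results = []
    for entry in root.iter("entry"):
        entry_type = (entry.findtext("type") or "").strip()
        if entry_type not in ("TRANS", "JOB"):
            continue  # only entries that start another transformation/job
        name = (entry.findtext("name") or "").strip()
        filename = (entry.findtext("filename") or "").strip()
        directory = (entry.findtext("directory") or "").strip()
        if filename:
            ref = "file:" + filename
        elif directory:
            ref = "repository:" + directory
        else:
            ref = "unknown"
        results.append((name, entry_type, ref))
    return results


# Hypothetical sample job mixing both reference styles.
SAMPLE = """<job>
  <entries>
    <entry><name>load customers</name><type>TRANS</type>
      <filename>/etl/load_customers.ktr</filename></entry>
    <entry><name>load orders</name><type>TRANS</type>
      <transname>load_orders</transname><directory>/warehouse</directory></entry>
    <entry><name>Start</name><type>SPECIAL</type></entry>
  </entries>
</job>"""

for row in entry_references(SAMPLE):
    print(row)
```

    A job that mixes "file:" and "repository:" references like this sample is exactly what would need manual changes when DEV and PROD use different storage types.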

    This would be my approach.
    Since I prefer working with databases, I have set up all my repositories in database locations so far.

    Cheers,

    Tom

  3. #3
    Join Date
    Dec 2009
    Posts
    332

    I totally agree with Tom regarding keeping Dev and Prod on the same repository method.

    As someone with a bias (I prefer the file-based repository), here are my thoughts on the pros and cons of the two.

    File based is better than the database repository because:
    Since it uses the local operating system for storage, it generally does not have any bugs
    It is always forward and backward compatible
    It allows us to run multiple versions of the PDI at the same time
    It is one less set of software to maintain and the file system is comparatively more stable than a database
    It uses the local operating system tools to manage files and these tools stay consistent through PDI version changes
    It is easier to script management functions (on Linux)
    It is absolutely compatible with any and all backup, source control, etc, software

    The database repository is better than file based because (bear with me here):
    It allows greater control during group level development efforts
    It has a better scheduler process
    It allows user-level permissions (this may just be BI-Server related)
    There is on-going development intended to make this method even more functional
    I believe there is a better tool for publishing to a database repository or running on a slave server
    (I am certain there is more to it than these, and it would be nice to hear from someone who favors the database repository)

    We have continued to use file repository storage primarily because we are a small shop with limited team/parallel development. My understanding is that the database repository really shines in group development environments.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    If it's just one person developing transformations you can use whatever you want, including no repository at all.

    Once you're dealing with a couple of developers you can either go for a database repository or a system without a repository and a VCS like Subversion or git.

    Finally, for larger enterprises we recommend the use of the enterprise repository that has proper fine-grained security, file locking support and revision control.

  5. #5
    Join Date
    Apr 2011
    Posts
    10

    I have been working on a warehousing project for a few months now. We started with a database repository, then moved to a file repository, and have finally settled on no repository at all: we just use the job files directly. I would suggest the same. Version control is handled by git, and we control job deployment across environments with Capistrano. The database repository was a HUGE impediment to productivity: repository import/export is super buggy and deployment can't be automated. The file-based repository was better but still gave us problems. So far, just using files has been great, and it works with our company's existing process (git, Hudson, Capistrano).
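    For anyone wondering what "using job files directly" can look like in a deploy script, here is a minimal Python sketch that builds a Kitchen command line for a .kjb file checked out from version control. It is not the actual setup described above: the `kitchen_command` helper, the job path, and the `ENV` parameter are made up for illustration, and it assumes Kitchen's `-file`, `-level` and `-param:NAME=value` options in their Linux form.

```python
import shlex


def kitchen_command(job_file, params=None, kitchen="kitchen.sh", level="Basic"):
    """Build the argument list to run a job file with Kitchen (no repository).

    Assumption: kitchen.sh is on PATH and accepts -file=, -level= and
    -param:NAME=value; adjust for your PDI version and platform.
    """
    args = [kitchen, f"-file={job_file}", f"-level={level}"]
    # Sort parameters so the generated command is stable between runs.
    for name, value in sorted((params or {}).items()):
        args.append(f"-param:{name}={value}")
    return args


# Hypothetical job path and parameter, e.g. set per environment by the deploy tool.
cmd = kitchen_command("/opt/etl/jobs/nightly_load.kjb", {"ENV": "prod"})
print(" ".join(shlex.quote(a) for a in cmd))
```

    Because the job is addressed by path only, the same command works in every environment as long as the checkout location is the same; the list can be passed to `subprocess.run` as-is.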

    2 cents,
    Andy

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    I'm working on a Subversion Kettle Repository plugin as well. After that, git is up next ;-)

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.