Hitachi Vantara Pentaho Community Forums

Thread: From dev environment to production

  1. #1
    Join Date
    Jul 2010
    Posts
    6

    Question From dev environment to production

    Hi Folks,

New to Pentaho and PDI, I'm struggling to get my environments sorted. Here are the prerequisites:

We are developing in a mixed environment (some on Windows, some on Linux) with Spoon. To share our efforts and learnings, we store the jobs and transformations in a repository created on a MySQL DB. All this works fine.

Now I want to deploy a first transformation and job to the production environment, a Linux box. I have the environment variables sorted to set the right DB servers, temp file folders, and mail servers we need, and the shared DB connections are set up. So now I have a first transformation (pretty simple: read from a table, store to a file) that I can execute with Pan on the production system.
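For reference, a file-based Pan run on a production box might look like the following sketch. The install path, transformation name, and log location are illustrative assumptions, not taken from the thread:

```shell
# Illustrative paths -- adjust to your PDI install and transformation.
cd /opt/pentaho/data-integration
./pan.sh -file=/opt/etl/transformations/table_to_file.ktr -level=Basic \
    >> /var/log/etl/table_to_file.log 2>&1
```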

For deployment to production I planned to export the transformations and jobs to XML. And here is my problem: in development I reference the transformations that are part of the job from the repository via "specify by reference". What do I need to put there to reference the transformation XML file on the production system?
When I try to execute that job on production, it does not find the transformation I stored in the same folder as the job; it gets stuck when it tries to execute the transformation, probably because it still tries to connect to the repository?

I considered a file-based deployment to be simpler in the beginning, since I wanted to keep the scheduler as simple cron entries like "execute all jobs in folder x daily, folder y weekly, folder z monthly". Could it work this way, using a repository for development and files for production deployment?
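That cron idea could be sketched like this. The folder names are hypothetical, and each entry assumes a small wrapper script that runs every job file in the given folder:

```shell
# Hypothetical crontab entries -- one folder per frequency.
# m h dom mon dow  command
0 2 * * *   /opt/etl/run_jobs.sh /opt/etl/jobs/daily
0 3 * * 0   /opt/etl/run_jobs.sh /opt/etl/jobs/weekly
0 4 1 * *   /opt/etl/run_jobs.sh /opt/etl/jobs/monthly
```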

    thanks for any advice,
    Reinald

  2. #2
    Join Date
    Jul 2010
    Posts
    6

    Default

    Hi Folks,

I decided to set up a DB-based repository for production as well; that seemed to be the easier solution. But, of course, this leaves some questions too:

Right now, for deployment, I use Spoon to connect to the development repository, export the objects to XML, connect to the production repository, and load the objects into it. Then I start the job on the production server with "nohup ./kitchen ..." and see what happens. Now my questions:
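For context, a repository-based Kitchen invocation like the one described might look as follows; the repository name, credentials, job path, and log file are placeholders:

```shell
# Placeholder repository name, credentials, and job path.
cd /opt/pentaho/data-integration
nohup ./kitchen.sh -rep=Production -user=admin -pass=secret \
    -dir=/daily -job=load_customers -level=Basic \
    >> /var/log/etl/load_customers.log 2>&1 &
```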

-- Deployment seems pretty cumbersome and error-prone. Is there any best practice to make it smarter, such as defining packages that hold jobs and transformations?
-- When a job has a schedule in its "Start" step, I have to wait until it executes to see whether everything works as planned and all required dependent objects are there. To test the job manually without the schedule, I copy it under a different name and remove the schedule so it executes immediately. Doing this at the end of the development process and deploying two jobs is also error-prone. Is there any recommended best practice to manage this? Any command-line setting to tell Kitchen to execute immediately even with a schedule defined?
-- Is there any way to activate the jobs just by deploying them, without having to start each job manually? Something like "execute everything in this folder", which would be easy to achieve with a shell script?
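The "execute everything in this folder" part can indeed be a small shell script. Here is a minimal sketch: the default Kitchen path is an assumption, and the KITCHEN variable can be overridden (e.g. for dry runs):

```shell
# run_jobs: run every PDI job file (.kjb) found in a folder with Kitchen.
# The default Kitchen location below is an assumption -- override via KITCHEN.
run_jobs() {
  dir="$1"
  kitchen="${KITCHEN:-/opt/pentaho/data-integration/kitchen.sh}"
  for job in "$dir"/*.kjb; do
    [ -e "$job" ] || continue               # skip if the folder is empty
    echo "starting $job"
    "$kitchen" -file="$job" -level=Basic || echo "FAILED: $job" >&2
  done
}
```

A cron entry would then call something like `run_jobs /opt/etl/jobs/daily`. Note this runs jobs sequentially and only logs failures; it does not stop the batch on the first error.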

    thanks for every hint,
    Reinald

  3. #3

    Default

The idea of packaging certain jobs/transformations together is interesting, especially when you consider that some Web Content Management (WCM) solutions use this approach and CMIS/JCR should have similar functionality, which is something I think Matt mentioned PDI may be heading towards anyway.

However, having transformations shared across multiple jobs does bring challenges, especially if you update a transformation without knowing what other jobs it impacts. So I think manual deployment, reviewed by someone familiar with the environment, is still needed (at the moment). The work-around is to have everything versioned (maybe, for the time being, on a filename basis), so a job will always talk to MyTrans4 even after you deploy MyTrans5, 6, and others in the future for other jobs.

Along those lines, shared database connections are another area to manage and monitor. I use the multiple-DB-repository approach as well, but keep the shared connections named the same while pointing to the associated DEV -> TEST -> production database resources. (Previously I relied heavily on JNDI, but with the DB repositories I'm falling back to shared connections.)

Overall, we face the same challenges, so I don't think you are doing anything unusual; it's just the nature of disciplined deployment with the tools available :-)


    -Darren

  4. #4

    Default

I also use separate database repositories for Dev and Production, using JNDI exclusively. In my current live version (3.2), I bundle jobs and transformations in folders in the repository. It is very easy to explore the repository and simply download the XML for an entire folder (right-click). That produces a single XML file that goes into Subversion until it is ready for import into the production repository.

I am currently upgrading my DEV system to 4.0.1, and it looks like the feature to bundle and export a directory is gone -- or I just haven't figured it out yet. I'll spend some more time on it next week.

  5. #5
    Join Date
    Jul 2010
    Posts
    6

    Default

I have to bring this up again, since I'm still not convinced my deployment method is effective:

- export the objects to XML from the DEV repository and save them somewhere
- switch to the production repository
- import the objects from XML
- save them to the production repository, manually picking the right path and creating it if required

Am I missing something? Is there a command-line tool to manage export/import, handle a couple of objects at once, and retain their names instead of renaming them every time I transfer them between the DB and the file system and vice versa? Or is deployment of projects/modules something reserved for Enterprise customers?
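Depending on the PDI version, part of this can be scripted: Pan accepts an `exprep` option that dumps all repository objects to one XML file, and later PDI releases ship a repository import tool (`import.sh`). A hedged sketch follows; the repository names, credentials, and paths are placeholders, and the exact option names vary by version, so verify them with `pan.sh -h` on your install:

```shell
# Export all objects from the DEV repository into one XML file
# (option availability varies by PDI version -- verify with pan.sh -h).
./pan.sh -rep=Development -user=admin -pass=secret -exprep=/tmp/dev_repo.xml

# Later PDI releases include a repository import tool; flags vary by version:
./import.sh -rep=Production -user=admin -pass=secret \
    -file=/tmp/dev_repo.xml -dir=/
```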

    any hints highly appreciated,
    Reinald

  6. #6

    Default

    Hi,

This may or may not be the answer to your question, but it's a development lifecycle that works for me.

I develop on a Windows workstation, then use "Export All Linked Resources to XML" in the designer UI to bundle up jobs. The resulting zip file is put into version control, then checked out and deployed (copied) to the Linux staging and production environments with a simple Ant build. All transformations are part of a job. Jobs are run by kitchen.sh from cron. Within jobs, I make no references to file-based jobs, transformations, or mappings; everything is referenced via repository paths. The export takes care of references to other components.


    HTH,

    Jeff

  7. #7
    Join Date
    May 2010
    Posts
    10

    Default

    Hi Reinald,

What server configuration did you use for production?
If you can share it, that would be helpful.


    Pawan.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.