Hitachi Vantara Pentaho Community Forums

Thread: PDI 4.0 upgrade from PDI 3.2.4

  1. #1
    Join Date
    Dec 2009
    Posts
    332

    Default PDI 4.0 upgrade from PDI 3.2.4

    We are comparing pdi-ee-3.2.4-GA to pdi-ee-4.0.0-RC1 and would like to know the expected impact of the migration to our code base. There is a large volume of documentation, but so far there appears to be no upgrade information documented.

    (Please ignore the repository and team coding aspects of the upgrade in your responses.)

    Are there any known issues with upgrading from PDI 3.2.4 to PDI 4.0? Specifically, are there any code changes required?

    Will opening and saving a 3.2.4 transform/job in 4.0 (without making any intentional changes) alter the transform/job in such a way that it is no longer executable under 3.2.4?

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    The answer is: no, Kettle is not forward compatible.
    However, Kettle should be 100% backward compatible.

  3. #3
    Join Date
    Dec 2009
    Posts
    332

    Default

    I cannot express how much I appreciate backwards compatible upgrades. Thank you, thank you.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    There were many things we could have done with respect to changes in Kettle's core engine.
    However, I think most people felt it was good enough and fast enough. So compared to the 2.5-to-3.0 transition, the 3.x-to-4.0 transition will be completely painless.

  5. #5
    Join Date
    Dec 2009
    Posts
    332

    Default jre version?

    What is the preferred JRE for testing PDI 4.0 RC1?

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Java 1.5 works, but a recent 1.6 version works better (faster).

  7. #7
    Join Date
    Apr 2010
    Posts
    127

    Default

    Quote Originally Posted by MattCasters View Post
    However I think most people felt it was good enough and fast enough.
    Except update/replace maybe.
    If a table has more than a few thousand records (which is exactly when you would use update/replace), it's much faster to truncate and reimport it from scratch. Unless we're missing something.

  8. #8
    Join Date
    Dec 2009
    Posts
    332

    Default

    We cannot truncate and reload because there are generated primary keys which have to be maintained.

    Pretty excited to see whether 4.0 will speed up our nightly ETL processes, as the jobs that now take the longest use the Insert/Update step.

  9. #9
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    The upsert problem can be solved in many ways.
    Obviously, you have to make sure you have created the appropriate indexes for the "Insert/Update" step to work fast enough.

    However, you can also opt not to use it at all. If you rarely do updates (or simply do fewer updates than inserts), you can use a "Table Output" step and configure error handling on it. The rows that error out can then be updated with an "Update" step. This strategy can be many times faster.

    HTH,
    Matt
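    The insert-with-error-handling strategy Matt describes can be sketched outside Kettle as well. The sketch below is a minimal illustration in Python with SQLite (the table and column names are made up, not from the thread): attempt a plain INSERT for every row, which is cheap in the common case, and route only the rows that fail on a duplicate key to an UPDATE, mirroring the "Table Output" step with an error-handling hop into an "Update" step.

    ```python
    import sqlite3

    # Hypothetical table for illustration; the primary key plays the role
    # of the generated keys that must be preserved.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

    # Third row reuses key 1, so it must become an update, not an insert.
    rows = [(1, "Alice"), (2, "Bob"), (1, "Alice Smith")]

    for key, name in rows:
        try:
            # "Table Output" analogue: plain insert, no existence check up front.
            conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (key, name))
        except sqlite3.IntegrityError:
            # Error-handling hop -> "Update" analogue: only duplicate keys land here.
            conn.execute("UPDATE customer SET name = ? WHERE id = ?", (name, key))

    conn.commit()
    print(conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall())
    # -> [(1, 'Alice Smith'), (2, 'Bob')]
    ```

    The win comes from skipping the per-row lookup when most rows are new: the database's own unique index does the existence check as a side effect of the insert.
    
    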

  10. #10
    Join Date
    Dec 2009
    Posts
    332

    Default

    Had not considered using the error path for the updates. Thanks for the tip.


Copyright © 2005 - 2017 Pentaho Corporation. All Rights Reserved.