Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Larger Transformations Timeout On Repository Save

  1. #1

    Question Larger Transformations Timeout On Repository Save

    I'm using Pentaho 2.5.1, storing repository database in MySQL 5.0 community edition using "huge.cnf" settings. Server is 64bit Linux, Spoon clients access repository remotely via VPN over the Internet.

    I can open all transformations in the repository, but can only save the simpler transformations. When I edit and attempt to save the larger transformations, the save appears to hang forever. According to the dialog, the database connections save ok, the hang appears to happen after the last step in the transformation is saved.

    I was only able to get these larger transformations into the repository by restoring from a database backup (when I tried to the load repository objects from a repository .xml file, it hung every time on the first larger transformation).

    Any ideas what should be tweaked to get around this problem?

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    No idea, it's maybe some kind of deadlock. Make a tracker at http://jira.pentaho.org/browse/PDI

    Regards,
    Sven

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Hi Chris, we use batch processing for the step attributes (because there are so many) to speed up saving over slow connections.
    After the last step has been saved, a batch of insert statements is sent over the network to the remote database. When it appears to stall it's actually sending data.
    As such I'm suspecting a simple case of a snail speed network.

    HTH,

    Matt

  4. #4

    Default

    I let it sputter overnight, and it still didn't "finish" the save. Certain changes were saved, e.g., changes I made to the connection - even though the save never finished.

    I also noticed that about 50% of the time when I first launch spoon and connect to the repository, the connection is not successful. No error message, but when I attempt to open a transformation, I'm looking at the file system, not the repository.

    I have not seen this problem saving to other client repositories across the Internet using VPN, nor on repositories on local LANs.

    We'll report back what we find...regarding the DB, network, and memory allocation to kettle. Any ideas about where we should look, parameters to adjust are most welcome...

  5. #5
    Join Date
    May 2006
    Posts
    4,882

    Default

    Try it with a local repository (on the same machine on which you run spoon) and you can already exlude network latency, or not.

    Regards,
    Sven

  6. #6
    Join Date
    Jan 2007
    Posts
    29

    Default

    My guess too,chris, is that the inserts on the mysql are just very slow and you have loads of them. If the batch inserts Matt referred to are committed every so many rows, a select count(*) should reveal the progress it's making.

  7. #7

    Default

    It inserted one 9 step transformation in about 12 hours. Server CPU/memory use seems to be minimal. MySQL adminstrator shows user connection as sleeping. Client CPU/memory minimal.

    When I kill the db and kettle loses connection, first error is "Couldn't execute SQL: UNLOCK TABLES." Probably meaningless...

    Kettle tables look ok, all optimized. R_STEP_ATTRIBUTE is large - 335,314 (R_STEP has 508 rows). However, these steps have a huge number of fields and this matches the count in a local copy of the repository which runs well.

    Out of ideas tonight...

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Obviously we can't commit every now and then. We need to keep the repository in a consistent state as much as possible.
    That's why we lock the MySQL tables and perform a rollback when something goes wrong.
    Only when all updates to the repository are done, we perform a commit.

    Chris, make sure to see if there isn't a query timeout at work or something. Set the MySQL JDBC option "enableQueryTimeouts" to "false" just to make sure. (default is "true") Also try setting the "rewriteBatchedStatements" option to "true" to see if it reduces network load.

    Matt

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.