Hitachi Vantara Pentaho Community Forums

Thread: Firewall dropping idle repository connections to Oracle in Carte

  1. #1
    Join Date
    Apr 2008
    Posts
    11

    Default Firewall dropping idle repository connections to Oracle in Carte

    Hello all:

    We are experiencing errors in repository connections to Oracle when using Carte, because our data center firewall drops connections once it detects they have been idle longer than its inactivity timeout.

    AFAIK, when performing a remote execution, Carte opens a new database connection to the repository to resolve all the dependencies it finds as it executes the different transformation steps or job entries. If Carte spends too much time executing one of these steps/entries without using the TCP connection to the Oracle repository, our firewall drops that TCP connection, which causes an error when Carte finishes the step/entry and tries to run a query over the dropped connection. Unfortunately, we can neither shorten the execution time of those steps/entries nor get the firewall's inactivity timeout extended.

    Any idea how to overcome this problem?

    We have thought of these possible approaches:
    • Using a database "ping" feature at the Oracle JDBC driver level, provided one exists and it can run in parallel with the transformation/job execution.
    • Using an existing pooling feature of Jakarta Commons DBCP, such as "testWhileIdle", provided this testing is performed in parallel with the transformation/job execution (on its own thread), or between its steps/entries.
    • Modifying Carte so that it runs a database connection pinging thread in parallel with the transformation/job execution (a rough sketch follows this list).
    • Modifying Carte so that it connects to and disconnects from the repository every time it has to query it to resolve dependencies, although this would probably hurt performance.
    • Modifying Carte so that it tries to resolve all the dependencies before starting the actual execution of the root job/transformation (the one being sent remotely). I suppose this is quite different from the current way of resolving dependencies.
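
    For illustration, a minimal sketch of what the pinging-thread idea (third bullet) could look like. This is not Carte code: the JDBC URL and credentials are placeholders, and the ping interval would have to stay below the firewall timeout.

    Code:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RepositoryKeepAlive {

        public static void main(String[] args) throws Exception {
            // Placeholder repository connection; Carte would already hold one.
            final Connection repo = DriverManager.getConnection(
                    "jdbc:oracle:thin:@dbhost:1521:ORCL", "kettle", "secret");

            Thread pinger = new Thread(new Runnable() {
                public void run() {
                    while (!Thread.currentThread().isInterrupted()) {
                        try {
                            // Cheap validation query, well inside the firewall timeout.
                            // Synchronizing keeps the ping from interleaving with a
                            // real repository query on the same connection.
                            synchronized (repo) {
                                Statement s = repo.createStatement();
                                try {
                                    s.execute("SELECT 1 FROM DUAL");
                                } finally {
                                    s.close();
                                }
                            }
                        } catch (Exception e) {
                            System.err.println("keep-alive ping failed: " + e.getMessage());
                        }
                        try {
                            Thread.sleep(60 * 1000L); // ping every minute
                        } catch (InterruptedException ie) {
                            break;
                        }
                    }
                }
            }, "repo-keepalive");
            pinger.setDaemon(true);
            pinger.start();

            // ... long-running step/entry execution would happen here ...
        }
    }
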
    Any comments/suggestions would be greatly appreciated.

    Best regards and thanks in advance,

    Alberto.
    Last edited by aserrano; 11-13-2008 at 04:57 AM. Reason: Making explicit reference to DBCP

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Another option: ask the data center to stop killing your database connections?

    ;-)

  3. #3
    Join Date
    Apr 2008
    Posts
    11

    Default

    Hehe... alas, they won't grant anything longer than the current timeout, at least for the moment.

    I'm afraid that PDI is the challenger there, so it has to adapt to the current rules and standards. Obviously, I don't mean this is a shortcoming of PDI, but I'm sure it will have to adapt to this sort of constraint.

    Anyway, thanks for your suggestion, Matt... you couldn't help it!

    PS: BTW, which of the aforementioned options seems most reasonable to you?

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Sorry Alberto, I couldn't help it indeed. :-)
    Well actually, we do support connection pooling and the "testWhileIdle" property. Did you give it a try already?

  5. #5
    Join Date
    Apr 2008
    Posts
    11

    Default

    I’ve tried configuring a pool for the repository connection with the pooling related properties below (I’ve also attached the connection definition taken from the file ~/.kettle/repositories.xml):
    • Initial Pool Size = 2
    • Maximum Pool Size = 2
    • Test While Idle = true
    • Validation Query = SELECT 1 FROM DUAL
    • Time Between Eviction Runs Millis = 5000 ms
    I used two connections for the pool size to analyse the behaviour of the eviction thread DBCP uses. In particular, I wanted to determine whether testWhileIdle is applied to:
    • all connections not currently executing any query (inactive from the point of view of database activity).
    • only the available connections which have been returned to the pool or never acquired (inactive from the point of view of pool resource usage).
    With the pool configuration above, I expected the eviction thread to perform the validation query on at least one of the two pooled connections every 5 seconds (timeBetweenEvictionRunsMillis). After launching Spoon to test the repository connection, the two configured connections were created as expected (INITIAL_POOL_SIZE). However, no validation queries were actually performed (confirmed using the Oracle JDBC driver's java.util.logging-based tracing facilities).

    Taking a closer look at the Spoon JVM using Java VisualVM (included in JDK 1.6.0_07 and later), the two instances of org.apache.commons.dbcp.PoolableConnection have a false value for the _testWhileIdle field. It looks like some pooling properties are not being propagated to the DBCP-managed connections: testWhileIdle, timeBetweenEvictionRunsMillis, etc. I've attached a screenshot of a heap dump showing the value of these properties. The same happens when using that repository connection in Carte.


    Would you recommend filing a new bug?
    Last edited by aserrano; 11-13-2008 at 08:14 AM. Reason: Adding detail of the JVisualVM screenshot

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I don't even know if these properties need to be set.
    All properties are being transferred at once using the DriverManagerConnectionFactory Properties. I would guess that what you see are internal runtime variables.

  7. #7
    Join Date
    Apr 2008
    Posts
    11

    Default

    That is the problem: those runtime values don't match the configuration.

    Looking at the PDI sources, it seems that all pooling properties are gathered in org.pentaho.di.core.database.ConnectionPoolUtil.createPool and then passed to the DriverManagerConnectionFactory:
    ConnectionFactory cf = new DriverManagerConnectionFactory(url, properties);
    However, DriverManagerConnectionFactory simply forwards those properties to java.sql.DriverManager (see DriverManagerConnectionFactory.createConnection), which ignores the pooling-related entries.

    I think that ConnectionPoolUtil should be modified so that:
    • It sets the rest of the gpool (GenericObjectPool) pooling properties (setTestXXX, setMinEvictableIdleTimeMillis) in the same way it already does with:
    gpool.setMaxIdle(-1);
    gpool.setWhenExhaustedAction(GenericObjectPool.WHEN_EXHAUSTED_GROW);
    gpool.setMaxActive(maximumSize);
    • It specifies the validationQuery, defaultReadOnly and defaultAutoCommit parameters when creating the PoolableConnectionFactory, like this:
    new PoolableConnectionFactory(cf, gpool, null, validationQuery, defaultReadOnly, defaultAutoCommit);
    instead of
    new PoolableConnectionFactory(cf, gpool, null, null, false, false);


    All these properties are available via databaseMeta.getConnectionPoolingProperties(), but perhaps they should be applied individually as described above, instead of being propagated to the DriverManagerConnectionFactory (see the sketch below).
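
    Assembled into context, here is a hedged sketch of what a modified createPool could look like. The real method's signature and the way it reads the pooling properties differ; the property names below simply mirror the GenericObjectPool setters, and the PoolingDataSource at the end is only there to make the sketch self-contained.

    Code:
    import java.util.Properties;
    import javax.sql.DataSource;
    import org.apache.commons.dbcp.ConnectionFactory;
    import org.apache.commons.dbcp.DriverManagerConnectionFactory;
    import org.apache.commons.dbcp.PoolableConnectionFactory;
    import org.apache.commons.dbcp.PoolingDataSource;
    import org.apache.commons.pool.impl.GenericObjectPool;

    public class ConnectionPoolUtilSketch {

        public static DataSource createPool(String url, Properties connProps,
                                            Properties poolProps, int maximumSize) {
            GenericObjectPool gpool = new GenericObjectPool();
            gpool.setMaxIdle(-1);
            gpool.setWhenExhaustedAction(GenericObjectPool.WHEN_EXHAUSTED_GROW);
            gpool.setMaxActive(maximumSize);

            // Proposed: apply the eviction/testing settings to the pool itself
            // instead of mixing them into the DriverManagerConnectionFactory props.
            gpool.setTestWhileIdle(Boolean.parseBoolean(
                    poolProps.getProperty("testWhileIdle", "false")));
            gpool.setTimeBetweenEvictionRunsMillis(Long.parseLong(
                    poolProps.getProperty("timeBetweenEvictionRunsMillis", "-1")));
            gpool.setMinEvictableIdleTimeMillis(Long.parseLong(
                    poolProps.getProperty("minEvictableIdleTimeMillis", "1800000")));

            // Only genuine JDBC connection properties (user, password, ...) go here.
            ConnectionFactory cf = new DriverManagerConnectionFactory(url, connProps);

            // Proposed: pass validationQuery and the read-only/auto-commit defaults
            // instead of the hard-coded null/false/false. The factory registers
            // itself with gpool, so the return value is intentionally unused.
            new PoolableConnectionFactory(cf, gpool, null,
                    poolProps.getProperty("validationQuery"),
                    Boolean.parseBoolean(poolProps.getProperty("defaultReadOnly", "false")),
                    Boolean.parseBoolean(poolProps.getProperty("defaultAutoCommit", "true")));

            return new PoolingDataSource(gpool);
        }
    }
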

    Regards.

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    However, DriverManagerConnectionFactory simply forwards those properties to java.sql.DriverManager (see DriverManagerConnectionFactory.createConnection), which ignores the pooling-related entries.
    That seems like a very silly thing to do by the DBCP project, but sure, if we can solve a problem this way, why not.

    Please file a JIRA case for this and we'll try to get it fixed in 3.2.0.

    Matt

  9. #9
    Join Date
    Apr 2008
    Posts
    11

    Default

    The new issue PDI-1862 has been created.

    Description: Some database connection pooling properties are not working as expected (testWhileIdle, timeBetweenEvictionRunsMillis, etc.).

    Link to issue in Pentaho BI Platform Tracking:
    http://jira.pentaho.com/browse/PDI-1862

  10. #10
    Join Date
    Apr 2008
    Posts
    11

    Default

    I've posted a solution proposal to the "database connection pooling parameters not working as expected" issue here:

    http://jira.pentaho.com/browse/PDI-1862

    However, although the pool now behaves as expected, the validation query is only run on connections that are idle from the pool's point of view. Therefore, this approach does not stop the firewall from dropping the connection, because the repository connection used by Carte, once opened, is never returned to the pool until the transformation/job finishes. Another approach could be for Carte to connect to and disconnect from the repository every time it has to query it to resolve dependencies, as said before, but using a pooled connection, so the connect/disconnect operations would not hurt performance (see the sketch below).
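
    For illustration, a rough sketch of that borrow-per-query idea, assuming a DataSource pooled and validated as proposed in PDI-1862. The table and column names are only illustrative of a repository lookup, not the actual queries Carte issues.

    Code:
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.sql.DataSource;

    public class RepositoryLookup {

        private final DataSource pool; // pooled, testWhileIdle-enabled DataSource

        public RepositoryLookup(DataSource pool) {
            this.pool = pool;
        }

        public String lookupTransformationName(long idTransformation) throws Exception {
            // Borrow a connection only for the duration of this single query,
            // then return it so the evictor sees it as idle and keeps it alive.
            Connection c = pool.getConnection();
            try {
                PreparedStatement ps = c.prepareStatement(
                        "SELECT NAME FROM R_TRANSFORMATION WHERE ID_TRANSFORMATION = ?");
                try {
                    ps.setLong(1, idTransformation);
                    ResultSet rs = ps.executeQuery();
                    try {
                        return rs.next() ? rs.getString(1) : null;
                    } finally {
                        rs.close();
                    }
                } finally {
                    ps.close();
                }
            } finally {
                c.close(); // returns the connection to the pool
            }
        }
    }
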

    Nevertheless, it would be nice to solve this issue at a lower level (TCP transport level), where it belongs, perhaps using some sort of TCP keepalive for the repository connection.

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Well, I can't say I'm very interested anymore. Messing up PDI just because we have one community member with an overly critical firewall seems a bit over the top.
    Perhaps you should simply consider not using a repository at all. There are other options these days :-)

    On top of that comes the fact that we're going to be supporting the new versioning repository of the Pentaho platform (Alfresco and all) in the near future. I suppose everyone that cares is going to migrate over there anyway.

    All the best,
    Matt

  12. #12
    Join Date
    Apr 2008
    Posts
    11

    Default

    Thanks for the information, Matt. We have considered options ranging from exported XML files (cumbersome when coordinating different development teams, and lacking at least per-repository security) to the Pentaho BI Platform (we are waiting to evaluate 2.0). Anyway, it's very interesting to see that PDI is integrating with one of the top open source CMSs. We'll keep an eye on it.

    Regarding PDI-1862, you should consider it for inclusion, as it allows sanity checks on pooled connections.

    Finally, just in case anyone is interested, the following can help when using Oracle JDBC connections through firewalls (refer to the Oracle JDBC documentation for further details; a combined sketch follows the list):
    • Use the aforementioned testWhileIdle pool parameter together with the patch provided in PDI-1862.
    • Use the oracle.net.READ_TIMEOUT connection property.
    • Use the ENABLE=BROKEN parameter in the DESCRIPTION clause of the connect descriptor to request a TCP keepalive connection (if the underlying OS supports it).
    • Enable Oracle Net Dead Connection Detection (DCD) by setting SQLNET.EXPIRE_TIME=1 in the sqlnet.ora file of the database server configuration.
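
    For illustration, a sketch combining the second and third items. Host, port, service name, credentials and the timeout value are placeholders; check the Oracle JDBC documentation for the property names supported by your driver version.

    Code:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class OracleFirewallFriendlyConnection {

        public static Connection open() throws Exception {
            Properties props = new Properties();
            props.setProperty("user", "kettle");      // placeholder credentials
            props.setProperty("password", "secret");
            // Abort reads that block longer than 10 minutes instead of hanging forever.
            props.setProperty("oracle.net.READ_TIMEOUT", "600000");

            // ENABLE=BROKEN in the DESCRIPTION clause requests a TCP keepalive
            // connection (only effective if the OS keepalive interval is low enough).
            String url = "jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=BROKEN)"
                    + "(ADDRESS=(PROTOCOL=TCP)(HOST=dbhost)(PORT=1521))"
                    + "(CONNECT_DATA=(SERVICE_NAME=ORCL)))";

            return DriverManager.getConnection(url, props);
        }
    }
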

    Best regards.

  13. #13
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    XML is not cumbersome at all, but like any other development file it needs to be checked into a versioning system like Subversion (my favorite).
    On top of that, PDI as a whole (like all Pentaho software) supports Apache VFS, so the XML files themselves can be placed anywhere, including in zip files, on web servers, etc.
    Add variables and shared-objects support and you have a very flexible and robust system, even with large teams.

    That being said, the integration with a CMS (where the versioning interests me especially) is interesting since it makes optimal re-use of existing software.
    Speaking of VFS, versioning is probably going to run over Apache VFS.

    Something like this would work for me:

    Code:
    username:password@pentahorep:rev736/foo/bar/folder/Some transformation

  14. #14
    Join Date
    Apr 2008
    Posts
    11

    Default

    Without going into more detail, the cumbersome aspects I was talking about with exported XML files were related to ease of deployment, the handling of absolute paths in references between jobs and transformations, and the synchronization of development sources across team members, compared to using Kettle's database repository.

    Of course, there are many different ways to deal with these (including variables and shared objects, as you suggest), but not in such an integrated way, from the development, source control and deployment points of view, as with the proposal you are talking about.

    In fact, we were previously using an OpenLDAP+Apache+AuthzSVNAccessFile+Subversion setup for source control _and_ ACL security, including Kettle’s jobs, transformations, properties, etc., using the export + commit approach over a Subversion working copy of the repository. But Kettle’s database repository looked like a more integrated solution for metadata, although it lacks ACL security (AFAIK) and versioning. I guess we should revert to the first approach.

    I must admit I overlooked Apache VFS support in Kettle. It can be very convenient for running jobs from Carte. For development using Spoon, though, I think you cannot browse server directories with Apache VFS over HTTP, nor over WebDAV, which is not fully supported by either Apache VFS or Subversion (more robust support is targeted for svn 2.0).

    Will the versioning scheme you were talking about support things like these?
    • Dealing with a local Subversion working copy in a way that allows atomic commits involving a group of different Kettle elements (a better practice than per-file commits), as well as commit comments, from Spoon.
    • Defining a local repository root path for things like metadata search and repository browsing over your local working copy from Spoon.

    In other words, the kind of integration provided by a VCS plugin for an IDE (e.g. Eclipse+Subclipse).

    Best regards and thanks again.

  15. #15
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    This integration is part of our roadmap, coming around the bend actually (2009 timeframe).

    Relative paths are required to be able to move complete ETL solutions around from server to server, even from platform to platform.
    Fortunately, this is also supported.

    Finally, Carte is exclusively used to execute transformations, jobs and clustered transformations, not to store them. Carte does not persist anything in any way.

  16. #16
    Join Date
    Sep 2008
    Posts
    25

    Default

    If you have access to the Oracle server, set SQLNET.EXPIRE_TIME in the sqlnet.ora file to a number of minutes (1 = 1 minute) lower than your firewall timeout.

    This will cause Oracle to send a probe to the Oracle client every x minutes to ensure the session is still alive. I have used this to bypass similar firewall problems in the past.

    I have always used this with PCs running an Oracle client, so I am not positive whether it will work with a JDBC connection, but give both a try.


    -Pete
