Hitachi Vantara Pentaho Community Forums

Thread: Batch insert mode disabled because of database limitations on Postgres

  1. #1
    Join Date
    Sep 2008
    Posts
    168

    Default Batch insert mode disabled because of database limitations on Postgres

    Hi,

    I'm building an ETL transformation with Kettle 3.2 against a Postgres database (8.2 / 8.3). In my Table Output steps I have the option "Use batch update for inserts" activated, but whenever I run the transformation the option is unavailable (greyed out) and the logfile of the run shows the message
    Insert Trans Log.0 - Batch insert mode disabled because of database limitations.

    Is that normal? What do I have to do to use batch mode?

    Thanks,
    Benjamin

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    It's disabled because you are using error handling.

    The transactional model of Postgres doesn't allow any command to proceed after any type of database error (such as a failed insert). So it's not really possible to recover from the error and send the succeeded and failed rows in different directions. The only option with PostgreSQL is to disable batch mode in that case, because you could have an error in the first row of a batch of 1000: all 1000 rows would fail on PostgreSQL while perhaps only one should have.

    It's *quite* a unique situation: for PostgreSQL only!
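
    For illustration, here is a minimal JDBC sketch of the behaviour described above (the table t with an integer primary key and the connection settings are hypothetical): once one statement in a transaction fails, PostgreSQL rejects every subsequent statement with "current transaction is aborted" (SQLSTATE 25P02) until the transaction is rolled back, so the good rows of a failed batch cannot simply be retried within it.

        import java.sql.BatchUpdateException;
        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;

        public class PgBatchAbortDemo {
            public static void main(String[] args) throws Exception {
                // Hypothetical connection settings; adjust to your environment.
                try (Connection con = DriverManager.getConnection(
                        "jdbc:postgresql://localhost:5432/test", "user", "password")) {
                    con.setAutoCommit(false);
                    try (PreparedStatement ps =
                            con.prepareStatement("INSERT INTO t (id) VALUES (?)")) {
                        for (int id : new int[] { 1, 2, 2, 3 }) { // third row violates the PK
                            ps.setInt(1, id);
                            ps.addBatch();
                        }
                        ps.executeBatch(); // throws BatchUpdateException on the duplicate
                        con.commit();
                    } catch (BatchUpdateException e) {
                        // The whole transaction is now aborted: rows 1 and 3 cannot be
                        // salvaged from here; the only way forward is a rollback.
                        con.rollback();
                    }
                }
            }
        }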

  3. #3
    Join Date
    Sep 2008
    Posts
    168

    Default

    Hi Matt,

    Thanks for your answer. The more I work with Postgres, the more I "love" it.
    Is there a big performance difference without batch insert mode?

    Benjamin

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    You might love it. Their non-standard transactional model made it a pain to implement the error handling code.
    The performance drop is 15-30%.

    If you are planning to load a lot of data, consider the bulk loader. It's at least twice as fast (usually 3-4 times faster). Be careful: here too, PostgreSQL is non-standard, insisting on clunky security measures. http://wiki.pentaho.com/display/EAI/...QL+Bulk+Loader

    The performance of that bulk loader is among the fastest I've seen so far in PDI.
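
    For context, the bulk loader's speed comes from PostgreSQL's COPY protocol: rows are streamed into the table in a single command instead of individual INSERT statements. The PDI step achieves this by piping data into an external psql process, but the same idea can be sketched directly against the PostgreSQL JDBC driver's CopyManager API (the table, columns and connection settings below are hypothetical):

        import java.io.StringReader;
        import java.sql.Connection;
        import java.sql.DriverManager;

        import org.postgresql.PGConnection;
        import org.postgresql.copy.CopyManager;

        public class PgCopyDemo {
            public static void main(String[] args) throws Exception {
                // Hypothetical connection settings; adjust to your environment.
                try (Connection con = DriverManager.getConnection(
                        "jdbc:postgresql://localhost:5432/test", "user", "password")) {
                    CopyManager copy = con.unwrap(PGConnection.class).getCopyAPI();
                    // One COPY command loads all rows, avoiding per-row statement overhead.
                    long rows = copy.copyIn(
                            "COPY t (id, name) FROM STDIN WITH CSV",
                            new StringReader("1,alpha\n2,beta\n3,gamma\n"));
                    System.out.println("Loaded " + rows + " rows");
                }
            }
        }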

  5. #5
    Join Date
    Sep 2008
    Posts
    168

    Default

    I finally got to try the bulk loader for Postgres. It took me a little while to figure out, but now it works. On my local machine it was over 4x faster than the normal Table Output step (reading a text file and loading it into an empty table).

    I would suggest adding an input field for the database user name (right now you have to write -U <username>), and also using the same names and order for the separator and delimiter fields as in the CSV File Input step. Right now the naming is different, which makes it harder to understand.

    Thanks again for pointing me to that step.

    Benjamin



    PS: The "loving it" was ironic, i rather work with MySQL :-)

  6. #6
    Join Date
    Mar 2008
    Posts
    101

    Default

    Is the PostgreSQL bulk loader expecting the input rows to be in a certain format?
    When I tried to run it, I got the following error:

    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : Error in step
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : org.pentaho.di.core.exception.KettleException:
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : Error serializing rows of data to the psql command
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : Broken pipe
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) :
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.writeRowToPostgres(PGBulkLoader.java:418)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.processRow(PGBulkLoader.java:287)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.run(PGBulkLoader.java:460)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : Caused by: java.io.IOException: Broken pipe
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at java.io.FileOutputStream.writeBytes(Native Method)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at java.io.FileOutputStream.write(FileOutputStream.java:260)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.writeRowToPostgres(PGBulkLoader.java:333)
    2009/08/17 15:45:46 - PostgreSQL Bulk Loader.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : ... 2 more
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : Unexpected error :
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : org.pentaho.di.core.exception.KettleDatabaseException:
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : Couldn't get row from result set
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : ERROR: canceling statement due to user request
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) :
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.core.database.Database.getRow(Database.java:2869)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.core.database.Database.getRow(Database.java:2784)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.trans.steps.tableinput.TableInput.processRow(TableInput.java:137)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.trans.step.BaseStep.runStepThread(BaseStep.java:2889)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.trans.steps.tableinput.TableInput.run(TableInput.java:345)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : Caused by: org.postgresql.util.PSQLException: ERROR: canceling statement due to user request
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:1527)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.postgresql.jdbc2.AbstractJdbc2ResultSet.next(AbstractJdbc2ResultSet.java:1843)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : at org.pentaho.di.core.database.Database.getRow(Database.java:2799)
    2009/08/17 15:45:46 - Select delivery item entries.0 - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : ... 4 more
    2009/08/17 15:45:46 - Daily Third Party Placement Delivery Entry Fact - ERROR (version 3.2.0-GA, build 10572 from 2009-05-12 08.45.26 by buildguy) : Errors detected!
    Thanks,

    Peter
    Last edited by Peter Schmidt; 08-17-2009 at 04:46 PM.
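
    The "Broken pipe" in the log above is a symptom rather than the cause: the step writes rows into the stdin of a spawned psql process, and the write fails because psql has already exited (for example due to authentication settings such as pg_hba.conf or .pgpass, the "security measures" mentioned earlier, or a malformed COPY row). A minimal sketch of that mechanism (the command line and table are hypothetical) shows where the real error message ends up, namely in psql's own output:

        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.InputStreamReader;
        import java.io.OutputStream;
        import java.nio.charset.StandardCharsets;

        public class PipeDiagnosisDemo {
            public static void main(String[] args) throws Exception {
                // Hypothetical command line, mirroring what the bulk loader step runs.
                ProcessBuilder pb = new ProcessBuilder(
                        "psql", "-U", "user", "-d", "test",
                        "-c", "COPY t FROM STDIN WITH CSV");
                pb.redirectErrorStream(true); // merge stderr so the real cause is visible
                Process psql = pb.start();

                try (OutputStream toPsql = psql.getOutputStream()) {
                    toPsql.write("1,alpha\n".getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    // "Broken pipe" here only means psql already died; the reason
                    // (authentication failure, bad COPY format, ...) is printed below.
                }

                try (BufferedReader out = new BufferedReader(
                        new InputStreamReader(psql.getInputStream()))) {
                    String line;
                    while ((line = out.readLine()) != null) {
                        System.out.println("psql: " + line);
                    }
                }
                System.out.println("psql exit code: " + psql.waitFor());
            }
        }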

  7. #7

    Default

    Quote Originally Posted by MattCasters View Post
    You might love it. Their non-standard transactional model made it a pain to implement the error handling code.
    The performance drop is 15-30%.

    If you are planning to load a lot of data, consider the bulk loader. It's at least twice as fast (usually 3-4 times faster). Be careful: here too, PostgreSQL is non-standard, insisting on clunky security measures. http://wiki.pentaho.com/display/EAI/...QL+Bulk+Loader

    The performance of that bulk loader is among the fastest I've seen so far in PDI.

    Is this problem still around?
    In my ETL I have a generic database connection (sometimes SQL Server, sometimes PostgreSQL). I insert data using the Table Output step, and when the insert fails with an exception I try to update the row instead (Update step). This works well when the destination is SQL Server, but it doesn't work properly when the destination is PostgreSQL.

    Note: the problem occurs even if batch updates are turned off.
    Last edited by MuthaiahPalaniappan; 12-12-2017 at 06:55 PM. Reason: Brief Explanation

  8. #8
    Join Date
    Nov 2013
    Posts
    382

    Default

    We use Postgres and inserts in PDI on a regular basis. As far as I am aware, you can use error handling on table inserts and it will work as expected: across thousands of job executions, we have never failed to get an error processed correctly. Even with a commit size of 1000 rows (or whatever number you like), a duplicate row will be isolated and detected.

    I have often wondered about that error message. I guess in some rare circumstances it's real, but it has never happened to us.
