Hitachi Vantara Pentaho Community Forums

Thread: File specifications not checked

  1. #1
    Join Date
    Sep 2005
    Posts
    1,403

    Default File specifications not checked

    Attachment: kettle_repo.ktr I'm trying to create a very simple transformation in Spoon 2.2.2 (from the binary Kettle download), on Blackdown Java 1.4.2 on Ubuntu 5.10: I'm reading from a PostgreSQL table and writing the output to a text file.

    The transformation is running and the file is created, but contains no data.

    When I use the right mouse button menu (RMBM) and "Check Selected Steps" on the text file output step, there is a remark that says "File specifications not checked".

    Does anyone know what "file specifications" these are? I've filled in the file name, and extension, and have tried most of the others to see if I could get that message to go away.

    Does anyone know if this message is the reason the output file stays at 0 bytes?
    Or if the cause is something completely unrelated?

    Thanx!

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Well, it probably just means that somehow you're not sending rows to the step.
    How many rows are in the table?

    Matt

  3. #3
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    According to count() run from psql, 1965367 rows.

    The command I used is:
    select count(*) from idisc;
    (it's a PostgreSQL version of freedb, the CDDB alternative, created using the tools at this site: http://asmith.id.au/freedb-better.html )

    But I guess the next step is to see what psql does with the select statement from the input step. When running, this step takes a suspiciously short time: only around half a minute.

    Thanx!

    - Steinar

  4. #4
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    When I try to preview the input step, I get a dialog saying:
    "Sorry, no rows can be found to preview."

    Nothing interesting can be found in the Log View. At least none that I can see:
    2006/03/18 20:45:47 - Spoon - Logging goes to /tmp/spoon.32515.log
    2006/03/18 20:45:48 - DBCache - Loading database cache from file: [/home/username/.kettle/db.cache]
    2006/03/18 20:45:50 - Spoon - Main window is created.
    2006/03/18 20:45:50 - Spoon - Asking for repository
    2006/03/18 20:45:50 - Kettle - Reading repositories XML file: /home/username/.kettle/repositories.xml
    2006/03/18 20:46:16 - be.ibridge.kettle.trans.Trans - Transformation is in preview mode...
    2006/03/18 20:46:16 - be.ibridge.kettle.trans.Trans - Dispatching started for filename [null]
    2006/03/18 20:46:16 - disc.0 - Starting to run...
    2006/03/18 20:46:16 - dummy.0 - Starting to run...
    2006/03/18 20:46:37 - disc.0 - Finished reading query, closing connection.
    2006/03/18 20:46:37 - disc.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=0)
    2006/03/18 20:46:37 - dummy.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=0)

    I tried removing the DB.cache file, but that didn't change the behaviour.


    - Steinar

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Hi Steinar,

    Try the database explorer, browse to the "idisc" table and try to get a few thousand rows in a preview window.
    Perhaps we're looking at a JDBC driver issue or something.
    Also, it would be nice if you could at least post the transformation. (File : Export to xml)

    Thanks,
    Matt

  6. #6
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    Attempting to view the first 2000 rows of idisc gives me the following error message:

    java.lang.OutOfMemoryError

    java.lang.reflect.InvocationTargetException
    at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:327)
    at org.eclipse.jface.dialogs.ProgressMonitorDialog.run(ProgressMonitorDialog.java:447)
    at be.ibridge.kettle.core.dialog.GetPreviewTableProgressDialog.open(Unknown Source)
    at be.ibridge.kettle.core.dialog.DatabaseExplorerDialog.previewTable(Unknown Source)
    at be.ibridge.kettle.core.dialog.DatabaseExplorerDialog$8.widgetSelected(Unknown Source)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:90)
    at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:66)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1021)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:2867)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2572)
    at be.ibridge.kettle.core.dialog.DatabaseExplorerDialog.open(Unknown Source)
    at be.ibridge.kettle.spoon.Spoon.exploreDB(Unknown Source)
    at be.ibridge.kettle.spoon.Spoon$57.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:66)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1021)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:2867)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2572)
    at be.ibridge.kettle.spoon.Spoon.readAndDispatch(Unknown Source)
    at be.ibridge.kettle.spoon.Spoon.main(Unknown Source)
    Caused by: java.lang.OutOfMemoryError

    So I guess it's not all that mysterious. I need to up the heap size of the spoon process, right?

    The exported transform is attached, but I guess it isn't all that interesting.

  7. #7
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    Upped the maximum heap size from the default 256m to 1024m (the machine I'm running on has 2GB of physical memory). That made me run into a different problem:
    Couldn't get row from result set
    Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.


    java.lang.reflect.InvocationTargetException: Couldn't find any rows because of an error :be.ibridge.kettle.core.exception.KettleDatabaseException:
    Couldn't get row from result set
    Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.

    at be.ibridge.kettle.core.dialog.GetPreviewTableProgressDialog$1.run(Unknown Source)
    at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:113)
    Caused by: be.ibridge.kettle.core.exception.KettleDatabaseException:
    Couldn't get row from result set
    Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.

    at be.ibridge.kettle.core.database.Database.getRow(Unknown Source)
    at be.ibridge.kettle.core.database.Database.getRows(Unknown Source)
    at be.ibridge.kettle.core.database.Database.getFirstRows(Unknown Source)
    ... 2 more
    Caused by: org.postgresql.util.PSQLException: Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.
    at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:287)
    at org.postgresql.core.Encoding.decode(Encoding.java:182)
    at org.postgresql.core.Encoding.decode(Encoding.java:198)
    at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:206)
    ... 5 more

    The raw data is supposed to be in latin1, ascii, or utf8, but I guess I have to convert everything to utf8 as outlined at http://asmith.id.au/freedb-unicode.html and store it in a PostgreSQL database created with -E UNICODE.

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Hi Steinar,

    Concerning the heap size: I guess you can consider it a PGSQL JDBC bug to consume that much memory.
    Normally, the driver only has to keep (in Kettle's case) 5000 rows in memory, which would never max out the 256M.
    MySQL has a similar "feature" that still has not been resolved after many years:

    Select * from tab1

    causes an out-of-memory exception with very large tables.
    The worst thing about it is that if you have, let's say, 10 billion rows in tab1, there is no solution at the moment...

    Matt

  9. #9
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    Hm... now I've done a cleanup of the charset issues (it's all normalized to UTF-8 in the PostgreSQL database).

    I've also upped the maximum heap size to 1GB.

    But I still get the out-of-memory exception when exploring the idisc table, even when I limit the preview to the first 100 rows of the table.

    The actual cause of the out-of-memory exception seems to be lost. It looks like it's caught by a catch-all clause at the top...? Is there a way to determine the actual cause of the exception?

    Thanx!


    - Steinar

  10. #10
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    I started with a fresh Spoon process and tried previewing the first 100 lines of
    the idisc table, while watching the memory usage with top.

    The memory usage of the Spoon process skyrocketed straight up to 1GB, and then
    I got the caught exception.

    It sure would be interesting to see what caused this. Maybe I should try using
    some sort of memory profiler on Spoon while browsing this table...?

    - Steinar

    PS: On a side note, I'm not deliberately posting anonymously. I'm waiting for a
    confirmation email.

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Hi Steinar,

    I think I found the solution. Earlier versions of the PostgreSQL JDBC driver did NOT support the setFetchSize() method, which limits the number of rows held in memory.
    That is why the method isFetchSizeSupported() in be.ibridge.kettle.core.database.PostgreSQLDatabaseMeta returns false.

    I will do some tests later tonight and fix this issue. If you have access to the code and need results faster, you can try setting it to return true and see if it works with the newer driver.

    Thanks for the valuable feedback! It's always nice to have Kettle improved and reduce the footprint ;-)

    All the best,
    Matt

  12. #12
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    Hi Matt!

    Thanx for the quick feedback!

    I think I've found another possible cause for problems: I'm running PostgreSQL 8.1
    (it was the newest one available via apt-get), while the driver, from its name,
    looks like it belongs to PostgreSQL 7.4...?


    - Steinar

  13. #13
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    I tried using the pg74.215.jdbc3.jar with the Quantum 3.0.1 database
    browser Eclipse plugin, running in Eclipse 3.1.1. And I got the same problem,
    i.e. Eclipse got an out-of-memory error when attempting to browse the idisc
    table.

    I downloaded the JDBC3 driver for PostgreSQL 8.2 from here:
    http://jdbc.postgresql.org/download.html#jars
    and tried using that one when browsing the idisc table with the Eclipse
    Quantum plugin. And it still got an out-of-memory error.

    Hm... of course the Quantum table viewer doesn't have the limiter that
    Kettle has, so even if it should happen to work, it may be trying to load
    the entire table.

    I'll investigate further.

    - Steinar

  14. #14
    Join Date
    Nov 1999
    Posts
    459

    Default RE: File specifications not checked

    Hi Steinar,

    I don't think this is the problem here, but could you also check that your log level is set to Basic and NOT to Rowlevel?

    Cheers,
    Jens

  15. #15
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Yeah, I think the problem is that the pg74 JDBC driver doesn't support setFetchSize(), which would allow it to limit the size of the result sets.
    However, I'm sure that later versions support it. Let's find out as soon as possible.

    In the near future, we will have to build connection-plugins for every possible database type AND version as a lot of these JDBC drivers are not backward compatible (Oracle, MySQL, ...)

    So, I guess a lot of work remains to be done for Kettle: (my favorite quote) "Things will get worse before they get really bad" ;-)

    Cheers,
    Matt

  16. #16
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    Hi Jens!

    The log level setting was Basic (probably the default...? At least nothing I
    can remember having changed).

    - Steinar

  17. #17
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    Would it be possible to use PostgreSQL 8.1 JDBC driver downloaded from
    jdbc.postgresql.org with the binary version of Kettle 2.2 that I have?

    Or will I have to build stuff myself?

    Regarding the different versions of the driver that you should consider
    including in the future, the jdbc.postgresql.org download site says:

    In general it is best to pick a matching server and driver release,
    but there are some exceptions. Newer driver releases will get
    additional features and bug fixes that you may need. Much care is
    taken to make newer driver releases work against older servers, so
    if you are having a problem and suspect it is the driver's fault,
    don't hesitate to try out the latest release on an older server. The
    current development driver is known to work with all server versions
    back to, and including, 7.2.

    ...so I guess just including the latest release (i.e. 8.1 or 8.2,
    depending on how cutting-edge you want to be) would be sufficient
    for you... maybe...?

    - Steinar

  18. #18
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    > Would it be possible to use PostgreSQL 8.1 JDBC driver downloaded
    > from jdbc.postgresql.org with the binary version of Kettle 2.2 that
    > I have?

    To answer myself here: one way of doing this is to stop spoon, move
    away the pg74.215.jdbc3.jar and move in the
    postgresql-8.1-405.jdbc3.jar downloaded from jdbc.postgresql.org and
    then restart spoon.

    - Steinar

  19. #19
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    >Would it be possible to use PostgreSQL 8.1 JDBC driver downloaded from
    >jdbc.postgresql.org with the binary version of Kettle 2.2 that I have?

    Yes, put it in the libext/ directory and change the .bat files (spoon.bat, pan.bat, chef.bat, kitchen.bat) to include this jar file in the classpath.
    That's all.

    However, I don't set the fetch size, so it's going to default to 0 (bad news) or anything above 0 (good news).

    YMMV.

    Good luck,
    Matt

  20. #20
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    Matt writes:
    > Yeah, I think the problem is that the pg74 JDBC driver doesn't support
    > the setFetchSize() allowing it to limit the size of the resultsets.
    > However, I'm sure that later versions support it. Let's find out as
    > soon as possible.

    I tried using the PostgreSQL 8.1 JDBC driver, and it still crashed with
    out of memory when attempting to view the first 100 rows of the
    table.

    However, running "select * from idisc limit 100" in the SQL Editor from
    the RMBM on the "freedb" connection symbol worked fine with both the
    8.1 and 7.4 JDBC drivers.

    So my guess is that the function is still unimplemented.

    The public API for the JDBC driver is here:
    http://jdbc.postgresql.org/documenta...api/index.html

    I couldn't find the function setFetchSize() in the index there. On
    the other hand, I couldn't find the class org.postgresql.Driver either,
    so I may be looking in the wrong place...?


    - Steinar

  21. #21
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    > I tried using the PostgreSQL 8.1 JDBC driver, and it still crashed with
    > out of memory when attempting to view the first 100 rows of the
    > table.
    [snip!]
    > So my guess is that it's still not implemented.

    I was probably guessing wrong, see here:
    http://jdbc.postgresql.org/documenta...chsize-example

    It seems to be new from 8.0 on. The example isn't present in the 7.4
    JDBC docs accessible from
    http://jdbc.postgresql.org/documentation/docs.html

    So... I guess what I can do is:
    1. use the 8.1 driver
    2. try building from source and change the isFetchSizeSupported()
    function you mentioned to return true

    - Steinar

  22. #22
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Hi Steinar,

    The setFetchSize() hint is a standard JDBC specification.
    http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#setFetchSize(int)

    Because it was not supported on older PGSQL drivers, I disabled it in Kettle.
    I will enable it later tonight and also upgrade the driver to 8.1.

    In the meantime, download the source code for 2.2.2, change the PGSQL DB meta Java code, and re-compile using Ant (or in Eclipse).

    Or just wait until tomorrow, then the daily development build will be available ;-)

    Cheers,
    Matt

  23. #23
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    I tried building Kettle from a Subversion checkout inside Eclipse
    (using Subclipse), changed the method
    PostgreSQLDatabaseMeta.isFetchSizeSupported() to return true, removed
    the pg74 driver, and added the 8.1 driver.

    I still get an out-of-memory exception when doing a preview of the
    first 100 rows of the idisc table from the Explore dialog.

    I guess I'll wait for tomorrow's daily development build... :-)

    - Steinar

  24. #24
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Either that, or the PGSQL driver doesn't consider the hint (which I doubt).
    Not a good sign though :-(

  25. #25
    Join Date
    Nov 1999
    Posts
    459

    Default RE: File specifications not checked

    Below is a hint for debugging what the driver is actually doing.

    [http://archives.postgresql.org/pgsql.../msg00198.php]

    On Fri, 25 Feb 2005, Alan Stange wrote:

    > conn.setAutoCommit(false);
    > st.setFetchDirection(ResultSet.FETCH_FORWARD);
    > st.setFetchSize(1000);
    > st.executeQuery("select * from foo");
    >
    > which would result in queries to the postgresql server like this:
    >
    > DECLARE JDBC_CURS_588 CURSOR FOR select * from foo; fetch forward 1000
    > from JDBC_CURS_588;
    >
    > In the 8.0 driver I don't see this happening. I see only the "select *
    > from foo" query in the postgresql server log.
    >
    > Is the query result still being sent in batch chunks of 1000 rows, but
    > using a different mechanism?
    >

    Yes, it is still being sent in batches, but it uses the V3 protocol
    ability to partially execute portals. An Execute message takes a
    maximum number of rows to return parameter, which is the fetch size.
    Debugging what the driver is actually doing can be done by adding
    ?loglevel=2 to the URL which will log all the frontend/backend messages
    sent/received.

    Kris Jurka
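
    The frontend/backend logging Kris mentions is switched on purely through the connection URL. A minimal sketch of building such a URL (the host and database name are placeholders, and the loglevel parameter applies to these older pre-9.0 drivers):

    ```java
    // Sketch: appending the loglevel=2 parameter that makes the old
    // PostgreSQL JDBC driver log all frontend/backend protocol messages.
    // Host and database name are placeholders, not taken from the thread.
    public class PgDebugUrl {

        // Append loglevel=2, respecting any parameters already present
        static String withProtocolLogging(String baseUrl) {
            return baseUrl + (baseUrl.contains("?") ? "&" : "?") + "loglevel=2";
        }

        public static void main(String[] args) {
            System.out.println(withProtocolLogging("jdbc:postgresql://localhost/freedb"));
        }
    }
    ```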

  26. #26
    Join Date
    Nov 1999
    Posts
    459

    Default RE: File specifications not checked

    Just some other things to mention:

    - Do you also use an 8.x version on the server side?

    - The result set must be stored temporarily somewhere on the
    client or server side when fetchSize is set (a held cursor seems to
    be used). Is this the same machine? Maybe there is a
    mechanism on the server side: if there is not enough memory on the server,
    or anything else prevents the data from being stored to temp, the fetchSize
    may not take effect...

    HTH,

    Jens

  27. #27
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File specifications not checked

    I've tried setting a breakpoint in PostgreSQLDatabaseMeta.isFetchSizeSupported()
    and then browsing the idisc table.

    But I never seem to reach the breakpoint.

    I'm running the debugger on a Kettle built from the Subversion
    checkout, inside Eclipse 3.0.1 on a Blackdown Java 1.4 VM.

    - Steinar

  28. #28
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Jens writes:
    > - Do you use also the 8.x Version on server side?

    Yes. PostgreSQL 8.1.

    >- The result set must be stored temporarily somewhere at the
    > client or server side when fetchSize is set (hold cursor seems to be
    > used). Is this the same machine?

    Yes. Both server and client are running on my laptop, which is a Linux
    machine (kernel 2.6.12) with 2GB of RAM and 5GB of swap space.

    > Maybe there is a mechanism at server side if the memory for server
    > is not enough or anything else, the data can't be stored to temp so
    > the fetchSize is not possible to set....

    I don't know. Maybe...?

    - Steinar

  29. #29
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Matt writes:
    > Either that, or the PGSQL driver doesn't consider the hint (which I doubt).
    > Not a good sign though :-(

    I debugged some more today. This time I successfully hit the
    breakpoint in PostgreSQLDatabaseMeta.isFetchSizeSupported(). The
    function returns true. But the expression
    (databaseMeta.isFetchSizeSupported() && sel_stmt.getMaxRows()>0)
    in Database.openQuery() is false, because sel_stmt.getMaxRows() is 0.

    The first thing the code does after the block containing the if test is
    to call setMaxRows() with the value of rowlimit.

    Perhaps you should test against rowlimit being larger than 0, instead
    of sel_stmt.getMaxRows()?
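
    The condition described above can be restated in isolation. This is a simplified sketch, not the actual Kettle source: under the JDBC rule that getMaxRows() == 0 means "no limit", the guard skips the fetch-size hint exactly when the result set is unbounded and the hint matters most.

    ```java
    // Simplified restatement of the guard discussed above (not the actual
    // Kettle source): the fetch size is only applied when getMaxRows() > 0,
    // yet JDBC defines getMaxRows() == 0 as "no row limit" -- so an
    // unbounded query never gets the hint.
    public class FetchSizeGuard {

        static boolean appliesFetchSize(boolean fetchSizeSupported, int maxRows) {
            return fetchSizeSupported && maxRows > 0;
        }

        public static void main(String[] args) {
            // Steinar's observed case: supporting driver, but maxRows == 0
            System.out.println(appliesFetchSize(true, 0));
            // With an explicit row limit, the hint would be applied
            System.out.println(appliesFetchSize(true, 100));
        }
    }
    ```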


    The operations I did in spoon to get to the breakpoint, was:
    - Open "Connections" in the treeview
    - RMBM on the connection "freedb" and select "Explore..."
    - In the "Database Explorer on [freedb]" dialog, do RMBM on idisc
    - Select "Preview first 100 rows of [idisc]"

    - Steinar

  30. #30
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    > Perhaps you should test against rowlimit being larger than 0,
    > instead of sel_stmt.getMaxRows()?

    Nah... rowlimit is 0, as well.

    - Steinar

  31. #31
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    However, the 2.3.0 version that Matt built today works just fine. At least previewing does.

    I guess I haven't understood Subversion updates in Subclipse yet...

    Now I just have to see if the transform works.

    Thanx for the good work, Matt!

    - Steinar

  32. #32
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Yesterday evening I had some fun with it.
    The Java JDBC contract says that for setFetchSize(size): 0 < size < getMaxRows().
    That's why the limit is in there. Some JDBC drivers bork on it if you remove it.
    Anyway, in 2.3.0 I specifically bypassed this requirement for PGSQL.
    The default fetch size for Kettle is 100, but setting it makes no difference.

    It's a sad story; I wish I had a quick answer.
    I added a LIMIT clause for the preview though.

    Matt

  33. #33
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Thanks, but no, for me it doesn't work yet. (PGSQL 8.0 database)

    We'll have to keep looking or move to MySQL ;-)

    All the best,

    Matt

  34. #34
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Last input from me: my transform (attached) still crashed
    with out of memory on the input. So I did "Edit step" on the input
    step and noticed that "Limit size" was 0. I thought this might be
    the "chunking size", so I changed it to 5000.

    And that made the transform run successfully. But it only took the
    first 5000 lines.

    So I guess you're right that it still doesn't work. At least the
    automatic sizing into 5000-record chunks doesn't work.

    The _explicit_ sizing works now (e.g. in table exploration). It
    didn't work yesterday.

    - Steinar

  35. #35
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    OK, here is a possible workaround:

    SELECT * FROM idisc LIMIT 1000 OFFSET ?

    Use tableInput with Insert Data from Step.
    Generate a Sequence before that:
    - RowGenerator: 5000 empty rows
    - Generate sequence offset, start 0, increment 1000

    Specify "Execute for each row"

    Perhaps this will chop the thing up into smaller pieces...
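
    The sequence of OFFSET values that the RowGenerator/sequence combination would feed into the parameterised query can be sketched like this (the step count and increment are Matt's numbers; only the offset arithmetic is shown, not the Kettle steps themselves):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the offsets Matt's workaround generates: 5000 rows, each
    // carrying a sequence value that increments by 1000, so the table input
    // would execute "SELECT * FROM idisc LIMIT 1000 OFFSET ?" once per row.
    public class OffsetSequence {

        static List<Long> offsets(int steps, long increment) {
            List<Long> result = new ArrayList<>();
            for (int i = 0; i < steps; i++) {
                result.add(i * increment);
            }
            return result;
        }

        public static void main(String[] args) {
            List<Long> o = offsets(5000, 1000);
            // First, second, and last offsets of the sequence
            System.out.println(o.get(0) + ", " + o.get(1) + ", " + o.get(o.size() - 1));
        }
    }
    ```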

    Cheers,
    Matt

  36. #36
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Matt writes:
    > The Java JDBC contract says that for setFetchSize(size) : 0 < size <
    > getMaxRows(). That's why the limit is in there. Some JDBC drivers bork
    > on it if you remove it.

    You mean that the test was there to ensure that you don't set a
    fetch size that is larger than getMaxRows()?

    > Anyway, in 2.3.0 I specifically blocked out this requirement for PGSQL.

    In the if test I was debugging in earlier today?

    > The default fetch size for Kettle is 100 but setting it makes no
    > difference.

    Hm... when I debugged, the value of pstmt.getMaxRows() was 0. Should
    it really have been 1965361? Perhaps the problem is in the PGSQL JDBC
    getMaxRows()?

    What does the "Show size of [idisc]" RMBM entry in the database
    explorer use, because that outputs:
    "Table [idisc] has 1965361 rows."

    - Steinar

  37. #37
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    I meant sel_stmt.getMaxRows(), not pstmt.getMaxRows() (presumably because
    params was 0 on the call to openQuery()...?)

  38. #38
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    >You mean that the test was there to ensure that you don't set a
    >setFetchSize() that is larger than getMaxRows()?

    Basically, yes. You have no idea of the low quality of certain JDBC drivers :-(((((

    > In the if test I was debugging in earlier today?

    Yes.

    > pstmt.getMaxRows()

    This is the maximum amount of rows that can be pre-fetched.
    http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#getMaxRows() :
    "Retrieves the maximum number of rows that a ResultSet object produced by this Statement object can contain. If this limit is exceeded, the excess rows are silently dropped. " The fact that it is 0 worries me as it implies that setFetchSize() will not work.

    I guess we ought to contact the PGSQL JDBC developers really.

    >What does the "Show size of [idisc]" RMBM entry in the database
    >explorer use, because that outputs:
    >"Table [idisc] has 1965361 rows."

    SELECT COUNT(*) FROM idisc

    No problem here because it only retrieves a single row. The work is done on the database.

    Also, I entered the "LIMIT" clause in the getSQLLimitClause(int nrRows) allowing you to preview the PGSQL tables quickly. It&#39;s no longer an Out-Of-Memory test-case.

    Take care,
    Matt

  39. #39
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Matt writes:
    >> pstmt.getMaxRows()

    > This is the maximum amount of rows that can be pre-fetched.
    > http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#getMaxRows() :
    > "Retrieves the maximum number of rows that a ResultSet object
    > produced by this Statement object can contain. If this limit is
    > exceeded, the excess rows are silently dropped."
    > The fact that it is 0 worries me as it implies that setFetchSize()
    > will not work.

    According to this link, setFetchSize() is supposed to work:
    http://jdbc.postgresql.org/documenta...chsize-example

    My guess is that PostgreSQL isn't one of the JDBC implementations that
    break if setFetchSize() is bigger than getMaxRows().

    That would explain why they aren't aware of this issue.

    > I guess we ought to contact the PGSQL JDBC developers really.

    It might be worth checking out the 8.2devel driver to see if it has
    any improvements...?

    However, I did a quick search for getMaxRows() in their mailing list:
    http://dir.gmane.org/gmane.comp.db.postgresql.jdbc
    The last mention of getMaxRows() is from 2003, which would seem to
    indicate that nobody has reported problems with it.

    I guess I could post something, and link to this thread? Or would you
    prefer to post something yourself?

    > SELECT COUNT(*) FROM idisc

    > No problem here because it only retrieves a single row. The work is
    > done on the database.

    Here's a possible workaround:
    - Add a getMaxRows() function to DatabaseInterface that takes a
    PreparedStatement argument and returns an int
    - In BaseDatabaseMeta, implement this function to just return
    getMaxRows() on the argument
    - In PostgreSQLDatabaseMeta, override it to first call
    BaseDatabaseMeta.getMaxRows(), and if the return value of that call
    is 0, do a query with "select count(*) ..."

    Of course that would be tricky if the select is something more complex
    than a select from a single table.

    The best idea is to get the PGSQL driver fixed.

    - Steinar

  40. #40
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Matt writes:

    >> This is the maximum amount of rows that can be pre-fetched.
    >> http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#getMaxRows() :
    >> "Retrieves the maximum number of rows that a ResultSet object
    >> produced by this Statement object can contain. If this limit is
    >> exceeded, the excess rows are silently dropped."

    > The fact that it is 0 worries me as it implies that setFetchSize()
    > will not work.

    Hm... here's more text from the above URL (the getMaxRows() documentation):
    "Returns:
    the current maximum number of rows for a ResultSet object produced
    by this Statement object; zero means there is no limit"

    I take that to mean that if getMaxRows() returns 0, the JDBC
    driver in question will not have a problem if you set an arbitrary
    setFetchSize().

    - Steinar

  41. #41
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Attachment: Database.java I tried rewriting the Database.java class to just call setFetchSize()
    with Const.FETCH_SIZE if getMaxRows() == 0.

    But that didn't make any difference. It still crashes out.

    My rewritten file is attached.

    - Steinar

  42. #42
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Yeah, like I said, I hacked this in yesterday evening...
    I'll say it again: setFetchSize() isn't working with the PostgreSQL 8.1 driver (or database).

    Did you try the workaround I suggested?
    I don't have the time to try it myself...

    Matt

  43. #43
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Matt writes:
    > I'll say it again: setFetchSize() isn't working with the PostgreSQL 8.1
    > driver (or database).

    Here's someone who had a problem similar to this on January 5 this
    year:
    http://comments.gmane.org/gmane.comp...sql.jdbc/10716

    In the solution given in that thread, setAutoCommit(false) is called
    on the connection before calling createStatement().

    Perhaps that's the magic invocation?
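
    For reference, the pattern that thread describes looks roughly like this in plain JDBC (written in modern Java for brevity; the URL, credentials, and the countRows helper are illustrative, and whether the 8.x driver actually honours the hint is exactly what was being tested here):

    ```java
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Sketch of cursor-based fetching with the PostgreSQL JDBC driver:
    // autocommit OFF, a forward-only statement, and a positive fetch size.
    // Without these, the driver buffers the entire result set in memory.
    // URL, credentials, and the countRows helper are illustrative only.
    public class PgBatchedRead {

        static String buildQuery(String table) {
            return "SELECT * FROM " + table;
        }

        static long countRows(Connection conn, String table, int fetchSize) throws Exception {
            conn.setAutoCommit(false);  // required for cursor-based fetching
            long rows = 0;
            try (Statement st = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                st.setFetchSize(fetchSize);  // e.g. 1000 rows per round trip
                try (ResultSet rs = st.executeQuery(buildQuery(table))) {
                    while (rs.next()) {
                        rows++;  // a real transformation would process the row here
                    }
                }
            }
            return rows;
        }

        public static void main(String[] args) throws Exception {
            // Placeholder connection details; requires a running PostgreSQL server
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/freedb", "user", "password")) {
                System.out.println(countRows(conn, "idisc", 1000));
            }
        }
    }
    ```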

    > Did you try the workaround I suggested?
    > Don&#39;t have the time to try it myself...

    No, I missed that. I found it now, and will try it later today.

    Thanx!

    - Steinar

  44. #44
    Join Date
    Mar 2006
    Posts
    14

    Default RE: File specifications not checked

    Attachment: kettle_repo.ktr It worked, at least for my simple transform (see the attached XML file).
    One issue is that it'll keep running until the sequence generator
    reaches its upper limit (I guess; I stopped it before it completed).

    Now I&#39;ll attempt to do some aggregation, using kettle operations
    rather than SQL queries.

    Thanx!


    - Steinar

  45. #45
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File specifications not checked

    Attachment: workaround2.ktr OK, how about something like the workaround in the attachment?


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.