File specifications not checked



kettle_anonymous
03-17-2006, 05:54 AM
Attachment: kettle_repo.ktr (http://forums.pentaho.org/archived_att/kettle/kettle_repo.ktr) I'm trying to create a very simple transformation in Spoon 2.2.2 (from the binary Kettle download), on Blackdown Java 1.4.2 on Ubuntu 5.10: I'm reading from a PostgreSQL table and writing to a text file.

The transformation runs and the file is created, but it contains no data.

When I open the right mouse button menu (RMBM) on the text file output step and do "Check Selected Steps", there is a remark that says "File specifications not checked".

Does anyone know what "file specifications" these are? I've filled in the file name and extension, and have tried most of the other fields to see if I could make that message go away.

Does anyone know if this message is the reason the output file stays at 0 bytes?
Or if the cause is something completely unrelated?

Thanx!

MattCasters
03-17-2006, 06:12 AM
Well, it probably just means that somehow you're not sending rows to the step.
How many rows are in the table?

Matt

kettle_anonymous
03-18-2006, 10:03 AM
According to count() run from psql, 1965367 rows.

The command I used is:
select count(*) from idisc;
(it's a PostgreSQL version of the freedb, CDDB alternative, created using the tools at this site: http://asmith.id.au/freedb-better.html )

But I guess the next step would be to see what PostgreSQL does with the select statement from the input step. When running, this step takes a suspiciously short time: only around half a minute.

Thanx!

- Steinar

kettle_anonymous
03-18-2006, 10:51 AM
When I try to preview the input step, I get a dialog saying:
"Sorry, no rows can be found to preview."

Nothing interesting can be found in the Log View. At least nothing that I can see:
2006/03/18 20:45:47 - Spoon - Logging goes to /tmp/spoon.32515.log
2006/03/18 20:45:48 - DBCache - Loading database cache from file: [/home/username/.kettle/db.cache]
2006/03/18 20:45:50 - Spoon - Main window is created.
2006/03/18 20:45:50 - Spoon - Asking for repository
2006/03/18 20:45:50 - Kettle - Reading repositories XML file: /home/username/.kettle/repositories.xml
2006/03/18 20:46:16 - be.ibridge.kettle.trans.Trans - Transformation is in preview mode...
2006/03/18 20:46:16 - be.ibridge.kettle.trans.Trans - Dispatching started for filename [null]
2006/03/18 20:46:16 - disc.0 - Starting to run...
2006/03/18 20:46:16 - dummy.0 - Starting to run...
2006/03/18 20:46:37 - disc.0 - Finished reading query, closing connection.
2006/03/18 20:46:37 - disc.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=0
2006/03/18 20:46:37 - dummy.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=0

I tried removing the db.cache file, but that didn't change the behaviour.


- Steinar

MattCasters
03-18-2006, 11:00 AM
Hi Steinar,

Try the database explorer, browse to the "idisc" table and try to get a few thousand rows in a preview window.
Perhaps we're looking at a JDBC driver issue or something.
Also, it would be nice if you could at least post the transformation. (File : Export to xml)

Thanks,
Matt

kettle_anonymous
03-19-2006, 08:16 AM
Attempting to view the first 2000 rows of idisc gives me the following error message:

java.lang.OutOfMemoryError

java.lang.reflect.InvocationTargetException
at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:327)
at org.eclipse.jface.dialogs.ProgressMonitorDialog.run(ProgressMonitorDialog.java:447)
at be.ibridge.kettle.core.dialog.GetPreviewTableProgressDialog.open(Unknown Source)
at be.ibridge.kettle.core.dialog.DatabaseExplorerDialog.previewTable(Unknown Source)
at be.ibridge.kettle.core.dialog.DatabaseExplorerDialog$8.widgetSelected(Unknown Source)
at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:90)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:66)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1021)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:2867)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2572)
at be.ibridge.kettle.core.dialog.DatabaseExplorerDialog.open(Unknown Source)
at be.ibridge.kettle.spoon.Spoon.exploreDB(Unknown Source)
at be.ibridge.kettle.spoon.Spoon$57.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:66)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1021)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:2867)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2572)
at be.ibridge.kettle.spoon.Spoon.readAndDispatch(Unknown Source)
at be.ibridge.kettle.spoon.Spoon.main(Unknown Source)
Caused by: java.lang.OutOfMemoryError

So I guess it's not all that mysterious. I need to up the heap size of the spoon process, right?

The exported transform is attached, but I guess it isn't all that interesting.

kettle_anonymous
03-19-2006, 08:27 AM
Upped the maximum heap size from the default 256m to 1024m (the machine I'm running on has 2GB of physical memory). That made me run into a different problem:
Couldn't get row from result set
Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.


java.lang.reflect.InvocationTargetException: Couldn't find any rows because of an error :be.ibridge.kettle.core.exception.KettleDatabaseException:
Couldn't get row from result set
Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.

at be.ibridge.kettle.core.dialog.GetPreviewTableProgressDialog$1.run(Unknown Source)
at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:113)
Caused by: be.ibridge.kettle.core.exception.KettleDatabaseException:
Couldn't get row from result set
Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.

at be.ibridge.kettle.core.database.Database.getRow(Unknown Source)
at be.ibridge.kettle.core.database.Database.getRows(Unknown Source)
at be.ibridge.kettle.core.database.Database.getFirstRows(Unknown Source)
... 2 more
Caused by: org.postgresql.util.PSQLException: Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.
at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:287)
at org.postgresql.core.Encoding.decode(Encoding.java:182)
at org.postgresql.core.Encoding.decode(Encoding.java:198)
at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:206)
... 5 more

The raw data is supposed to be in latin1, ascii, or utf8, but I guess I have to convert everything to utf8 as outlined at http://asmith.id.au/freedb-unicode.html and store it in a PostgreSQL database created with -E UNICODE.

MattCasters
03-19-2006, 10:42 AM
Hi Steinar,

Concerning the heap size: I guess you can consider it a PGSQL JDBC bug to consume that much memory.
Normally, the driver only has to keep (in Kettle's case) 5000 rows in memory, which would never max out the 256M.
MySQL has a similar "feature" that still has not been resolved after many years:

Select * from tab1

causes an out-of-memory exception with very large tables.
The worst part is that if you have, let's say, 10 billion rows in tab1, there is no solution at the moment...

Matt

kettle_anonymous
03-19-2006, 11:56 PM
Hm... now I've done a cleanup of the charset issues (it's all normalized to UTF-8 in the PostgreSQL database).

I've also upped the maximum heap size to 1GB.

But I still get the out of memory exception when exploring the idisc table, even when I limit the preview to the first 100 rows of the table.

The actual cause of the out of memory exception seems to be lost. It looks like it's caught by a catch-all clause at the top...? Is there a way to determine the actual cause of the exception?

Thanx!


- Steinar

kettle_anonymous
03-20-2006, 12:10 AM
I started with a fresh spoon process and tried previewing the first 100 rows of
the idisc table, while watching the memory usage with top.

The memory usage of the spoon process skyrocketed straight up to 1GB, and then
I got the same caught exception.

It sure would be interesting to see what caused this. Maybe I should try using
some sort of memory profiler on spoon while browsing this table...?

- Steinar

PS: on a side note, I'm not deliberately posting anonymously; I'm still
waiting for a confirmation email.

MattCasters
03-20-2006, 12:27 AM
Hi Steinar,

I think I found the solution. Earlier versions of the PostgreSQL JDBC driver did NOT support the setFetchSize() method for limiting the number of rows held in memory.
That is why the method isFetchSizeSupported() in be.ibridge.kettle.core.database.PostgreSQLDatabaseMeta returns false.

I will do some tests later tonight and fix this issue. If you have access to the code and need results faster, you can try setting it to return true and see if it works with the newer driver (see the sketch after this post).

Thanks for the valuable feedback! It's always nice to improve Kettle and reduce its footprint ;-)

All the best,
Matt
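
A minimal sketch of the change Matt describes, for readers following along; the class and method names are from this thread, and the surrounding class body is assumed:

// In be.ibridge.kettle.core.database.PostgreSQLDatabaseMeta (per this thread).
// Returning true tells Kettle it may call setFetchSize() on statements
// created against PostgreSQL connections.
public boolean isFetchSizeSupported()
{
    return true; // was false, because older PGSQL drivers ignored setFetchSize()
}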

kettle_anonymous
03-20-2006, 01:57 AM
Hi Matt!

Thanx for the quick feedback!

I think I've found another possible cause of problems: I'm running PostgreSQL 8.1
(it was the newest one available through apt-get), while the driver, judging from
its name, looks like it belongs to PostgreSQL 7.4...?


- Steinar

kettle_anonymous
03-20-2006, 02:19 AM
I tried using pg74.215.jdbc3.jar with the Quantum 3.0.1 database
browser Eclipse plugin, running in Eclipse 3.1.1, and got the same problem,
i.e. Eclipse got an out-of-memory error when attempting to browse the idisc
table.

I downloaded the JDBC3 driver for PostgreSQL 8.2 from here:
http://jdbc.postgresql.org/download.html#jars
and tried using that one when browsing the idisc table with the Eclipse
Quantum plugin. It still got an out-of-memory error.

Hm... of course the Quantum table viewer doesn't have the limiter that
Kettle has, so even if it should happen to work, it may be trying to load
the entire table.

I&#39;ll investigate further.

- Steinar

jbleuel
03-20-2006, 02:24 AM
Hi Steinar,

I don't think this is the problem here, but could you also check that your log level is set to Basic and NOT to Rowlevel.

Cheers,
Jens

MattCasters
03-20-2006, 02:27 AM
Yeah, I think the problem is that the pg74 JDBC driver doesn't support setFetchSize(), which would allow it to limit the size of the result sets.
However, I'm sure that later versions support it. Let's find out as soon as possible.

In the near future we will have to build connection plugins for every possible database type AND version, as a lot of these JDBC drivers are not backward compatible (Oracle, MySQL, ...).

So, I guess a lot of work remains to be done for Kettle: (my favorite quote) "Things will get worse before they get really bad" ;-)

Cheers,
Matt

kettle_anonymous
03-20-2006, 03:49 AM
Hi Jens!

The log level setting was Basic (probably the default...? At least nothing
I remember having changed).

- Steinar

kettle_anonymous
03-20-2006, 03:58 AM
Would it be possible to use the PostgreSQL 8.1 JDBC driver downloaded from
jdbc.postgresql.org with the binary version of Kettle 2.2 that I have?

Or will I have to build stuff myself?

Wrt. the different versions of the driver that you should consider
including in the future, the jdbc.postgresql.org download site says:

In general it is best to pick a matching server and driver release,
but there are some exceptions. Newer driver releases will get
additional features and bug fixes that you may need. Much care is
taken to make newer driver releases work against older servers, so
if you are having a problem and suspect it is the driver's fault,
don't hesitate to try out the latest release on an older server. The
current development driver is known to work with all server versions
back to, and including, 7.2.

...so I guess just including the latest release (i.e. 8.1 or 8.2,
depending on how cutting edge you want to be) would be sufficient
for you... maybe...?

- Steinar

kettle_anonymous
03-20-2006, 04:18 AM
> Would it be possible to use PostgreSQL 8.1 JDBC driver downloaded
> from jdbc.postgresql.org with the binary version of Kettle 2.2 that
> I have?

To answer myself here: one way of doing this is to stop spoon, move
away pg74.215.jdbc3.jar, move in the postgresql-8.1-405.jdbc3.jar
downloaded from jdbc.postgresql.org, and then restart spoon.

- Steinar

MattCasters
03-20-2006, 04:21 AM
>Would it be possible to use PostgreSQL 8.1 JDBC driver downloaded from
>jdbc.postgresql.org with the binary version of Kettle 2.2 that I have?

Yes, put it in the libext/ directory and change the .bat files (spoon.bat, pan.bat, chef.bat, kitchen.bat) to include this jar file in the classpath.
That's all.

However, I don't set the fetch size, so it's going to default to 0 (bad news) or something above 0 (good news).

YMMV.

Good luck,
Matt

kettle_anonymous
03-20-2006, 04:33 AM
Matt writes:
> Yeah, I think the problem is that the pg74 JDBC driver doesn't support
> setFetchSize(), which would allow it to limit the size of the result
> sets. However, I'm sure that later versions support it. Let's find out
> as soon as possible.

I tried using the PostgreSQL 8.1 JDBC driver, and it still crashed
with out of memory when attempting to view the first 100 rows of the
table.

However, running "select * from idisc limit 100" in the SQL Editor from
the RMBM on the "freedb" connection symbol worked fine with both the
8.1 and 7.4 JDBC drivers.

So my guess is that the function is still unimplemented.

The public API for the JDBC driver is here:
http://jdbc.postgresql.org/documentation/publicapi/index.html

I couldn't find the function setFetchSize() in the index there. On
the other hand, I couldn't find the class org.postgresql.Driver either,
so I may be looking in the wrong place...?


- Steinar

kettle_anonymous
03-20-2006, 04:45 AM
> I tried using the PostgreSQL 8.1 JDBC driver, and it still crashed
> with out of memory when attempting to view the first 100 rows of the
> table.
[snip!]
> So my guess is that it's still not implemented.

I was probably guessing wrong, see here:
http://jdbc.postgresql.org/documentation/81/query.html#fetchsize-example

This seems to be new from 8.0 on. The example isn't present in the 7.4
JDBC docs accessible from here:
http://jdbc.postgresql.org/documentation/docs.html

So... I guess what I can do is:
1. use the 8.1 driver
2. try building from source and change the isFetchSizeSupported()
function you mentioned to return true

- Steinar

MattCasters
03-20-2006, 04:58 AM
Hi Steinar,

The setFetchSize() hint is part of the standard JDBC specification:
http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#setFetchSize(int)

Because it was not supported by older PGSQL drivers, I disabled it in Kettle.
I will enable it later tonight and also upgrade the driver to 8.1.

In the meantime, you can download the source code for 2.2.2, change the PGSQL DB meta Java code, and re-compile using Ant (or in Eclipse).

Or just wait until tomorrow, when the daily development build will be available ;-)

Cheers,
Matt
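
As a minimal illustration of the spec Matt links to (a hedged sketch, not Kettle code; java.sql.* imports and an open Connection conn are assumed):

Statement st = conn.createStatement();
st.setFetchSize(5000);  // a hint to fetch rows from the server in batches of 5000
ResultSet rs = st.executeQuery("SELECT * FROM idisc");
// whether the driver honors the hint is driver-specific, which is
// exactly what the rest of this thread is about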

steinarb
03-20-2006, 05:58 AM
I tried building Kettle from a Subversion checkout inside Eclipse
(using Subclipse), changed the method
PostgreSQLDatabaseMeta.isFetchSizeSupported() to return true, removed
the pg74 driver, and added the 8.1 driver.

I still get an out-of-memory exception when doing a preview of the
first 100 rows of the idisc table from the Explore dialog.

I guess I'll wait for tomorrow's daily development build... :-)

- Steinar

MattCasters
03-20-2006, 06:02 AM
Either that, or the PGSQL driver doesn't consider the hint (which I doubt).
Not a good sign though :-(

jbleuel
03-20-2006, 07:17 AM
Below is a hint for debugging what the driver is actually doing.

[http://archives.postgresql.org/pgsql-jdbc/2005-02/msg00198.php]

On Fri, 25 Feb 2005, Alan Stange wrote:

> conn.setAutoCommit(false);
> st.setFetchDirection(ResultSet.FETCH_FORWARD);
> st.setFetchSize(1000);
> st.executeQuery("select * from foo");
>
> which would result in queries to the postgresql server like this:
>
> DECLARE JDBC_CURS_588 CURSOR FOR select * from foo; fetch forward 1000
> from JDBC_CURS_588;
>
> In the 8.0 driver I don't see this happening. I see only the "select *
> from foo" query in the postgresql server log.
>
> Is the query result still being sent in batch chunks of 1000 rows, but
> using a different mechanism?
>

Yes, it is still being sent in batches, but it uses the V3 protocol
ability to partially execute portals. An Execute message takes a
maximum number of rows to return parameter, which is the fetch size.
Debugging what the driver is actually doing can be done by adding
?loglevel=2 to the URL which will log all the frontend/backend messages
sent/received.

Kris Jurka
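
Applying Kris Jurka's tip to the connection used in this thread might look like this (a sketch; java.sql.* imports are assumed, and the host, database, and credentials are placeholders):

// loglevel=2 makes the PostgreSQL JDBC drivers of that era log all
// frontend/backend protocol messages, including the Execute row limits.
String url = "jdbc:postgresql://localhost/freedb?loglevel=2";
Connection conn = DriverManager.getConnection(url, "user", "password");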

jbleuel
03-20-2006, 07:28 AM
Just some other things to mention:

- Do you also use an 8.x version on the server side?

- The result set must be stored temporarily somewhere on the
client or server side when fetchSize is set (a hold cursor seems to
be used). Is this the same machine? Maybe there is a
mechanism on the server side: if there is not enough memory for the
server, or anything else, the data can't be stored to temp, so the
fetchSize cannot take effect...

HTH,

Jens

kettle_anonymous
03-20-2006, 09:29 AM
I've tried setting a breakpoint in PostgreSQLDatabaseMeta.isFetchSizeSupported()
and then browsing the idisc table.

But I never seem to reach the breakpoint.

I'm running the debugger on a Kettle built from the Subversion
checkout, inside Eclipse 3.0.1 on a Blackdown Java 1.4 VM.

- Steinar

steinarb
03-20-2006, 09:33 AM
Jens writes:
> - Do you also use an 8.x version on the server side?

Yes. PostgreSQL 8.1.

> - The result set must be stored temporarily somewhere on the
> client or server side when fetchSize is set (a hold cursor seems to be
> used). Is this the same machine?

Yes. Both server and client are running on my laptop, which is a Linux
machine (kernel 2.6.12) with 2GB of RAM and 5GB of swap space.

> Maybe there is a mechanism on the server side: if there is not enough
> memory for the server, or anything else, the data can't be stored to
> temp, so the fetchSize cannot take effect...

I don't know. Maybe...?

- Steinar

steinarb
03-21-2006, 12:32 AM
Matt writes:
> Either that, or the PGSQL driver doesn't consider the hint (which I doubt).
> Not a good sign though :-(

I debugged some more today. This time I successfully hit the
breakpoint in PostgreSQLDatabaseMeta.isFetchSizeSupported(). This
method returns true. But the expression
(databaseMeta.isFetchSizeSupported() && sel_stmt.getMaxRows()>0)
in Database.openQuery() is false, because sel_stmt.getMaxRows() is 0.

The first thing the method does after the block with the if test is
to call setMaxRows() with the value of rowlimit.

Perhaps you should test against rowlimit being larger than 0 instead
of sel_stmt.getMaxRows()?


The operations I did in spoon to get to the breakpoint were:
- Open "Connections" in the treeview
- RMBM on the connection "freedb" and select "Explore..."
- In the "Database Explorer on [freedb]" dialog, RMBM on idisc
- Select "Preview first 100 rows of [idisc]"

- Steinar
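
For reference, the logic Steinar describes reduces to something like this (a paraphrase, not the verbatim Database.openQuery() source; Const.FETCH_SIZE is mentioned later in this thread):

// The fetch size is only set when getMaxRows() is already > 0 ...
if (databaseMeta.isFetchSizeSupported() && sel_stmt.getMaxRows() > 0)
{
    sel_stmt.setFetchSize(Const.FETCH_SIZE);
}
// ... but the row limit is only applied afterwards, so at the time of
// the test getMaxRows() still returns 0 and the fetch size is never set.
sel_stmt.setMaxRows(rowlimit);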

steinarb
03-21-2006, 12:34 AM
> Perhaps you should test against rowlimit being larger than 0,
> instead of sel_stmt.getMaxRows()?

Nah... rowlimit is 0, as well.

- Steinar

steinarb
03-21-2006, 12:51 AM
However, the 2.3.0 version that Matt built today worked just fine. At least previewing did.

I guess I haven't understood Subversion updates in Subclipse yet...

Now I just have to see if the transform works.

Thanx for the good work, Matt!

- Steinar

MattCasters
03-21-2006, 12:54 AM
Yesterday evening I had some fun with it.
The Java JDBC contract says that for setFetchSize(size): 0 < size < getMaxRows().
That's why the limit is in there; some JDBC drivers bork on it if you remove it.
Anyway, in 2.3.0 I specifically blocked out this requirement for PGSQL.
The default fetch size for Kettle is 100, but setting it makes no difference.

It's a sad story; I wish I had a quick answer.
I added a LIMIT clause for the preview though.

Matt
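
One plausible shape of the 2.3.0 change Matt mentions (a guess; the actual diff is not shown in this thread, and isPostgreSQL() is a hypothetical stand-in for whatever check the real code uses):

// Skip the getMaxRows() precondition for PostgreSQL only.
if (databaseMeta.isFetchSizeSupported()
    && (sel_stmt.getMaxRows() > 0 || isPostgreSQL(databaseMeta)))
{
    sel_stmt.setFetchSize(Const.FETCH_SIZE);
}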

MattCasters
03-21-2006, 01:00 AM
Thanks but no, for me it doesn't work yet. (PGSQL 8.0 database)

We'll have to keep looking or move to MySQL (http://dev.mysql.com/tech-resources/articles/performance-partitioning.html) ;-)

All the best,

Matt

steinarb
03-21-2006, 02:59 AM
Last input from me is that my transform (attached) still crashed out
with out of memory on the input. So I did "Edit step" on the input
step and noticed that "Limit size" was 0. I thought this might be
the "chunking size", so I changed it to 5000.

And that made the transform run successfully, but it only took the
first 5000 rows.

So I guess you're right that it still doesn't work. At least the
automatic chunking into 5000 records doesn't work.

The _explicit_ sizing works now (e.g. in table exploration). It
didn't work yesterday.
- Steinar

MattCasters
03-21-2006, 03:06 AM
OK, here is a possible workaround (see the sketch after this post):

SELECT * FROM idisc LIMIT 1000 OFFSET ?

Use a Table Input step with "Insert data from step".
Generate a sequence before that:
- Row Generator: 5000 empty rows
- Generate a sequence "offset": start 0, increment 1000

Specify "Execute for each row".

Perhaps this will chop the thing up into smaller pieces...

Cheers,
Matt
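
In plain JDBC terms (ignoring the Kettle step mechanics), the workaround amounts to paging through the table in fixed-size chunks; a sketch, with java.sql.* imports and an open Connection conn assumed:

// Read idisc 1000 rows at a time instead of as one huge result set.
PreparedStatement ps = conn.prepareStatement(
    "SELECT * FROM idisc LIMIT 1000 OFFSET ?");
boolean more = true;
for (long offset = 0; more; offset += 1000)
{
    ps.setLong(1, offset);
    ResultSet rs = ps.executeQuery();
    more = false;            // stop when a page comes back empty
    while (rs.next())
    {
        more = true;
        // process one row
    }
    rs.close();
}
ps.close();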

steinarb
03-21-2006, 03:17 AM
Matt writes:
> The Java JDBC contract says that for setFetchSize(size): 0 < size <
> getMaxRows(). That's why the limit is in there; some JDBC drivers
> bork on it if you remove it.

You mean that the test was there to ensure that you don't call
setFetchSize() with a value larger than getMaxRows()?

> Anyway, in 2.3.0 I specifically blocked out this requirement for PGSQL.

In the if test I was debugging in earlier today?

> The default fetch size for Kettle is 100 but setting it makes no
> difference.

Hm... when I debugged, the value of pstmt.getMaxRows() was 0. Should
it really have been 1965361? Perhaps the problem is in the PGSQL JDBC
driver's getMaxRows()?

What does the "Show size of [idisc]" RMBM entry in the database
explorer use, because that outputs:
"Table [idisc] has 1965361 rows."

- Steinar

steinarb
03-21-2006, 03:25 AM
I meant sel_stmt.getMaxRows(), not pstmt.getMaxRows() (presumably because
params was 0 on the call to openQuery()...?)

MattCasters
03-21-2006, 04:23 AM
>You mean that the test was there to ensure that you don't call
>setFetchSize() with a value larger than getMaxRows()?

Basically, yes. You have no idea how low the quality of certain JDBC drivers is :-(((((

> In the if test I was debugging in earlier today?

Yes.

> pstmt.getMaxRows()

This is the maximum number of rows that can be pre-fetched.
http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#getMaxRows() :
"Retrieves the maximum number of rows that a ResultSet object produced by this Statement object can contain. If this limit is exceeded, the excess rows are silently dropped." The fact that it is 0 worries me, as it implies that setFetchSize() will not work.

I guess we really ought to contact the PGSQL JDBC developers.

>What does the "Show size of [idisc]" RMBM entry in the database
>explorer use, because that outputs:
>"Table [idisc] has 1965361 rows."

SELECT COUNT(*) FROM idisc

No problem here, because it only retrieves a single row. The work is done on the database.

Also, I added the "LIMIT" clause in getSQLLimitClause(int nrRows), allowing you to preview PGSQL tables quickly. It's no longer an out-of-memory test case.

Take care,
Matt
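
The preview change Matt mentions presumably looks something like this in PostgreSQLDatabaseMeta (a sketch; only the method name and the LIMIT clause are given in this thread):

// Appended to preview queries, e.g. SELECT * FROM idisc LIMIT 100
public String getSQLLimitClause(int nrRows)
{
    return " LIMIT " + nrRows;
}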

steinarb
03-21-2006, 11:50 AM
Matt writes:
>> pstmt.getMaxRows()

> This is the maximum number of rows that can be pre-fetched.
> http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#getMaxRows() :
> "Retrieves the maximum number of rows that a ResultSet object
> produced by this Statement object can contain. If this limit is
> exceeded, the excess rows are silently dropped."
> The fact that it is 0 worries me, as it implies that setFetchSize()
> will not work.

According to this link, setFetchSize() is supposed to work:
http://jdbc.postgresql.org/documentation/81/query.html#fetchsize-example

My guess is that PostgreSQL isn't one of the JDBC implementations that
break if setFetchSize() is bigger than getMaxRows().

That would explain why they aren&#39;t aware of this issue.

> I guess we ought to contact the PGSQL JDBC developers really.

It might be worth checking out the 8.2devel driver to see if it has
any improvements...?

However, I did a quick search for getMaxRows() in their mailing list:
http://dir.gmane.org/gmane.comp.db.postgresql.jdbc
The last mention of getMaxRows() is from 2003, which would seem to
indicate that nobody has reported problems with it.

I guess I could post something, and link to this thread? Or would you
prefer to post something yourself?

> SELECT COUNT(*) FROM idisc

> No problem here because it only retrieves a single row. The work is
> done on the database.

Here's a possible workaround (sketched after this post):
- Add a getMaxRows() method to DatabaseInterface that takes a
PreparedStatement argument and returns an int
- In BaseDatabaseMeta, implement this method to just return
getMaxRows() on the argument
- In PostgreSQLDatabaseMeta, override it to first call
BaseDatabaseMeta.getMaxRows(), and if that call returns 0, do a
query with "select count(*) ..."

Of course, that would be tricky if the select is something more complex
than a select from a single table.

The best idea is to get the PGSQL driver fixed.

- Steinar
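
Steinar's proposal, sketched (the interface and class names are from this thread; the exact signatures and the count-query plumbing are assumptions):

// In DatabaseInterface:
public int getMaxRows(PreparedStatement stmt) throws SQLException;

// In BaseDatabaseMeta: by default, just ask the driver.
public int getMaxRows(PreparedStatement stmt) throws SQLException
{
    return stmt.getMaxRows();
}

// In PostgreSQLDatabaseMeta: fall back to counting when the driver says 0.
public int getMaxRows(PreparedStatement stmt) throws SQLException
{
    int max = super.getMaxRows(stmt);
    if (max == 0)
    {
        // run "SELECT COUNT(*) FROM <table>" here; as noted above, this
        // only works when the statement is a simple single-table select.
    }
    return max;
}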

steinarb
03-21-2006, 11:58 AM
Matt writes:

>> This is the maximum amount of rows that can be pre-fetched.
>> http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#getMaxRows() :
>> "Retrieves the maximum number of rows that a ResultSet object
>> produced by this Statement object can contain. If this limit is
>> exceeded, the excess rows are silently dropped."

> The fact that it is 0 worries me as it implies that setFetchSize()
> will not work.

Hm... here's more text from the above URL (the getMaxRows() documentation):
"Returns:
the current maximum number of rows for a ResultSet object produced
by this Statement object; zero means there is no limit"

I take that to mean that if getMaxRows() returns 0, the JDBC
driver in question will not have a problem if you set an arbitrary
fetch size with setFetchSize().

- Steinar

steinarb
03-21-2006, 12:27 PM
Attachment: Database.java (http://forums.pentaho.org/archived_att/kettle/Database.java) I tried rewriting the Database.java class to just call setFetchSize() with
Const.FETCH_SIZE if getMaxRows() == 0.

But that didn't make any difference. It still crashes out.

My rewritten file is attached.

- Steinar

MattCasters
03-21-2006, 01:07 PM
Yeah, like I said, I hacked this in yesterday evening...
I say this again: setFetchSize() isn&#39;t working in the MySQL 8.1 driver (or database)

Did you try the workaround I suggested?
Don&#39;t have the time to try it myself...

Matt

steinarb
03-21-2006, 11:39 PM
Matt writes:
> I'll say it again: setFetchSize() isn't working in the PostgreSQL 8.1
> driver (or database).

Here's someone who had a problem similar to this on January 5 this
year:
http://comments.gmane.org/gmane.comp.db.postgresql.jdbc/10716

In the solution given in that thread, setAutoCommit(false) is called
on the connection before calling createStatement().

Perhaps that's the magic invocation?

> Did you try the workaround I suggested?
> Don&#39;t have the time to try it myself...

No, I missed that. I've found it now and will try it later today.

Thanx!

- Steinar
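
The recipe from that linked thread, as a self-contained sketch (connection details are placeholders; with autocommit off, the PostgreSQL driver can use a cursor and honor the fetch size):

import java.sql.*;

public class CursorFetch
{
    public static void main(String[] args) throws SQLException
    {
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/freedb", "user", "password");
        conn.setAutoCommit(false); // the "magic invocation": cursors need a transaction
        Statement st = conn.createStatement();
        st.setFetchSize(5000);     // rows now arrive in batches of 5000
        ResultSet rs = st.executeQuery("SELECT * FROM idisc");
        while (rs.next())
        {
            // process one row; memory stays bounded by the fetch size
        }
        rs.close();
        st.close();
        conn.close();
    }
}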

steinarb
03-22-2006, 01:26 AM
Attachment: kettle_repo.ktr (http://forums.pentaho.org/archived_att/kettle/kettle_repo.ktr) It worked, at least for my simple transform (see the attached XML file).
One issue is that it'll keep running until the sequence generator
reaches its upper limit (I guess; I stopped it before it completed).

Now I'll attempt to do some aggregation, using Kettle operations
rather than SQL queries.

Thanx!


- Steinar

MattCasters
03-22-2006, 01:36 AM
Attachment: workaround2.ktr (http://forums.pentaho.org/archived_att/kettle/workaround2.ktr) OK, how about something like the workaround in the attachment?