
Many transformation gadgets do not work properly in Pentaho Hadoop!!



afancy
01-13-2011, 08:36 AM
Hi,

I am trying to build transformations for Pentaho Hadoop to load data into a database. However, I have found that many transformation gadgets are very buggy. I list some of them below:

1) The Formula gadget throws the following error in PDI on Linux:
"Unable to open dialog for this step
org.eclipse.swt.SWTError: XPCOM error -2147024882
org.eclipse.swt.browser.Mozilla.error(Unknown Source)
org.eclipse.swt.browser.Mozilla.setText(Unknown Source)
org.eclipse.swt.browser.Browser.setText(Unknown Source)
org.pentaho.libformula.ui.editor.LibFormulaEditor.setStyles(LibFormulaEditor.java:309)
org.pentaho.libformula.ui.editor.LibFormulaEditor.<init>(LibFormulaEditor.java:217)
org.pentaho.di.ui.trans.steps.formula.FormulaDialog$3.widgetSelected(FormulaDialog.java:218)"

2) If a mapper or reducer transformation contains "Table Output", then when I run it in Hadoop it opens and closes a connection for every row, even though I have already enabled connection pooling. The following is the PostgreSQL log:

2011-01-13 13:20:05 CET LOG: connection authorized: user=xiliu database=xiliu
2011-01-13 13:20:05 CET LOG: connection received: host=127.0.0.1 port=42399
2011-01-13 13:20:05 CET LOG: connection authorized: user=xiliu database=xiliu
2011-01-13 13:20:05 CET LOG: connection received: host=127.0.0.1 port=42400
2011-01-13 13:20:05 CET LOG: connection authorized: user=xiliu database=xiliu
2011-01-13 13:20:05 CET LOG: connection received: host=127.0.0.1 port=42401
2011-01-13 13:20:05 CET LOG: connection authorized: user=xiliu database=xiliu
2011-01-13 13:20:05 CET LOG: connection received: host=127.0.0.1 port=42402
2011-01-13 13:20:05 CET LOG: connection received: host=127.0.0.1 port=42403
2011-01-13 13:20:05 CET LOG: connection authorized: user=xiliu database=xiliu
2011-01-13 13:20:05 CET LOG: connection authorized: user=xiliu database=xiliu
2011-01-13 13:20:05 CET LOG: connection received: host=127.0.0.1 port=42404
2011-01-13 13:20:05 CET LOG: connection authorized: user=xiliu database=xiliu
2011-01-13 13:20:05 CET LOG: disconnection: session time: 0:00:00.557 user=xiliu database=xiliu host=127.0.0.1 port=42390
2011-01-13 13:20:05 CET LOG: disconnection: session time: 0:00:00.559 user=xiliu database=xiliu host=127.0.0.1 port=42392
2011-01-13 13:20:05 CET LOG: disconnection: session time: 0:00:00.555 user=xiliu database=xiliu host=127.0.0.1 port=42393
2011-01-13 13:20:08 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:08 CET LOG: disconnection: session time: 0:00:02.925 user=xiliu database=xiliu host=127.0.0.1 port=42401
2011-01-13 13:20:08 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:08 CET LOG: disconnection: session time: 0:00:02.913 user=xiliu database=xiliu host=127.0.0.1 port=42403
2011-01-13 13:20:08 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:08 CET LOG: disconnection: session time: 0:00:02.942 user=xiliu database=xiliu host=127.0.0.1 port=42399
2011-01-13 13:20:45 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:45 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:45 CET LOG: disconnection: session time: 0:00:40.226 user=xiliu database=xiliu host=127.0.0.1 port=42397
2011-01-13 13:20:45 CET LOG: disconnection: session time: 0:00:40.227 user=xiliu database=xiliu host=127.0.0.1 port=42398
2011-01-13 13:20:45 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:45 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:45 CET LOG: disconnection: session time: 0:00:40.170 user=xiliu database=xiliu host=127.0.0.1 port=42400
2011-01-13 13:20:45 CET LOG: disconnection: session time: 0:00:40.160 user=xiliu database=xiliu host=127.0.0.1 port=42402
2011-01-13 13:20:45 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:45 CET LOG: disconnection: session time: 0:00:40.139 user=xiliu database=xiliu host=127.0.0.1 port=42404
2011-01-13 13:20:46 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:46 CET LOG: disconnection: session time: 0:00:41.083 user=xiliu database=xiliu host=127.0.0.1 port=42388
2011-01-13 13:20:46 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:46 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:46 CET LOG: disconnection: session time: 0:00:41.040 user=xiliu database=xiliu host=127.0.0.1 port=42391
2011-01-13 13:20:46 CET LOG: disconnection: session time: 0:00:41.081 user=xiliu database=xiliu host=127.0.0.1 port=42389
2011-01-13 13:20:46 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:46 CET LOG: disconnection: session time: 0:00:41.031 user=xiliu database=xiliu host=127.0.0.1 port=42394
2011-01-13 13:20:46 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:46 CET LOG: disconnection: session time: 0:00:41.023 user=xiliu database=xiliu host=127.0.0.1 port=42395
2011-01-13 13:21:01 CET LOG: connection received: host=127.0.0.1 port=42434
2011-01-13 13:21:01 CET LOG: connection received: host=127.0.0.1 port=42435
....
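
To put numbers on the churn in a log like the one above, a small script can count the connection events and collect the session times. This is only a sketch: the `summarize` function and the `SAMPLE` text are my own illustration, and the patterns assume the exact line format shown in this post.

```python
import re
from collections import Counter

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
2011-01-13 13:20:05 CET LOG: connection received: host=127.0.0.1 port=42399
2011-01-13 13:20:05 CET LOG: connection authorized: user=xiliu database=xiliu
2011-01-13 13:20:08 CET LOG: unexpected EOF on client connection
2011-01-13 13:20:08 CET LOG: disconnection: session time: 0:00:02.925 user=xiliu database=xiliu host=127.0.0.1 port=42401
"""

# Event names as they appear right after "LOG:" in the lines above.
EVENT = re.compile(
    r"LOG:\s+(connection received|connection authorized|disconnection|unexpected EOF)"
)
# Session time is printed as hours:minutes:seconds.milliseconds.
SESSION = re.compile(r"session time: (\d+):(\d+):(\d+\.\d+)")


def summarize(log_text):
    """Return (event counts, list of session durations in seconds)."""
    counts = Counter()
    durations = []
    for line in log_text.splitlines():
        event = EVENT.search(line)
        if event:
            counts[event.group(1)] += 1
        session = SESSION.search(line)
        if session:
            hours, minutes, seconds = session.groups()
            durations.append(int(hours) * 3600 + int(minutes) * 60 + float(seconds))
    return counts, durations


if __name__ == "__main__":
    counts, durations = summarize(SAMPLE)
    print(dict(counts))
    print(durations)
```

Comparing the number of "connection received" lines with the number of rows written would show whether there really is one connection per row.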

I guess the database connection instance is created inside the map(key, value) or reduce(key, value) function, so that a new connection is opened for every row and closed again as soon as it goes out of the function's scope.
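
The difference I suspect can be sketched as follows. This is not Pentaho or Hadoop code: `FakeConnection`, `CountingFactory`, `run_per_row` and `run_per_task` are hypothetical names I made up to illustrate the two lifecycles, and the "connection" only stores rows in memory.

```python
class FakeConnection:
    """Stand-in for a database connection; keeps inserted rows in memory."""

    def __init__(self):
        self.rows = []
        self.closed = False

    def insert(self, row):
        self.rows.append(row)

    def close(self):
        self.closed = True


class CountingFactory:
    """Hands out connections and counts how many were opened."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return FakeConnection()


def run_per_row(factory, rows):
    """Suspected behaviour: connect, insert and disconnect for every row."""
    for row in rows:
        conn = factory.connect()
        try:
            conn.insert(row)
        finally:
            conn.close()


def run_per_task(factory, rows):
    """Expected behaviour: one connection reused for the whole task."""
    conn = factory.connect()
    try:
        for row in rows:
            conn.insert(row)
    finally:
        conn.close()


if __name__ == "__main__":
    per_row, per_task = CountingFactory(), CountingFactory()
    run_per_row(per_row, range(1000))
    run_per_task(per_task, range(1000))
    print(per_row.opened, per_task.opened)
```

With the first pattern the number of connections grows with the number of rows, which would match the log above.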

Could you give me a suggestion? Thanks!

afancy
01-21-2011, 10:21 AM
Hi,

Have you found the problem in my previous post?

Suggestions?

thanks

jganoff
01-21-2011, 10:41 AM
Sorry for the delayed reply!



1) The Formula gadget throws the following error in PDI on Linux:
"Unable to open dialog for this step
org.eclipse.swt.SWTError: XPCOM error -2147024882
org.eclipse.swt.browser.Mozilla.error(Unknown Source)
org.eclipse.swt.browser.Mozilla.setText(Unknown Source)
org.eclipse.swt.browser.Browser.setText(Unknown Source)
org.pentaho.libformula.ui.editor.LibFormulaEditor.setStyles(LibFormulaEditor.java:309)
org.pentaho.libformula.ui.editor.LibFormulaEditor.<init>(LibFormulaEditor.java:217)
org.pentaho.di.ui.trans.steps.formula.FormulaDialog$3.widgetSelected(FormulaDialog.java:218)"


Are you seeing this in Spoon? If so, it may be related to a known issue that was supposed to be fixed: PDI-2819 (http://jira.pentaho.com/browse/PDI-2819). I recommend double-checking the spoon.sh script and making sure the MOZILLA_FIVE_HOME variable points to your xulrunner installation path.
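
As a rough sanity check, something like the sketch below can tell you whether the variable resolves to a real directory. The `check_mozilla_home` helper is hypothetical, and it only inspects the environment it is given (by default the current process), so if spoon.sh sets the variable itself you would need to check the value used there instead. It also does not verify that the directory actually contains xulrunner.

```python
import os


def check_mozilla_home(env=None):
    """Report whether MOZILLA_FIVE_HOME is set and points to a directory."""
    env = os.environ if env is None else env
    path = env.get("MOZILLA_FIVE_HOME")
    if not path:
        return "MOZILLA_FIVE_HOME is not set"
    if not os.path.isdir(path):
        return "MOZILLA_FIVE_HOME points to a missing directory: " + path
    return "MOZILLA_FIVE_HOME points to an existing directory: " + path


if __name__ == "__main__":
    print(check_mozilla_home())
```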

Could you provide a reproduction path so we can file a bug report?



2) If a mapper or reducer transformation contains "Table Output", then when I run it in Hadoop it opens and closes a connection for every row, even though I have already enabled connection pooling.
....


This is a known issue that's being addressed in an upcoming release. There are major performance improvements that are being included for the mapping portion of the job. The reducers (and combiners) will exhibit the same behavior since their nature is to be started/stopped per input set.

Hope this helps!

- Jordan