OutOfMemoryError creating PDI connection to Hive with CDH4



hodgesz
09-15-2012, 04:00 PM
First, we followed the steps in http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+CDH4 to replace the Hadoop-related jars with their CDH4 equivalents.

We then attempted the Reporting on Hive Data example (http://wiki.pentaho.com/display/BAD/Reporting+on+Hive+Data) and got the OutOfMemoryError below in Report Designer while trying to connect to Hive through the Thrift server interface. The Hive Thrift server was started with 'hive --service hiveserver', and we verified with telnet that it is listening on port 10000.


Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
Error connecting to database: (using class org.apache.hadoop.hive.jdbc.HiveDriver)
Java heap space

at org.pentaho.di.core.database.Database.connectUsingClass(Database.java:508)
at org.pentaho.di.core.database.Database.normalConnect(Database.java:352)
... 115 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:105)
at org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:92)
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:187)
at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:127)
at org.apache.hadoop.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:123)
at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:118)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:207)
at org.pentaho.di.core.database.Database.connectUsingClass(Database.java:490)
at org.pentaho.di.core.database.Database.normalConnect(Database.java:352)
at org.pentaho.di.core.database.Database.connect(Database.java:317)
at org.pentaho.di.core.database.Database.connect(Database.java:279)
at org.pentaho.di.core.database.Database.connect(Database.java:269)
at org.pentaho.di.core.database.DatabaseFactory.getConnectionTestReport(DatabaseFactory.java:86)
at org.pentaho.di.core.database.DatabaseMeta.testConnection(DatabaseMeta.java:2464)
at org.pentaho.ui.database.event.DataHandler.testDatabaseConnection(DataHandler.java:533)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke(AbstractXulDomContainer.java:329)
at org.pentaho.ui.xul.swing.tags.SwingButton$OnClickRunnable.run(SwingButton.java:58)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:209)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:597)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:178)
at java.awt.Dialog$1.run(Dialog.java:1046)
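
To take Report Designer out of the picture, here is a minimal standalone JDBC test against the same Thrift endpoint. This is only a sketch: it assumes the CDH4 Hive client jars (hive-jdbc, hive-service, libthrift and their dependencies) are on the classpath, and the connection URL and empty credentials are illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Standalone test of the HiveServer1 Thrift JDBC driver, bypassing
// PDI/Report Designer entirely. Assumes the CDH4 Hive client jars
// are on the classpath; URL and credentials below are illustrative.
public class HiveConnectTest {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        // Same endpoint PDI uses: the Thrift server on port 10000.
        Connection con = DriverManager.getConnection(
                "jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SHOW TABLES");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        con.close();
    }
}

If this small program dies with the same OutOfMemoryError in TBinaryProtocol.readStringBody, that would point at the Hive client/server combination rather than at Pentaho itself.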


We tried doubling the PDI heap to 1024MB via PENTAHO_DI_JAVA_OPTIONS, and also doubling HADOOP_HEAPSIZE to 2000MB in hadoop-env.sh. Neither change helped.
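
A quick way to confirm the larger -Xmx value actually reaches the JVM that Report Designer runs in (rather than just the shell environment) is a probe like the following sketch; the 1024 MB figure is just our PENTAHO_DI_JAVA_OPTIONS setting:

// Prints the maximum heap this JVM will try to use; with
// PENTAHO_DI_JAVA_OPTIONS="-Xmx1024m" it should report roughly 1024 MB.
public class HeapCheck {
    public static void main(String[] args) {
        long maxMB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMB + " MB");
    }
}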

Using the steps in http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions, we had no problem connecting to Hive on HDP 1.1. Since CDH4 uses HDFS 2.0 and MR2, we suspect this may be where the problem lies.

Any ideas?

Thanks,

Jonathan