Hitachi Vantara Pentaho Community Forums

Thread: Hive Connection on Windows - Configuration of Data Source with JDBC driver

  1. #1
    Join Date
    Aug 2010
    Posts
    5

    Default Hive Connection on Windows - Configuration of Data Source with JDBC driver

    Hello,

I am new to Pentaho and want to connect it to our Hadoop/Hive environment. I am using a Windows client. Can you please guide me on how to configure the connection?

We have our own JDBC driver for connecting to Hive, which I want to use. (This driver works fine when used with the SQuirreL SQL Client.) I found Linux configuration instructions on this site, but there are none for Windows:
    http://wiki.apache.org/hadoop/Hive/HiveJDBCInterface
Do you know how to do this in a Windows environment?

    I would like to use all reporting and dashboard capabilities of Pentaho with our own data in the Hadoop/Hive environment.

    Thanks in advance,

    - firat gelbal

  2. #2

    Default

    Hello Firat,

Thanks for posting. One of our presales engineers put together a blog post and video on how to define a connection in the BI Server that may help:

    http://blog.pentaho.com/2010/07/23/p...s-even-easier/

    If you want to use Pentaho Report Designer, you will need to:
    1) drop your JDBC driver into <install-dir>\report-designer\lib\jdbc before starting Report Designer
2) when defining your connection, use the 'Generic database' option and manually provide your connection URL and the JDBC driver class name
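For example, with the Apache Hive driver, the Generic database settings should look something like the following (host, port, and database name are placeholders for your environment; 10000 is the default HiveServer port):

    Custom connection URL:    jdbc:hive://yourserver:10000/default
    Custom driver class name: org.apache.hadoop.hive.jdbc.HiveDriver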

Also, I'm pleased to share that we are wrapping up the first set of our own contributions to the Hive JDBC driver to make it more compatible with the Pentaho BI Suite. As part of this, we have also added Hive as a selectable option in the connection dialog. Later this week I will be posting our cut of the driver with an updated drop of PDI so that you can test the integration with the common Connections dialog. You can follow the progress of the beta program, access downloads, etc. by going to:

    http://wiki.pentaho.com/display/EAI/...op+Integration

    I'm very interested in getting your feedback on any problems you run into using Hive and the Hive JDBC driver with Pentaho as we are actively contributing to the Apache project to make the driver more robust.

    Thanks!
    Jake

  3. #3
    Join Date
    Aug 2010
    Posts
    5

Default Connection is fine, but I cannot retrieve metadata. How do I run Pentaho with our data?

    Hello Jake,

Thanks for the prompt response. The blog link was indeed helpful. I set up the connection, and Pentaho can communicate with our Hive server now.

However, I cannot go any further, because Pentaho's Metadata Editor fails to get the metadata from our Hive server. It displays the table names, but it fails while 'importing tables'. Here is the error message:

    org.pentaho.di.core.exception.KettleDatabaseException:
    Couldn't get field info from [SELECT * FROM calendar]

    Method not supported

    at org.pentaho.di.core.database.Database.getQueryFieldsFallback(Database.java:2315)
    at org.pentaho.di.core.database.Database.getQueryFields(Database.java:2251)
    at org.pentaho.di.core.database.Database.getQueryFields(Database.java:1928)
    at org.pentaho.di.core.database.Database.getTableFields(Database.java:1923)
    at org.pentaho.pms.ui.MetaEditor.importTableDefinition(MetaEditor.java:3384)
    at org.pentaho.pms.ui.MetaEditor.importMultipleTables(MetaEditor.java:3334)
    at org.pentaho.pms.ui.MetaEditor.newConnection(MetaEditor.java:2799)
    at org.pentaho.pms.ui.MetaEditor$6.handleEvent(MetaEditor.java:490)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.pms.ui.MetaEditor.readAndDispatch(MetaEditor.java:851)
    at org.pentaho.pms.ui.MetaEditor.main(MetaEditor.java:3589)
    Caused by: java.sql.SQLException: Method not supported
    at org.apache.hadoop.hive.jdbc.HivePreparedStatement.getMetaData(HivePreparedStatement.java:132)
    at org.pentaho.di.core.database.Database.getQueryFieldsFallback(Database.java:2308)
    ... 13 more
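For reference, here is a minimal JDBC sketch of the call that fails (host, port, and credentials are hypothetical): the driver's HivePreparedStatement does not implement getMetaData(), so asking a prepared statement for its metadata throws the "Method not supported" exception above.

    import java.sql.*;

    public class HiveMetaDataRepro {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive://yourserver:10000/default", "", "");
            PreparedStatement ps = conn.prepareStatement("SELECT * FROM calendar");
            // This is the call from the stack trace:
            // HivePreparedStatement.getMetaData() throws
            // java.sql.SQLException: Method not supported
            ResultSetMetaData md = ps.getMetaData();
            System.out.println(md.getColumnCount());
        }
    }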



Perhaps it's better if I wait until you wrap up your Hive JDBC driver. Can you please update me when the new driver is ready, along with 'how to install' guidelines?

Finally, can you advise me on how to start a project in Pentaho? I checked the knowledge base, but the articles I found mainly involve using the demo data. What I want is to use our own data by retrieving it from our Hive servers. I assume the first step should be implementing the metadata, but as I said, the Metadata Editor fails during import, and I cannot use Schema Workbench as there is no metadata to work with. I would be happy to get advice on how to evaluate Pentaho with our own data.

    Thanks and best,

    - firat

  4. #4

    Default

Thanks for the feedback, Firat. We just ended our first sprint on updating the Hive JDBC driver to be more compatible with Pentaho. I will be posting a copy of the driver later this week with guidelines on what works and what doesn't. Currently, the biggest support gap we're working through is Hive's lack of support for prepared statements. Look for more info from me in the next couple of days on the availability of the updated driver.
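In the meantime, a common fallback when a driver lacks PreparedStatement.getMetaData() is to execute the query with a plain Statement and read the column information from the ResultSet instead. A rough sketch (host, port, and table name are hypothetical, and it assumes the driver implements ResultSet.getMetaData()):

    import java.sql.*;

    public class HiveFieldInfo {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive://yourserver:10000/default", "", "");
            Statement stmt = conn.createStatement();
            // Execute the query and take the field info from the result set,
            // instead of asking the (unsupported) prepared statement for it.
            ResultSet rs = stmt.executeQuery("SELECT * FROM calendar");
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println(md.getColumnName(i) + " : "
                        + md.getColumnTypeName(i));
            }
            conn.close();
        }
    }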

    -Jake

  5. #5
    Join Date
    Aug 2010
    Posts
    5

    Default

Thanks for the support, Jake. Our senior database developer made a quick fix for this, so the Metadata Editor was able to get the data (I don't know how). But there are now two new major issues:

1- The Metadata Editor does not generate a proper inner join: Hive requires joins to be in the FROM clause, while the Metadata Editor puts them in the WHERE clause. This is strange, as the SQL it generates for outer joins is all in the FROM clause. (See the SQL comparison after this list.)

I see that this was reported earlier in the discussion forums, but there were no replies. I posted it again:

    http://forums.pentaho.com/showthread...-Join&p=242483

2- Our fact table is partitioned on a date field (basically a calendar_id that represents a day in the calendar table). However, because of how Hive is configured, this date field is not displayed as part of the schema, so the Metadata Editor does not see it.
I am not sure if this is an issue yet; I tried a workaround, but I cannot test it because of the first issue.
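To illustrate the join issue in point 1 (table and column names are made up for the example):

    -- What the Metadata Editor generates (implicit inner join in the WHERE clause),
    -- which Hive rejects:
    SELECT f.amount, c.day
    FROM fact_sales f, calendar c
    WHERE f.calendar_id = c.calendar_id;

    -- What Hive expects (explicit JOIN ... ON in the FROM clause):
    SELECT f.amount, c.day
    FROM fact_sales f JOIN calendar c ON (f.calendar_id = c.calendar_id);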

I would be happy to get as much help as I can with these issues.

Good luck and success.

    - firat

  6. #6
    Join Date
    Aug 2010
    Posts
    87

    Default

    Hello Firat,

    Quote Originally Posted by fgelbal View Post
1- The Metadata Editor does not generate a proper inner join: Hive requires joins to be in the FROM clause, while the Metadata Editor puts them in the WHERE clause. This is strange, as the SQL it generates for outer joins is all in the FROM clause.
Great feedback. We've just started our next sprint, focusing partly on hardening the Hive driver. We are in the process of integrating the Hive-specific JOIN syntax into the Metadata Editor to fix this. You can follow the progress here: http://jira.pentaho.com/browse/PDI-4452.

    Quote Originally Posted by fgelbal View Post
2- Our fact table is partitioned on a date field (basically a calendar_id that represents a day in the calendar table). However, because of how Hive is configured, this date field is not displayed as part of the schema, so the Metadata Editor does not see it.
I am not sure if this is an issue yet; I tried a workaround, but I cannot test it because of the first issue.
What is your workaround? How best to show the partition columns is something we're still thinking about.

    -Jordan

  7. #7
    Join Date
    Aug 2010
    Posts
    5

    Default

    Hello Jordan,

Thanks for the prompt response and, more importantly, for taking action to solve these issues. Please let me know when the new driver is available; unfortunately, I cannot use Pentaho for now.

As for the partitioning workaround: in the Metadata Editor, I just added a new column to the fact table and, in the 'Calculation -> Formula' part of the Column Settings, put the physical name of the partition column (see the one-line example below).
So for me, if you just display the partition column as part of the table, that's already good. Or you might put it directly under 'Connection' (beside the tables), with a different icon perhaps?
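For illustration, using the calendar_id field mentioned earlier, the formula contains nothing more than the physical column name:

    calendar_id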

Good luck and success with the work.

    - firat
