Hitachi Vantara Pentaho Community Forums

Thread: Hadoop integration in Pentaho CE

  1. #1
    Join Date
    Sep 2009
    Posts
    10

    Hadoop integration in Pentaho CE

    Hi,

    I am trying to connect Pentaho BI to a Hadoop/Hive data warehouse. Can someone guide me on how to achieve that?

    I defined the connection properties in the jdbc.properties file and created a cube for my Hive database, but it throws an error saying that the schema name is not bound:

    12:53:01,941 ERROR [Logger] Error: Pentaho
    12:53:01,957 ERROR [Logger] misc-org.pentaho.platform.plugin.services.connections.mondrian.MDXConnection: MDXConnection.ERROR_0002 - Invalid connection properties: PoolNeeded=false; dataSource=pyramid; Provider=mondrian; Catalog=solution:steel-wheels/analysis/pyramid.mondrian.xml; DynamicSchemaProcessor=mondrian.i18n.LocalizingDynamicSchemaProcessor; Locale=en_US
    org.pentaho.platform.api.data.DatasourceServiceException: javax.naming.NameNotFoundException: Name pyramid is not bound in this Context


    Any pointers in this regard would be a great help.

    thanks,
    Ashok Riwaria

  2. #2

    To my knowledge you can't connect to Hadoop and Hive with the CE of PDI. I'm not sure about the CEs of the other modules, but it would surprise me if it were any different there.

  3. #3
    Join Date
    Sep 2009
    Posts
    10

    Is it possible to use Hadoop/Hive in Analysis View and Analyzer in the Enterprise Edition? If yes, can someone help me set that up?

    Ashok

  4. #4

    The community has done some great work to identify, and in some cases work around, areas where Mondrian's SQL generation hits features that Hive does not support (for example, http://jira.pentaho.com/browse/MONDRIAN-789 and http://jira.pentaho.com/browse/PDI-4355). That said, we do not officially support the use of Mondrian on top of Hive at this point. Beyond the known areas of incompatibility, it is probably not a great idea at this time anyway. Mondrian and Analyzer are designed to let users freely explore data, all the while issuing SQL queries under the covers. The latency of Hive queries is currently not a good match for this use case. We would recommend using PDI/AgileBI to stage subsets of the data from Hadoop (even extracted in summarized form using a Hive query) and to build Analysis cubes from those. PDI EE provides a simple way to set this up and even schedule the extracts on a recurring basis.

    hth, jake

  5. #5
    Join Date
    Jul 2007
    Posts
    2,498

    I'll translate what jake said:

    IF YOU TRY IT YOU'LL DIE OF OLD AGE AFTER 2 ANALYSIS
    Pedro Alves
    Meet us on ##pentaho, a FreeNode irc channel

  6. #6

    Thanks for the translation... clearly I've been in management for too long

  7. #7
    Join Date
    Sep 2009
    Posts
    10

    Thanks Jake/Pedro

    So can you please suggest an alternative path or solution? Or is this targeted for a future release of Pentaho?

    Ashok

  8. #8
    Join Date
    Jul 2007
    Posts
    2,498

    It's not about Pentaho, it's about Hive.

    I know Nick Goodman from LucidDB / DynamoBI has a project that puts LucidDB in front of Hadoop for fast analysis; try to get info there.
    Pedro Alves
    Meet us on ##pentaho, a FreeNode irc channel

  9. #9

    Pedro's suggestion is a good one. Otherwise, I would just use PDI to query a slice of data out of Hadoop via Hive, load it into an RDBMS such as LucidDB or MySQL, and then build your cubes and dashboards on that. PDI EE can help you schedule the extract/load if you want to refresh the data periodically.
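
For anyone who wants to see the shape of that extract-and-load approach outside of PDI, here is a rough Python sketch. It is only an illustration under several assumptions: the `sales` table, its columns, and the `sales_summary` staging table are made up; the Hive call is replaced by a stub that returns hard-coded rows so the script runs without a cluster; and SQLite stands in for whichever RDBMS (LucidDB, MySQL, etc.) you would really stage into. In PDI itself this would more likely be a Table input step reading from Hive feeding a Table output step.

```python
import sqlite3

# Hypothetical summarizing query one might send to Hive. It is shown for
# context only and is NOT executed anywhere in this sketch.
HIVE_SUMMARY_QUERY = """
SELECT region, product, SUM(amount) AS total_amount, COUNT(*) AS order_count
FROM sales
GROUP BY region, product
"""


def fetch_summary_from_hive():
    """Stand-in for running HIVE_SUMMARY_QUERY through a Hive client.

    Returns hard-coded (region, product, total_amount, order_count) rows
    so the example runs without a Hadoop cluster.
    """
    return [
        ("EMEA", "widget", 1200.0, 3),
        ("EMEA", "gadget", 450.0, 1),
        ("APAC", "widget", 800.0, 2),
    ]


def load_into_staging(conn, rows):
    """Replace the staging table contents with the latest extract."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_summary ("
        "region TEXT, product TEXT, total_amount REAL, order_count INTEGER)"
    )
    # Full refresh: clear the old slice before loading the new one.
    conn.execute("DELETE FROM sales_summary")
    conn.executemany("INSERT INTO sales_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def total_by_region(conn):
    """The kind of fast aggregate a cube would issue against the staging table."""
    cur = conn.execute(
        "SELECT region, SUM(total_amount) FROM sales_summary "
        "GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


# SQLite in memory stands in for the real staging database.
conn = sqlite3.connect(":memory:")
load_into_staging(conn, fetch_summary_from_hive())
print(total_by_region(conn))
```

The point of the design is that the slow Hive query runs once per refresh, and Mondrian only ever queries the small staging table, so interactive analysis does not wait on Hive latency.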
