Hitachi Vantara Pentaho Community Forums

Thread: Thrift Connection Support in PDI's HBase Output Step

  1. #1

    Thrift Connection Support in PDI's HBase Output Step

    The HBase Output step supports only ZooKeeper connections, so how can I load data into a remote HBase via a Thrift gateway?

    Thanks,
    Sujen

  2. #2

    Hi Sujen,

    There is no support for talking to a Thrift gateway in the HBase Output step at present. When the HBase input and output steps were coded, what I found on the web suggested that using Thrift as an intermediate layer carries a performance hit, so the steps use the HBase API directly.

    Cheers,
    Mark.

  3. #3

    Hello Mark,

    Thanks for your response. It makes sense.

    But my HBase server is running on an Amazon EC2 instance, and the client (PDI in our case) cannot connect to it via ZooKeeper because of some address resolution issues. (I have come across patches for this, but I am not interested in those workarounds, so my plan is to access the HBase server on EC2 via Thrift.) The ZooKeeper connection works if the client is inside Amazon's EC2 internal network. However, I would like to load data into the remote HBase server on EC2 from my local network, which I have not been able to do so far. For the time being, I am getting things done by running my own Java-based program. Please let me know if you have any suggestions on this.
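
    As a rough way to narrow down this kind of address resolution problem, the sketch below checks which host names the local machine can resolve. It assumes (the thread does not confirm this) that the failure comes from the cluster advertising EC2-internal host names that do not resolve outside Amazon's network. The class name is hypothetical, and the host names to check are whatever your cluster reports, passed in as arguments.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Diagnostic sketch only; it does not talk to ZooKeeper or HBase.
final class HostResolutionCheck {

    /** True if this machine can turn the given host name into an IP address. */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** Returns the subset of hosts that this machine cannot resolve. */
    static List<String> unresolved(List<String> hosts) {
        List<String> failed = new ArrayList<>();
        for (String host : hosts) {
            if (!resolves(host)) {
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Pass the host names the cluster reports for itself as arguments.
        for (String host : unresolved(Arrays.asList(args))) {
            System.out.println("cannot resolve: " + host);
        }
    }
}
```

    If a name is reported as unresolvable from the local network but resolves from inside EC2, that would be consistent with the behaviour described above. It would not by itself prove that name resolution is the only obstacle.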

    Besides, I have another concern I would like cleared up.
    I have noticed that when we create mappings in the HBase Output step, PDI creates a table named "pentaho_mappings" on the HBase server, which I suppose PDI uses to load the mapping information.
    I created a transformation that loads data from some other data sources into HBase on my local machine using the HBase Output step (the step was therefore saved with localhost as the ZooKeeper quorum). Now, if I schedule the same transformation on an EC2 machine with Data Integration and the HBase server pre-installed, PDI throws "pentaho_mappings not found" exceptions, which is to be expected, as there is no mappings table on the HBase server on my EC2 instance. Please share your comments on this and let me know if I am missing anything.

    Thanks,
    Sujen

  4. #4

    Hi Sujen,

    I'm afraid I don't have any useful suggestions for you regarding the Amazon/Thrift issue at this stage. Regarding the mappings in HBase Output, you are correct in your conclusion: the mapping editor in the step stores the mapping in the "pentaho_mappings" table at creation time, using the currently configured connection. This functionality is encapsulated in a class that is shared by both the HBase input and output steps. I guess I'm more of a data mining than ETL guy, so your particular use case hadn't occurred to me :-) What would be needed is the ability to store a mapping definition in the step's XML configuration, with an option to use it as a default if the named mapping doesn't exist in the mapping table at execution time. Feel free to open an enhancement JIRA for this.
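
    The fallback described above could look roughly like the sketch below. It is only an illustration of the lookup order, not PDI code: MappingFallbackSketch, MappingStore and resolve are hypothetical names, the mapping is represented as a plain String for brevity, and the table and mapping names in main are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types are PDI or HBase classes.
final class MappingFallbackSketch {

    /** Stand-in for a lookup against the "pentaho_mappings" table. */
    interface MappingStore {
        Optional<String> find(String tableName, String mappingName);
    }

    /**
     * Prefers the mapping stored in the mapping table; falls back to the
     * definition embedded in the step's XML when the table has no such entry.
     */
    static String resolve(MappingStore store, String tableName,
                          String mappingName, String embeddedDefault) {
        Optional<String> stored = store.find(tableName, mappingName);
        if (stored.isPresent()) {
            return stored.get();
        }
        if (embeddedDefault == null) {
            throw new IllegalStateException(
                "Mapping '" + mappingName + "' not found for table '" + tableName
                    + "' and no default is embedded in the step");
        }
        return embeddedDefault;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a cluster that has no mapping entries yet.
        Map<String, String> rows = new HashMap<>();
        MappingStore store =
            (table, name) -> Optional.ofNullable(rows.get(table + "/" + name));

        // Nothing stored: the embedded definition is used.
        System.out.println(
            resolve(store, "orders", "orders_map", "<mapping>embedded</mapping>"));

        // Once the table has the named mapping, it takes precedence.
        rows.put("orders/orders_map", "<mapping>stored</mapping>");
        System.out.println(
            resolve(store, "orders", "orders_map", "<mapping>embedded</mapping>"));
    }
}
```

    With this ordering, a transformation saved against one cluster would still run against a cluster that has no mapping table entry, while an existing entry in the table keeps priority over the embedded copy.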

    Cheers,
    Mark.

  5. #5

    Hello Mark,

    Thanks for sharing, very much appreciated.
    Yes, I am not sure about the exact reason behind storing the mapping definition in HBase. It seems I need to open an enhancement request in JIRA, as you just suggested.

    Thanks Again.
    Sujen
