Hitachi Vantara Pentaho Community Forums
Results 1 to 9 of 9

Thread: Amazon Athena JDBC Connection in PDI

  1. #1
    Join Date
    Sep 2010
    Posts
    8

    Default Amazon Athena JDBC Connection in PDI

    I am unable to connect to Amazon Athena with a generic database JDBC connection from Pentaho Kettle PDI. Can anyone else connect?

    Amazon has a new S3/Hive based DB platform called Amazon Athena (http://docs.aws.amazon.com/athena/la...with-jdbc.html)
    I downloaded the JDBC jar (AthenaJDBC41-1.0.0.jar) and placed it in my /lib dir for Kettle, Pentaho PDI 7.0.

    Connection parameters:
    url: jdbc:awsathena://athena.us-west-2.amazonaws.com:443
    class:
    com.amazonaws.athena.jdbc.AthenaDriver
    username: ${aws.access_key_id}
    password: ${aws.client_secret}
    option/parameter:
    s3_staging_dir = s3://greatvines-athena/vip (the location in s3 where the data files are stored)

    I have an Amazon AWS account with appropriate access, a client ID (username), and client secret (password); as well as the universal AWS CLI tool installed. I have the S3 folders and Athena database configured. I can run SQL queries against the data files when via the AWS Athena console.

    Additionally, I am able to connect and run SQL queries using a JDBC connection on other clients (Squirrel, RazorSQL, etc.), but not with Pentaho Kettle. I tried this in both 6.0 and 7.0 versions of PDI.


    Error log when attempting to Test the connection:

    Error connecting to database [Athena] rg.pentaho.di.core.exception.KettleDatabaseException:
    Error occurred while trying to connect to the database


    Error connecting to database: (using class com.amazonaws.athena.jdbc.AthenaDriver)
    No host specified: jdbc:awsathena://athena.us-west-2.amazonaws.com:443;s3_staging_dir=s3://greatvines-athena/vip




    org.pentaho.di.core.exception.KettleDatabaseException:
    Error occurred while trying to connect to the database


    Error connecting to database: (using class com.amazonaws.athena.jdbc.AthenaDriver)
    No host specified: jdbc:awsathena://athena.us-west-2.amazonaws.com:443;s3_staging_dir=s3://greatvines-athena/vip




    at org.pentaho.di.core.database.Database.normalConnect(Database.java:472)
    at org.pentaho.di.core.database.Database.connect(Database.java:370)
    at org.pentaho.di.core.database.Database.connect(Database.java:341)
    at org.pentaho.di.core.database.Database.connect(Database.java:331)
    at org.pentaho.di.core.database.DatabaseFactory.getConnectionTestReport(DatabaseFactory.java:80)
    at org.pentaho.di.core.database.DatabaseMeta.testConnection(DatabaseMeta.java:2795)
    at org.pentaho.ui.database.event.DataHandler.testDatabaseConnection(DataHandler.java:598)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke(AbstractXulDomContainer.java:313)
    at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:157)
    at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:141)
    at org.pentaho.ui.xul.swt.tags.SwtButton.access$500(SwtButton.java:43)
    at org.pentaho.ui.xul.swt.tags.SwtButton$4.widgetSelected(SwtButton.java:137)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.eclipse.jface.window.Window.runEventLoop(Window.java:820)
    at org.eclipse.jface.window.Window.open(Window.java:796)
    at org.pentaho.di.ui.xul.KettleDialog.show(KettleDialog.java:80)
    at org.pentaho.di.ui.xul.KettleDialog.show(KettleDialog.java:47)
    at org.pentaho.di.ui.core.database.dialog.XulDatabaseDialog.open(XulDatabaseDialog.java:116)
    at org.pentaho.di.ui.core.database.dialog.DatabaseDialog.open(DatabaseDialog.java:60)
    at org.pentaho.di.ui.spoon.delegates.SpoonDBDelegate.editConnection(SpoonDBDelegate.java:90)
    at org.pentaho.di.ui.spoon.Spoon.doubleClickedInTree(Spoon.java:3102)
    at org.pentaho.di.ui.spoon.Spoon.doubleClickedInTree(Spoon.java:3036)
    at org.pentaho.di.ui.spoon.Spoon.access$2200(Spoon.java:361)
    at org.pentaho.di.ui.spoon.Spoon$26.widgetDefaultSelected(Spoon.java:6169)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1359)
    at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7990)
    at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9290)
    at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:685)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
    Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
    Error connecting to database: (using class com.amazonaws.athena.jdbc.AthenaDriver)
    No host specified: jdbc:awsathena://athena.us-west-2.amazonaws.com:443;s3_staging_dir=s3://greatvines-athena/vip


    at org.pentaho.di.core.database.Database.connectUsingClass(Database.java:585)
    at org.pentaho.di.core.database.Database.normalConnect(Database.java:456)
    ... 47 more
    Caused by: java.sql.SQLException: No host specified: jdbc:awsathena://athena.us-west-2.amazonaws.com:443;s3_staging_dir=s3://greatvines-athena/vip
    at com.amazonaws.athena.jdbc.AthenaDriver.parseDriverUrl(AthenaDriver.java:322)
    at com.amazonaws.athena.jdbc.AthenaDriver.connect(AthenaDriver.java:112)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at org.pentaho.di.core.database.Database.connectUsingClass(Database.java:567)
    ... 48 more


    Custom URL :jdbc:awsathena://athena.us-west-2.amazonaws.com:443
    Custom Driver Class:com.amazonaws.athena.jdbc.AthenaDriver

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    The GenericDatabaseMeta class adds driver options to the URL.
    I bet Athena doesn't like that.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Sep 2010
    Posts
    8

    Default

    Thank you for the reply. How would I configure PDI/Kettle so that Athena is one of the standard JDBC selections (and not use Generic Database)? Is there documentation on how to do this?

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Some documentation is available.
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Sep 2015
    Posts
    4

    Question same error / problem

    Quote Originally Posted by slivey View Post
    Thank you for the reply. How would I configure PDI/Kettle so that Athena is one of the standard JDBC selections (and not use Generic Database)? Is there documentation on how to do this?
    Slivey, did you ever get Kettle to successfully connect to Athena? If so, do you mind sharing how? Thank you

  6. #6
    Join Date
    Sep 2010
    Posts
    8

    Default

    Yes: I use the latest version of the Athena driver (AthenaJDBC41-1.1.0.jar) placed in the kettle .../data-integration/lib sub directory:
    http://docs.aws.amazon.com/athena/la...with-jdbc.html

    I use the the following in my shared.xml (you will maybe need to change your CUSTOM_URL region, I think):
    <connection>
    <name>Athena</name>
    <server/>
    <type>GENERIC</type>
    <access>Native</access>
    <database/>
    <port></port>
    <username>&#x24;&#x7b;aws.access_key_id&#x7d;</username>
    <password>&#x24;&#x7b;aws.secret_access_key&#x7d;</password>
    <servername/>
    <data_tablespace/>
    <index_tablespace/>
    <attributes>
    <attribute><code>CUSTOM_DRIVER_CLASS</code><attribute>com.amazonaws.athena.jdbc.AthenaDriver</attribute></attribute>
    <attribute><code>CUSTOM_URL</code><attribute>jdbc&#x3A;awsathena&#x3A;&#x2F;&#x2F;athena.us&#x2D;west&#x2D;2.amazonaws.com&#x3A;443&#x3F;s3_staging_dir&#x3D;&#x24;&#x7b;athena.s3_staging_dir&#x7d;</attribute></attribute>
    <attribute><code>FORCE_IDENTIFIERS_TO_LOWERCASE</code><attribute>N</attribute></attribute>
    <attribute><code>FORCE_IDENTIFIERS_TO_UPPERCASE</code><attribute>N</attribute></attribute>
    <attribute><code>IS_CLUSTERED</code><attribute>N</attribute></attribute>
    <attribute><code>PORT_NUMBER</code><attribute></attribute></attribute>
    <attribute><code>QUOTE_ALL_FIELDS</code><attribute>N</attribute></attribute>
    <attribute><code>SUPPORTS_BOOLEAN_DATA_TYPE</code><attribute>N</attribute></attribute>
    <attribute><code>USE_POOLING</code><attribute>N</attribute></attribute>
    </attributes>
    </connection>

    This uses the following kettle parameters:
    ${aws.access_key_id} (from IAM user in AWS console)
    ${aws.secret_access_key} (from IAM user in AWS console)
    ${athena.s3_staging_dir} (value is something like s3://greatvines-athena/results ... this is the S3 bucket/folder that the JDBC connector uses to store/stage query results)

    The AWS IAM User needs access to S3 and Athena. Then you create the client/secret access keys for this user.

  7. #7
    Join Date
    May 2008
    Posts
    10

    Default

    Hi Everyone,

    Thank you to @slivey for giving such clear instructions.
    I have been able to connect to Athena.

    However I am unable to access a table that exists in Athena...
    In Athena, default database
    Name:  Screen Shot 09-08-17 at 09.14 PM.jpg
Views: 1277
Size:  19.0 KB

    In Pentaho Data Integration
    Name:  Screen Shot 09-08-17 at 09.12 PM.jpg
Views: 1295
Size:  46.9 KB

    In Pentaho database explorer, there is "mydatabase" database, and other tables do not appear.
    Name:  Screen Shot 09-08-17 at 08.49 PM.jpg
Views: 1250
Size:  11.7 KB

    In Athena "mydatabase" does not exist and many more tables shows up.
    Name:  Screen Shot 09-08-17 at 08.50 PM.jpg
Views: 1270
Size:  26.9 KB

    Any idea what's going on here?

    Herman

  8. #8
    Join Date
    Apr 2015
    Posts
    13

    Default

    Kinda resurrecting an old thread, but wanted to thank slivey. From his post I learned that the custom connection URL should have both host and staging directory like:

    jdbc:awsathena://athena.us-east-1.amazonaws.com:443?s3_staging_dir=s3://{path to staging directory}

    And then in my query, need to have database prefix in front of table name.

  9. #9
    Join Date
    Apr 2015
    Posts
    13

    Default

    All my PDI jobs that used Athena connection quit working on me today, with error messages like:

    1 validation error detected: Value null at 'outputLocation' failed to satisfy constraint: Member must not be null (Service: AmazonAthena; Status Code: 400; Error Code: InvalidRequestException; Request ID...

    I fixed by installing new Athena JDBC driver on PDI server, AthenaJDBC42_2.0.5.jar. And changing driver class name to com.simba.athena.jdbc.Driver , and changing connection string delimited from ? to ;

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.