Hitachi Vantara Pentaho Community Forums

Thread: How to Implement Twitter "Streaming" API with Oauth 1.1 ?

  1. #1
    Join Date
    Apr 2014
    Posts
    14

    Hi all,
    I have already asked this question on many blogs in various ways, but I did not get any answer on how to implement this. So here is my basic problem:

    I need real-time Twitter streaming using Pentaho DI (aka 'Kettle'). I want to emphasize that I want to use the "Streaming" API, such as
    ===========================
    https://stream.twitter.com/1.1/statuses/sample.json
    or
    https://stream.twitter.com/1.1/statuses/firehose.json
    ===========================
    NOT a search API (REST API type), such as
    ===========================
    https://api.twitter.com/1.1/search/tweets.json
    ===========================
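To make the difference concrete: a search call returns one finite response, while a streaming endpoint keeps the connection open and keeps sending messages. Below is a minimal standalone Java sketch of the consuming side, assuming the stream delivers one JSON message per line and sends blank lines as keep-alives. `StreamLineReader` and `readMessages` are hypothetical names for illustration, the input is a fake in-memory stream, and no authentication or real HTTP connection is involved.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: consumes a line-delimited stream of messages.
final class StreamLineReader {

    // Reads messages until the stream ends or maxMessages have been collected.
    // Blank lines are skipped (assumption: they are keep-alive signals).
    static List<String> readMessages(InputStream in, int maxMessages) throws IOException {
        List<String> messages = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while (messages.size() < maxMessages && (line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // keep-alive, nothing to process
            }
            messages.add(line);
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        // Fake stream content standing in for a long-lived HTTP response body.
        String fake = "{\"text\":\"first\"}\r\n\r\n{\"text\":\"second\"}\r\n";
        InputStream in = new ByteArrayInputStream(fake.getBytes(StandardCharsets.UTF_8));
        for (String message : readMessages(in, 10)) {
            System.out.println(message);
        }
    }
}
```

With a real streaming connection the loop would normally never reach the end of the stream, which is why the sketch takes a `maxMessages` limit.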

    Whenever I searched the internet for this problem, I found only very old solutions implemented against the old API version 1.0. The situation has changed: Twitter now uses API version 1.1, which requires OAuth (1.0a) authentication on every request and is implemented differently from the older version.

    After a lot of searching I learned that there are three ways to use the Twitter APIs in Kettle:
    1) By using JavaScript and the REST Client step in Kettle. (This works for the search API, but the search API never gives real-time results.)
    2) By using Java code (including the twitter4j libraries) in a UDJC step, which should give us a stream. (But this is not working; it always fails with an out-of-memory error.)
    =====================================
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : UnexpectedError:
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : java.lang.OutOfMemoryError: unable to create new native thread
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at java.lang.Thread.start0(Native Method)
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at java.lang.Thread.start(Unknown Source)
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at twitter4j.TwitterStreamImpl.startHandler(TwitterStreamImpl.java:338)
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at twitter4j.TwitterStreamImpl.filter(TwitterStreamImpl.java:284)
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at Processor.processRow(Processor.java:112)
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.steps.userdefinedjavaclass.UserDefinedJavaClass.processRow(UserDefinedJavaClass.java:1181)
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at java.lang.Thread.run(Unknown Source)

    =====================================
    3) By using the "Shell Script" job step and writing a cURL script for the Twitter Streaming API in it.
    (But this is not working either; it returns a "401 Unauthorized" error code.)
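One possible cause of a 401 in approach (3), offered as a guess only: an OAuth 1.0a signature covers `oauth_timestamp` and `oauth_nonce`, so a cURL command with a pre-generated `Authorization` header stops being valid after a while, and quoting differences between shells can also alter the header. The standalone Java sketch below shows how such a signature could be computed fresh for each request with HMAC-SHA1. `OAuthSigner` and its methods are hypothetical names, the credentials are placeholders, and the sketch does not handle repeated parameter names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: builds an OAuth 1.0a HMAC-SHA1 signature.
final class OAuthSigner {

    // RFC 3986 percent-encoding. URLEncoder does form encoding, so the
    // three places where the two schemes differ are patched afterwards.
    static String percentEncode(String s) throws Exception {
        return URLEncoder.encode(s, "UTF-8")
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    // Signature base string: METHOD & encoded URL & encoded sorted parameters.
    static String baseString(String method, String url, Map<String, String> params)
            throws Exception {
        TreeMap<String, String> sorted = new TreeMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            sorted.put(percentEncode(e.getKey()), percentEncode(e.getValue()));
        }
        StringBuilder joined = new StringBuilder();
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            if (joined.length() > 0) {
                joined.append('&');
            }
            joined.append(e.getKey()).append('=').append(e.getValue());
        }
        return method.toUpperCase() + "&" + percentEncode(url)
                + "&" + percentEncode(joined.toString());
    }

    // Signing key is "consumerSecret&tokenSecret", both percent-encoded.
    static String sign(String baseString, String consumerSecret, String tokenSecret)
            throws Exception {
        String key = percentEncode(consumerSecret) + "&" + percentEncode(tokenSecret);
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] raw = mac.doFinal(baseString.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new TreeMap<>();
        params.put("oauth_consumer_key", "CONSUMER_KEY");   // placeholder
        params.put("oauth_token", "ACCESS_TOKEN");          // placeholder
        params.put("oauth_signature_method", "HMAC-SHA1");
        params.put("oauth_version", "1.0");
        // Fresh nonce and timestamp for every request.
        params.put("oauth_nonce", UUID.randomUUID().toString().replace("-", ""));
        params.put("oauth_timestamp", String.valueOf(System.currentTimeMillis() / 1000L));

        String base = baseString("GET",
                "https://stream.twitter.com/1.1/statuses/sample.json", params);
        System.out.println(sign(base, "CONSUMER_SECRET", "TOKEN_SECRET")); // placeholders
    }
}
```

The resulting value would go into the `oauth_signature` field of the `Authorization` header, percent-encoded like the other fields. Any query or POST parameters of the request would also have to be added to `params` before signing.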

    Does anybody have any idea what is going wrong in the last two approaches, or is it impossible to implement the streaming API in Kettle?

    Below are some of my own experiences with the last two approaches.

    First, for approach (2), I thought the Java code in my UDJC step was wrong or had an error in it, so I debugged it in Eclipse. But it worked there on its first run (it did not ask for more Java heap space). I had already given Kettle -Xmx1024m of heap space, but it still fails with the out-of-memory error there. Does anybody have a clue about this?

    Next I tried approach (3), cURL. To test the cURL script, I executed it manually in my cmd console (I am on a Windows system), and it ran as it should: I got the stream. So the basic question is: if I have working code, a working script, and accurate credential tokens, why do they not work in Kettle?
    I have asked the same in many places but have not received any reply. If this is impossible, then please at least say so!
    I am attaching some reference links that helped me. If anybody has any idea, clue, or view on this, please share.
    =============================
    http://www.patlaf.com/query-twitter-api-with-pentaho-pdi-kettle/
    http://open-bi.blogspot.in/2010/03/s...th-kettle.html
    http://type-exit.org/adventures-with...omment-page-2/
    http://forums.pentaho.com/showthread...72-API-Twitter

    =============================

    Thanks and Regards,
    Rahul Trivedi

  2. #2
    Join Date
    Apr 2014
    Posts
    14

    Hi All,

    Could anyone please help me with this?

  3. #3
    Join Date
    Oct 2011
    Posts
    5

    Hi Rahul.
    Did you get a solution for your problem?
    I have a similar problem to tackle.

    Regards,
    Dikesh Shah.

  4. #4
    Join Date
    Sep 2013
    Posts
    235

    It is possible, as long as some Java library is able to do it. As a last resort you could ask someone to write a transformation step that reads Twitter. If you have started using UDJC, you are one step away from writing a custom Kettle step by hand.

    By the way -

    Code:
    2014/10/14 16:48:43 – User Defined Java Class 2.0 – ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : java.lang.OutOfMemoryError: unable to create new native thread
    Running code in a UDJC step is different from running it in Eclipse. In Eclipse the code does not run for long periods of time, so even if your code has leaks you may never hit them, even though they exist. It seems very likely that some resources in the UDJC step code are not being released properly. You can post a sample of your code here, and we will be happy to help you.
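A guess based only on the stack trace: `TwitterStreamImpl.filter` is called from `processRow` and starts a thread, and Kettle calls `processRow` again and again until it returns false. If the stream is opened on every call, each call starts another thread until the JVM cannot create more. A minimal standalone sketch of the "start once, then hand rows over through a queue" pattern follows. `StreamBridge` and all its members are hypothetical names, and the fake producer stands in for the real stream listener; this is not twitter4j or Kettle code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bridge between a streaming producer thread and a
// row-by-row consumer that is invoked repeatedly (like processRow).
final class StreamBridge {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
    private final AtomicBoolean started = new AtomicBoolean(false);
    final AtomicInteger threadsStarted = new AtomicInteger(0);
    private volatile Thread worker;

    // Starts the producer thread on the first call only; later calls do nothing.
    void startOnce(Runnable producer) {
        if (!started.compareAndSet(false, true)) {
            return;
        }
        worker = new Thread(producer, "stream-producer");
        worker.setDaemon(true);
        threadsStarted.incrementAndGet();
        worker.start();
    }

    // Called by the producer for each message; returns false (message dropped)
    // if the bounded queue is full.
    boolean offer(String message) {
        return queue.offer(message);
    }

    // Called by the consumer; waits up to timeoutMillis, returns null on timeout.
    String poll(long timeoutMillis) throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Releases the producer thread when the consumer is finished.
    void shutdown() {
        Thread t = worker;
        if (t != null) {
            t.interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StreamBridge bridge = new StreamBridge();
        // Fake producer standing in for a stream listener callback.
        Runnable fakeStream = () -> {
            for (int i = 1; i <= 3; i++) {
                bridge.offer("tweet " + i);
            }
        };
        // Simulates repeated processRow-style invocations.
        for (int call = 0; call < 5; call++) {
            bridge.startOnce(fakeStream);
            System.out.println(bridge.poll(200));
        }
        bridge.shutdown();
        System.out.println("threads started: " + bridge.threadsStarted.get());
    }
}
```

In a UDJC step the same idea would mean opening the stream only inside the `if (first)` block, emitting one queued message per `processRow` call, and closing the stream when the step stops. In a real step, `shutdown` would call the library's own method for closing the stream rather than interrupting a thread.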

Copyright © 2005 - 2017 Pentaho Corporation. All Rights Reserved.