Spark Streaming Twitter - No route to host. Relevant discussions can be found on the Internet

Date: 2015-09-03 11:09:05

Tags: twitter4j spark-streaming

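I am using Spark Streaming to pull tweets from Twitter and process them. When I run my application in Eclipse with the master set to local[*] it works, but when I submit it to Spark on the server, I get the following error: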

  

No route to host. Relevant discussions can be found on the Internet at: http://www.google.co.jp/search?q=944a924a or http://www.google.co.jp/search?q=24fd66eb

My code:

SparkConf sparkConf = new SparkConf().setAppName("AppStreaming");
// set the master to local[*] to run Spark locally
if(local){
    sparkConf.setMaster("local[*]"); 
}   
// set proxy
System.setProperty("http.proxyHost", PROXY_HOST);
System.setProperty("http.proxyPort", PROXY_PORT);
System.setProperty("https.proxyHost", PROXY_HOST);
System.setProperty("https.proxyPort", PROXY_PORT);

JavaStreamingContext streamingContext = new JavaStreamingContext(sparkConf, Seconds.apply(5)); // pull new tweets every 5 seconds

// connect to Twitter
ConfigurationBuilder cb=new ConfigurationBuilder();
cb.setDebugEnabled(true)
    .setOAuthConsumerKey(CONSUMER_KEY)
    .setOAuthConsumerSecret(CONSUMER_SECRET)
    .setOAuthAccessToken(ACCESS_TOKEN)
    .setOAuthAccessTokenSecret(ACCESS_TOKEN_SECRET)
    .setHttpConnectionTimeout(100000)
    .setUseSSL(true);
// proxy    
cb.setHttpProxyHost(PROXY_HOST)
        .setHttpProxyPort(PROXY_PORT);

TwitterFactory factory = new TwitterFactory(cb.build());
Twitter twitter = factory.getInstance();
Authorization authorization = twitter.getAuthorization();

final String[] filters = new String[]{"credit card"};  
JavaReceiverInputDStream<Status> stream = TwitterUtils.createStream(streamingContext, authorization, filters );

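JavaDStream<Status> results = stream.filter(new Function<Status, Boolean>(){

    private static final long serialVersionUID = 1L;
    @Override
    public Boolean call(Status tweet) {
        // myFunction() is a placeholder for the actual filtering logic;
        // call() has to return a Boolean, so its result is returned here
        return myFunction();
    }
});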

results.print();
// start streaming...
streamingContext.start();
streamingContext.awaitTermination();

Console output:

Time: 1441269605000 ms
-------------------------------------------

15/09/03 10:40:05 INFO JobScheduler: Finished job streaming job 1441269605000 ms.0 from job set of time 1441269605000 ms
15/09/03 10:40:05 INFO JobScheduler: Total delay: 0,009 s for time 1441269605000 ms (execution: 0,001 s)
15/09/03 10:40:05 INFO JobScheduler: Added jobs for time 1441269605000 ms
15/09/03 10:40:05 INFO MapPartitionsRDD: Removing RDD 173 from persistence list
15/09/03 10:40:05 INFO BlockManager: Removing RDD 173
15/09/03 10:40:05 INFO BlockRDD: Removing RDD 172 from persistence list
15/09/03 10:40:05 INFO BlockManager: Removing RDD 172
15/09/03 10:40:05 INFO TwitterInputDStream: Removing blocks of RDD BlockRDD[172] at createStream at MoriartyStreaming.java:216 of time 1441269605000 ms
15/09/03 10:40:05 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer(1441269595000 ms)
15/09/03 10:40:05 INFO InputInfoTracker: remove old batch metadata: 1441269595000 ms
15/09/03 10:40:05 INFO ReceiverTracker: Registered receiver for stream 0 from 7.121.100.47:48986
15/09/03 10:40:05 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error receiving tweets - No existe ninguna ruta hasta el «host»
Relevant discussions can be found on the Internet at:
        http://www.google.co.jp/search?q=944a924a or
        http://www.google.co.jp/search?q=24fd66eb
TwitterException{exceptionCode=[944a924a-24fd66eb 944a924a-24fd66c1], statusCode=-1, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.3}
        at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:192)
        at twitter4j.internal.http.HttpClientWrapper.request(HttpClientWrapper.java:61)
        at twitter4j.internal.http.HttpClientWrapper.post(HttpClientWrapper.java:98)
        at twitter4j.TwitterStreamImpl.getFilterStream(TwitterStreamImpl.java:304)
        at twitter4j.TwitterStreamImpl$7.getStream(TwitterStreamImpl.java:292)
        at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:462)
Caused by: java.net.NoRouteToHostException: No existe ninguna ruta hasta el «host»
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:625)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
        at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:933)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1092)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
        at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:150)
        ... 5 more

pom.xml:

<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-twitter_2.10</artifactId> 
        <version>1.1.0</version>    
</dependency>

My application runs behind a proxy, both locally and on the server.
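One way to narrow this down is to check which proxy properties a given JVM actually sees. The following is only a minimal sketch: the class name `ProxyPropertyDump` and the `<unset>` marker are made up for illustration, and the list of keys assumes the standard Java networking properties plus Twitter4J's `twitter4j.http.proxyHost` and `twitter4j.http.proxyPort`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProxyPropertyDump {

    // Keys checked: standard Java networking properties plus Twitter4J's own.
    static final String[] KEYS = {
        "http.proxyHost", "http.proxyPort",
        "https.proxyHost", "https.proxyPort",
        "twitter4j.http.proxyHost", "twitter4j.http.proxyPort"
    };

    /** Returns each proxy-related property with its value, or "<unset>" if it is missing. */
    public static Map<String, String> snapshot() {
        Map<String, String> seen = new LinkedHashMap<>();
        for (String key : KEYS) {
            seen.put(key, System.getProperty(key, "<unset>"));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Print one "key = value" line per property for the current JVM.
        for (Map.Entry<String, String> e : snapshot().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running `main` only reports on the JVM it runs in. To see what the cluster side sees, `snapshot()` would have to be called from code that executes there, for example from inside the filter function.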

Any suggestions as to why I am getting this error?

1 Answer:

Answer 0: (score: 0)

I struggled with the same error until I added this to my .bashrc file:

export _JAVA_OPTIONS='-Dhttp.proxyHost=www.xxxx.net -Dhttp.proxyPort=8080 -Dtwitter4j.http.proxyHost=www.xxxx.net -Dtwitter4j.http.proxyPort=8080 -Dtwitter4j.https.proxyHost=www.xxxx.net -Dtwitter4j.https.proxyPort=8080'
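A possible explanation for why this helps: with local[*] the receiver runs in the same JVM as the driver, so the System.setProperty calls in the question apply to it. On a cluster the receiver runs in an executor JVM that never executes those calls. `_JAVA_OPTIONS` is read by the HotSpot JVM at startup, so flags set there reach any JVM launched from that environment.

If editing .bashrc is not an option, the same flags could probably be passed per job instead. The sketch below only assembles the flag string, using the same property names as the export line above. The class name `ProxyJavaOptions` and the host `proxy.example.com` are placeholders. It assumes Spark's `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` settings are the right way to forward JVM options in your deployment.

```java
public class ProxyJavaOptions {

    // Property prefixes taken from the export line in the answer above.
    private static final String[] PREFIXES = {"http", "twitter4j.http", "twitter4j.https"};

    /** Builds a space-separated list of -D proxy flags for one proxy host and port. */
    public static String build(String host, int port) {
        StringBuilder flags = new StringBuilder();
        for (String prefix : PREFIXES) {
            if (flags.length() > 0) {
                flags.append(' ');
            }
            flags.append("-D").append(prefix).append(".proxyHost=").append(host);
            flags.append(" -D").append(prefix).append(".proxyPort=").append(port);
        }
        return flags.toString();
    }

    public static void main(String[] args) {
        // proxy.example.com:8080 is a placeholder; substitute your own proxy.
        String flags = build("proxy.example.com", 8080);
        // Print the flags in a form that can be pasted into a spark-submit command line.
        System.out.println("--conf \"spark.driver.extraJavaOptions=" + flags + "\"");
        System.out.println("--conf \"spark.executor.extraJavaOptions=" + flags + "\"");
    }
}
```

Whether the driver needs the flags as well depends on where the Twitter connection is opened. In the setup described in the question, the executor side is the one that matters.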