Spark and Twitter4J not receiving tweets

Time: 2018-11-23 09:53:41

Tags: java apache-spark spark-streaming twitter4j

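I am trying to access tweets from the timeline using Apache Spark and Twitter4J. Authentication appears to succeed, but I don't seem to receive any tweets.

Here is my code: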

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.twitter.TwitterUtils;

import twitter4j.Status;
import twitter4j.TwitterFactory;
import twitter4j.auth.Authorization;
import twitter4j.conf.ConfigurationBuilder;

public class TwitterAssign {

    //Setup the spark configuration
    private static SparkConf conf = new SparkConf().setAppName("1").setMaster("local");

//  Switched to Spark 2.2 from 2.3 due to the error detailed here: https://stackoverflow.com/questions/49180931/abstractmethoderror-creating-kafka-stream
//      
    public static void main(String args[]) {

        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
          .setOAuthConsumerKey("****")
          .setOAuthConsumerSecret("****")
          .setOAuthAccessToken("****")
          .setOAuthAccessTokenSecret("****");
        TwitterFactory tf = new TwitterFactory(cb.build());

        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(10000));

        jssc.checkpoint(".");

        String[] filters={"some", "keywords", "as", "filter"};
//      JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc,tf.getInstance().getAuthorization() , filters);
        Authorization myauth = tf.getInstance().getAuthorization();

        JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc, myauth);


        twitterStream.checkpoint(new Duration(10000));

        JavaDStream<String> statuses = twitterStream.map(
                  new Function<Status, String>() {
                    public String call(Status status) { 
                        return "mytweets"+status.getText(); 
                    }
                  }
                );

        statuses.foreachRDD(x -> System.out.println(x.collect()));

        jssc.start();
        try {
            jssc.awaitTermination();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }
}

Here is the output:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/22 23:30:24 INFO SparkContext: Running Spark version 2.2.2
18/11/22 23:30:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/22 23:30:24 INFO SparkContext: Submitted application: 1
18/11/22 23:30:24 INFO SecurityManager: Changing view acls to: user
18/11/22 23:30:24 INFO SecurityManager: Changing modify acls to: user
18/11/22 23:30:24 INFO SecurityManager: Changing view acls groups to: 
18/11/22 23:30:24 INFO SecurityManager: Changing modify acls groups to: 
18/11/22 23:30:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(user); groups with view permissions: Set(); users  with modify permissions: Set(user); groups with modify permissions: Set()
18/11/22 23:30:25 INFO Utils: Successfully started service 'sparkDriver' on port 61571.
18/11/22 23:30:25 INFO SparkEnv: Registering MapOutputTracker
18/11/22 23:30:25 INFO SparkEnv: Registering BlockManagerMaster
18/11/22 23:30:25 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/11/22 23:30:25 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/11/22 23:30:25 INFO DiskBlockManager: Created local directory at /private/var/folders/tp/vvqvqkhx0rg3z08vrbjzky6h0000gn/T/blockmgr-dcdebcc2-b527-448d-b88d-09c39078cf2e
18/11/22 23:30:25 INFO MemoryStore: MemoryStore started with capacity 912.3 MB
18/11/22 23:30:25 INFO SparkEnv: Registering OutputCommitCoordinator
18/11/22 23:30:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/11/22 23:30:25 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.24:4040
18/11/22 23:30:25 INFO Executor: Starting executor ID driver on host localhost
18/11/22 23:30:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61572.
18/11/22 23:30:25 INFO NettyBlockTransferService: Server created on 192.168.0.24:61572
18/11/22 23:30:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/11/22 23:30:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.24, 61572, None)
18/11/22 23:30:25 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.24:61572 with 912.3 MB RAM, BlockManagerId(driver, 192.168.0.24, 61572, None)
18/11/22 23:30:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.24, 61572, None)
18/11/22 23:30:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.24, 61572, None)
18/11/22 23:30:26 WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.
18/11/22 23:30:26 INFO ReceiverTracker: Starting 1 receivers
18/11/22 23:30:26 INFO ReceiverTracker: ReceiverTracker started
18/11/22 23:30:26 INFO TwitterInputDStream: Slide time = 10000 ms
18/11/22 23:30:26 INFO TwitterInputDStream: Storage level = Memory Serialized 1x Replicated
18/11/22 23:30:26 INFO TwitterInputDStream: Checkpoint interval = 10000 ms
18/11/22 23:30:26 INFO TwitterInputDStream: Remember interval = 20000 ms
18/11/22 23:30:26 INFO TwitterInputDStream: Initialized and validated org.apache.spark.streaming.twitter.TwitterInputDStream@11b377c5
18/11/22 23:30:26 INFO MappedDStream: Slide time = 10000 ms
18/11/22 23:30:26 INFO MappedDStream: Storage level = Serialized 1x Replicated
18/11/22 23:30:26 INFO MappedDStream: Checkpoint interval = null
18/11/22 23:30:26 INFO MappedDStream: Remember interval = 10000 ms
18/11/22 23:30:26 INFO MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@144ab54
18/11/22 23:30:26 INFO ForEachDStream: Slide time = 10000 ms
18/11/22 23:30:26 INFO ForEachDStream: Storage level = Serialized 1x Replicated
18/11/22 23:30:26 INFO ForEachDStream: Checkpoint interval = null
18/11/22 23:30:26 INFO ForEachDStream: Remember interval = 10000 ms
18/11/22 23:30:26 INFO ForEachDStream: Initialized and validated 

Have I set up call() correctly so that it accesses the tweets?

   JavaDStream<String> statuses = twitterStream.map(
              new Function<Status, String>() {
                public String call(Status status) { 
                    return "mytweets"+status.getText(); 
                }
              }
            );
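
One thing the output itself points at: the WARN line from StreamingContext says that, in local mode with receivers, spark.master should be local[n] with n > 1, while the code uses setMaster("local") and the log shows one receiver being started. With a single thread the receiver may occupy it, leaving nothing to run the map and foreachRDD jobs. Below is a minimal sketch of that rule in plain Java. LocalMasterCheck and its methods are hypothetical helpers written for illustration, not part of Spark, and they only handle the simple forms "local", "local[n]" and "local[*]".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocalMasterCheck {

    // Matches only the simple forms "local[4]" and "local[*]" (assumption: other variants are ignored).
    private static final Pattern LOCAL_N = Pattern.compile("local\\[(\\d+|\\*)\\]");

    /** Threads implied by a local master URL, or -1 if the URL is not one of the simple local forms. */
    public static int localThreads(String master) {
        if (master.equals("local")) {
            return 1; // plain "local" means a single worker thread
        }
        Matcher m = LOCAL_N.matcher(master);
        if (!m.matches()) {
            return -1; // not a local master this sketch understands
        }
        String n = m.group(1);
        return n.equals("*")
                ? Runtime.getRuntime().availableProcessors()
                : Integer.parseInt(n);
    }

    /** True when at least one thread remains for processing after each receiver takes one. */
    public static boolean leavesThreadForProcessing(String master, int receivers) {
        int threads = localThreads(master);
        // Unrecognised masters are not judged here, so they pass.
        return threads < 0 || threads > receivers;
    }

    public static void main(String[] args) {
        System.out.println(leavesThreadForProcessing("local", 1));    // prints false
        System.out.println(leavesThreadForProcessing("local[2]", 1)); // prints true
    }
}
```

If this is indeed the cause, the change to try in the code above would be setMaster("local[2]") (or a larger n) in place of setMaster("local").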

0 answers:

No answers