I am trying to access tweets from the timeline using Apache Spark and Twitter4J. Authentication appears to succeed, but I am not receiving any tweets.
Here is my code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.twitter.TwitterUtils;
import twitter4j.Status;
import twitter4j.TwitterFactory;
import twitter4j.auth.Authorization;
import twitter4j.conf.ConfigurationBuilder;
public class TwitterAssign {
    // Set up the Spark configuration
    private static SparkConf conf = new SparkConf().setAppName("1").setMaster("local");
    // Switched to Spark 2.2 from 2.3 due to the error detailed here: https://stackoverflow.com/questions/49180931/abstractmethoderror-creating-kafka-stream
    //
    public static void main(String args[]) {
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
            .setOAuthConsumerKey("****")
            .setOAuthConsumerSecret("****")
            .setOAuthAccessToken("****")
            .setOAuthAccessTokenSecret("****");
        TwitterFactory tf = new TwitterFactory(cb.build());
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(10000));
        jssc.checkpoint(".");
        String[] filters = {"some", "keywords", "as", "filter"};
        // JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc, tf.getInstance().getAuthorization(), filters);
        Authorization myauth = tf.getInstance().getAuthorization();
        JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc, myauth);
        twitterStream.checkpoint(new Duration(10000));
        JavaDStream<String> statuses = twitterStream.map(
            new Function<Status, String>() {
                public String call(Status status) {
                    return "mytweets" + status.getText();
                }
            }
        );
        statuses.foreachRDD(x -> System.out.println(x.collect()));
        jssc.start();
        try {
            jssc.awaitTermination();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
Here is the output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/22 23:30:24 INFO SparkContext: Running Spark version 2.2.2
18/11/22 23:30:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/22 23:30:24 INFO SparkContext: Submitted application: 1
18/11/22 23:30:24 INFO SecurityManager: Changing view acls to: user
18/11/22 23:30:24 INFO SecurityManager: Changing modify acls to: user
18/11/22 23:30:24 INFO SecurityManager: Changing view acls groups to:
18/11/22 23:30:24 INFO SecurityManager: Changing modify acls groups to:
18/11/22 23:30:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); groups with view permissions: Set(); users with modify permissions: Set(user); groups with modify permissions: Set()
18/11/22 23:30:25 INFO Utils: Successfully started service 'sparkDriver' on port 61571.
18/11/22 23:30:25 INFO SparkEnv: Registering MapOutputTracker
18/11/22 23:30:25 INFO SparkEnv: Registering BlockManagerMaster
18/11/22 23:30:25 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/11/22 23:30:25 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/11/22 23:30:25 INFO DiskBlockManager: Created local directory at /private/var/folders/tp/vvqvqkhx0rg3z08vrbjzky6h0000gn/T/blockmgr-dcdebcc2-b527-448d-b88d-09c39078cf2e
18/11/22 23:30:25 INFO MemoryStore: MemoryStore started with capacity 912.3 MB
18/11/22 23:30:25 INFO SparkEnv: Registering OutputCommitCoordinator
18/11/22 23:30:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/11/22 23:30:25 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.24:4040
18/11/22 23:30:25 INFO Executor: Starting executor ID driver on host localhost
18/11/22 23:30:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61572.
18/11/22 23:30:25 INFO NettyBlockTransferService: Server created on 192.168.0.24:61572
18/11/22 23:30:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/11/22 23:30:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.24, 61572, None)
18/11/22 23:30:25 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.24:61572 with 912.3 MB RAM, BlockManagerId(driver, 192.168.0.24, 61572, None)
18/11/22 23:30:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.24, 61572, None)
18/11/22 23:30:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.24, 61572, None)
18/11/22 23:30:26 WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.
18/11/22 23:30:26 INFO ReceiverTracker: Starting 1 receivers
18/11/22 23:30:26 INFO ReceiverTracker: ReceiverTracker started
18/11/22 23:30:26 INFO TwitterInputDStream: Slide time = 10000 ms
18/11/22 23:30:26 INFO TwitterInputDStream: Storage level = Memory Serialized 1x Replicated
18/11/22 23:30:26 INFO TwitterInputDStream: Checkpoint interval = 10000 ms
18/11/22 23:30:26 INFO TwitterInputDStream: Remember interval = 20000 ms
18/11/22 23:30:26 INFO TwitterInputDStream: Initialized and validated org.apache.spark.streaming.twitter.TwitterInputDStream@11b377c5
18/11/22 23:30:26 INFO MappedDStream: Slide time = 10000 ms
18/11/22 23:30:26 INFO MappedDStream: Storage level = Serialized 1x Replicated
18/11/22 23:30:26 INFO MappedDStream: Checkpoint interval = null
18/11/22 23:30:26 INFO MappedDStream: Remember interval = 10000 ms
18/11/22 23:30:26 INFO MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@144ab54
18/11/22 23:30:26 INFO ForEachDStream: Slide time = 10000 ms
18/11/22 23:30:26 INFO ForEachDStream: Storage level = Serialized 1x Replicated
18/11/22 23:30:26 INFO ForEachDStream: Checkpoint interval = null
18/11/22 23:30:26 INFO ForEachDStream: Remember interval = 10000 ms
18/11/22 23:30:26 INFO ForEachDStream: Initialized and validated
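The WARN line from StreamingContext in the output above points at a possible cause: with setMaster("local") there is only one thread, and a receiver keeps one thread busy, so nothing may be left to process the received batches. Below is a minimal sketch of that rule in plain Java. LocalMasterCheck and its methods are hypothetical names used only for illustration (they are not Spark API), and the sketch only recognises the "local", "local[n]" and "local[*]" forms of the master string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: it only interprets the master
// string the way the WARN message in the log describes it.
final class LocalMasterCheck {
    private static final Pattern LOCAL_N = Pattern.compile("local\\[(\\d+|\\*)\\]");

    // Returns the number of threads a local master string asks for,
    // or -1 if this sketch does not recognise the string.
    static int localThreads(String master) {
        if (master.equals("local")) {
            return 1; // plain "local" means a single thread
        }
        Matcher m = LOCAL_N.matcher(master);
        if (!m.matches()) {
            return -1;
        }
        String n = m.group(1);
        return n.equals("*")
            ? Runtime.getRuntime().availableProcessors()
            : Integer.parseInt(n);
    }

    // One thread is taken by the receiver, so at least one more is needed
    // to process batches. Unrecognised masters are not judged here.
    static boolean hasRoomForReceiver(String master) {
        int threads = localThreads(master);
        return threads == -1 || threads > 1;
    }

    public static void main(String[] args) {
        System.out.println(hasRoomForReceiver("local"));    // prints false
        System.out.println(hasRoomForReceiver("local[2]")); // prints true
    }
}
```

If this is what is happening here, changing the configuration to setMaster("local[2]") would be the first thing to try.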
Have I set up call() correctly so that I can access the tweets?
JavaDStream<String> statuses = twitterStream.map(
    new Function<Status, String>() {
        public String call(Status status) {
            return "mytweets" + status.getText();
        }
    }
);
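One way to check call() separately from the stream is to run the same mapping over hand-made objects. The sketch below does that in plain Java, without Spark or Twitter4J. TextStatus is a hypothetical stand-in for twitter4j.Status that keeps only getText(), and CallCheck, TO_TEXT and mapAll are illustrative names. If it prints the prefixed texts, the mapping logic itself is probably not the reason nothing shows up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CallCheck {
    // Hypothetical stand-in for twitter4j.Status, reduced to the one method used here.
    interface TextStatus {
        String getText();
    }

    // Same body as call() in the question, written as a java.util.function.Function.
    static final Function<TextStatus, String> TO_TEXT =
        status -> "mytweets" + status.getText();

    // Applies the mapping to every element, as map() would do per tweet.
    static List<String> mapAll(List<TextStatus> statuses) {
        return statuses.stream().map(TO_TEXT).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TextStatus a = () -> "hello";
        TextStatus b = () -> "world";
        System.out.println(mapAll(Arrays.asList(a, b))); // prints [mytweetshello, mytweetsworld]
    }
}
```

Note that the prefix is concatenated with no separator, so each line of output starts with "mytweets" followed directly by the tweet text.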