Searching for a specific keyword using the Twitter API and Spark

Time: 2018-11-29 14:50:49

Tags: scala apache-spark twitter twitter-streaming-api

I am trying the code below, after replacing "#" with "#Apple".

val ssc = new StreamingContext("local[*]", "PopularHashtags", Seconds(1))
val tweets = TwitterUtils.createStream(ssc, None)
val statuses = tweets.map(status => status.getText())
val tweetwords = statuses.flatMap(tweetText => tweetText.split(" "))
val hashtags = tweetwords.filter(word => word.startsWith("#"))
val hashtagKeyValues = hashtags.map(hashtag => (hashtag, 1))
val hashtagCounts = hashtagKeyValues.reduceByKeyAndWindow( (x,y) => x + y, (x,y) => x - y, Seconds(1000), Seconds(1))
val sortedResults = hashtagCounts.transform(rdd => rdd.sortBy(x => x._2, false))
sortedResults.print

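But I am not getting any results.

Does this stream limit how many tweets it fetches, or which region they come from? I also tried searching for #OPPO, because it was trending on my Twitter account, but I still got no results.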

1 Answer:

Answer 0: (score: 0)

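import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

val ssc = new StreamingContext("local[*]", "PopularHashtags", Seconds(1))
// Pass the keywords you want to track as a sequence. With no filters, createStream
// reads only Twitter's small random sample stream, so a given hashtag may rarely appear.
val seq: Seq[String] = Seq("#Rajasthan", "#Apple")
val tweets = TwitterUtils.createStream(ssc, None, seq)
val statuses = tweets.map(status => status.getText())
val tweetwords = statuses.flatMap(tweetText => tweetText.split(" "))
val hashtags = tweetwords.filter(word => word.contains("#"))
val hashtagKeyValues = hashtags.map(hashtag => (hashtag, 1))
// The inverse-reduce form of reduceByKeyAndWindow requires a checkpoint directory (ssc.checkpoint).
val hashtagCounts = hashtagKeyValues.reduceByKeyAndWindow((x, y) => x + y, (x, y) => x - y, Seconds(1000), Seconds(1))
val sortedResults = hashtagCounts.transform(rdd => rdd.sortBy(x => x._2, false))
sortedResults.print
// As with any Spark Streaming job, nothing is printed until ssc.start() has been called.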
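
The per-tweet logic above (split the text, keep the hashtags, count them) can be tried without Spark or a Twitter connection. The following is only a rough sketch under stated assumptions: the object `HashtagSketch`, its two helper functions, and the sample texts are all hypothetical and come from neither the question nor any library. It also lower-cases tags and trims trailing punctuation, which the code above does not do. That may matter when looking for a single tag such as "#Apple", because tweets can spell it as "#apple" or "#Apple!".

```scala
// Pure-Scala sketch (no Spark or Twitter dependency) of the per-tweet logic.
// All names and sample texts here are hypothetical, for illustration only.
object HashtagSketch {
  // Split tweet text on whitespace and keep tokens that begin with '#',
  // trimming trailing punctuation such as "#Apple!" or "#apple,".
  def extractHashtags(text: String): Seq[String] =
    text.split("\\s+").toSeq
      .filter(_.startsWith("#"))
      .map(_.replaceAll("[^\\p{L}\\p{N}_#]+$", ""))
      .filter(_.length > 1) // drop a bare "#"

  // Count hashtags across tweets, ignoring case, most frequent first
  // (ties are broken alphabetically so the order is deterministic).
  def countHashtags(tweets: Seq[String]): Seq[(String, Int)] =
    tweets.flatMap(extractHashtags)
      .groupBy(_.toLowerCase)
      .map { case (tag, occurrences) => (tag, occurrences.size) }
      .toSeq
      .sortBy { case (tag, n) => (-n, tag) }

  def main(args: Array[String]): Unit = {
    // Made-up sample texts, not real tweets.
    val sample = Seq(
      "Loving the new #Apple watch!",
      "#apple, #OPPO and more",
      "no hashtags here",
      "#OPPO launch today #Apple!"
    )
    countHashtags(sample).foreach(println)
  }
}
```

With the sample texts above, the three spellings of the Apple tag are counted together, ahead of the two "#OPPO" occurrences. In the streaming job the same normalization could be applied inside the `map` and `filter` steps, if case-insensitive counts are what you want.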