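DStream Twitter example: flatMap with text and twitter_id
I'm new to Scala and Spark Streaming. I'm trying to extend the example Twitter streaming code to split each tweet into words while keeping those words linked to the twitter id.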
setupLogging()
val tweets = TwitterUtils.createStream(ssc, None)
val statuses = tweets.map(status => status.getText())
val tweetwords = statuses.flatMap(tweetText => tweetText.split(" "))
tweetwords.print()
//get running list of words from tweets.
This
is
my
tweet
"#mytweet"
//instead want the same list with a twitter_id attached
val statuses = tweets.map { status => (status.getUser().getId(), status.getText()) }
val tweetwords = statuses.flatMap( ????? This is where I am lost )
//this is what I want
tweetwords.print()
1523523, This
1523523, is
1523523, my
1523523, tweet
1523523, #mytweet
I'm open to other approaches, including DataFrames/Datasets. Thanks!
Answer 0 (score: 0)
In case anyone is looking for this...
val tweetwords = statuses.flatMap { case (t1, t2) => t2.split(" ").map(word => (t1, word)) }
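To try the same pattern without a running stream, here is a minimal sketch on a plain Scala Seq instead of a DStream. The object name TweetWordsSketch, the helper splitWithId, and the sample id and text are made up for illustration; splitting on runs of whitespace rather than a single space is also my assumption, not part of the original answer.

```scala
// Sketch only: a plain Seq stands in for the DStream of (userId, text) pairs.
// TweetWordsSketch and splitWithId are hypothetical names, not Spark or Twitter4J API.
object TweetWordsSketch {

  // Pair every word of a tweet with the id that came with that tweet.
  def splitWithId(statuses: Seq[(Long, String)]): Seq[(Long, String)] =
    statuses.flatMap { case (id, text) =>
      text
        .split("\\s+")              // split on runs of whitespace (assumption)
        .filter(_.nonEmpty)         // drop the empty token a leading space would produce
        .map(word => (id, word))    // keep the id next to each word
        .toSeq
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample data mirroring the example in the question.
    val statuses = Seq((1523523L, "This is my tweet #mytweet"))
    splitWithId(statuses).foreach { case (id, word) => println(s"$id, $word") }
  }
}
```

The body of the flatMap is the same function you would pass to statuses.flatMap on the DStream, so it can be checked locally first and then moved into the streaming job.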