flatMap over two columns

Time: 2019-04-26 21:06:35

Tags: apache-spark-sql spark-streaming

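DStream Twitter example: flatMap the text while keeping the twitter_id

I'm new to Scala and Spark Streaming. I'm trying to extend the example Twitter streaming code to split each tweet into words while keeping those words linked to the Twitter ID.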


setupLogging()

val tweets = TwitterUtils.createStream(ssc, None)
val statuses = tweets.map(status => status.getText())
val tweetwords = statuses.flatMap(tweetText => tweetText.split(" "))

tweetwords.print()
// prints a running list of words from the tweets:
This
is 
my 
tweet
"#mytweet"

// instead, I want the same list with the twitter_id attached
val statuses = tweets.map { status => (status.getUser().getId(), status.getText()) }
val tweetwords = statuses.flatMap( ????? This is where I am lost )

// this is what I want
tweetwords.print()

1523523, This
1523523, is
1523523, my
1523523, tweet
1523523, #mytweet

I'm open to other approaches, including DataFrames/Datasets. Thanks!

1 Answer:

Answer 0 (score: 0)

In case anyone is looking for this...

val tweetwords = statuses.flatMap { case (t1, t2) => t2.split(" ").map((t1, _)) }
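
To try the pattern without a live Twitter stream, here is a minimal sketch that applies the same flatMap to an ordinary Scala Seq of (id, text) pairs. The object name TweetWordsSketch, the helper toIdWordPairs, and the sample data are all hypothetical and not part of the original code. The sketch also splits on any run of whitespace and drops empty tokens, which goes slightly beyond the single-space split used above.

```scala
object TweetWordsSketch {
  // Hypothetical sample data standing in for the (userId, text) pairs
  // that the DStream would carry.
  val statuses: Seq[(Long, String)] = Seq(
    (1523523L, "This is my tweet #mytweet"),
    (42L, "hello  world")
  )

  // For each (id, text) pair, split the text into words and pair every
  // word with the id it came from. flatMap flattens the per-tweet
  // collections into one sequence of (id, word) pairs.
  def toIdWordPairs(pairs: Seq[(Long, String)]): Seq[(Long, String)] =
    pairs.flatMap { case (id, text) =>
      text.split("\\s+").filter(_.nonEmpty).map(word => (id, word)).toSeq
    }

  def main(args: Array[String]): Unit =
    toIdWordPairs(statuses).foreach { case (id, word) => println(s"$id, $word") }
}
```

The body of toIdWordPairs should carry over to the DStream more or less unchanged, since DStream.flatMap takes the same kind of function. If I recall the API correctly, pair DStreams also provide flatMapValues, so statuses.flatMapValues(_.split(" ")) may be a shorter way to get the same (id, word) pairs; treat that as an untested assumption.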