I want to get a count for each word using a window function.
If I use this code:
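import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object WindowWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Read lines of text from a local socket.
    val text = env.socketTextStream("localhost", 9999)
    // Split each line into lowercase words and pair every word with a count of 1.
    val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
      .map { (_, 1) }
      .keyBy(0)                      // group by the word (tuple field 0)
      .timeWindow(Time.seconds(5))   // 5-second tumbling window
      .sum(1)                        // sum the counts (tuple field 1) per window
    counts.print()
    env.execute("Window Stream WordCount")
  }
}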
After 5 seconds (the window time) I get this output:
Input:
first input : hello
second input : hello
third input : word
fifth input : hello
sixth input : word
Output:
first output : hello : 3 | word : 2
But I want an updated count to be output for every word as it arrives, like this:
Input:
first input : hello
second input : hello
third input : word
fifth input : hello
sixth input : word
Output:
first output : hello : 1
second output : hello : 2
third output : word : 1
fifth output : hello : 3
sixth output : word : 2
How can I do that?
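To pin down the two behaviours being compared, here is a minimal plain-Scala sketch with no Flink involved. The object name `RunningCountSketch` and its helper functions are hypothetical and exist only for illustration; the input is the word sequence from the example above.

```scala
object RunningCountSketch {
  // Words in arrival order, taken from the example input in the question.
  val inputs: Seq[String] = Seq("hello", "hello", "word", "hello", "word")

  // One total per word for the whole batch: the shape of the per-window output.
  def windowTotals(words: Seq[String]): Map[String, Int] =
    words.groupBy(identity).map { case (w, ws) => (w, ws.size) }

  // One (word, countSoFar) pair per input element: the shape of the desired output.
  def runningCounts(words: Seq[String]): Seq[(String, Int)] =
    words.foldLeft((Map.empty[String, Int], Vector.empty[(String, Int)])) {
      case ((seen, out), w) =>
        val n = seen.getOrElse(w, 0) + 1
        (seen.updated(w, n), out :+ (w -> n))
    }._2

  def main(args: Array[String]): Unit = {
    println(windowTotals(inputs))
    runningCounts(inputs).foreach { case (w, n) => println(s"$w : $n") }
  }
}
```

The first function collapses all elements into one result per key, while the second emits a result for every element, which is the behaviour asked for.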
Answer 0 (score: 0)
Isn't the example program for the Kafka Streaming API what you are looking for? discourse post
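import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object WindowWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Read lines of text from a local socket.
    val text = env.socketTextStream("localhost", 9999)
    // Split each line into lowercase words and pair every word with a count of 1.
    val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
      .map { (_, 1) }
      .keyBy(0)                      // group by the word (tuple field 0)
      .timeWindow(Time.seconds(5))   // 5-second tumbling window
      .sum(1)                        // sum the counts (tuple field 1) per window
    counts.print()
    env.execute("Window Stream WordCount")
  }
}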
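For comparison, one possible way to get an output per input element is to drop the window altogether. This is only a sketch under an assumption: that a running total over the whole stream is acceptable, rather than a count that resets every 5 seconds. In the Flink Scala DataStream API, `sum` applied directly to a keyed stream is a rolling aggregation, so it emits an updated record for each element. The object name `RunningWordCount` and the job name are hypothetical.

```scala
import org.apache.flink.streaming.api.scala._

object RunningWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Same socket source as in the question.
    val text = env.socketTextStream("localhost", 9999)

    val counts = text
      .flatMap { _.toLowerCase.split("\\W+").filter(_.nonEmpty) }
      .map { (_, 1) }
      .keyBy(0)
      // Assumption: no timeWindow here. Without a window, sum is a rolling
      // aggregation and emits an updated (word, count) for every element.
      .sum(1)

    counts.print()
    env.execute("Running Stream WordCount")
  }
}
```

With the input from the question this should print (hello,1), (hello,2), (word,1), (hello,3), (word,2) in arrival order per key. Note that the counts never reset, which differs from the 5-second window in the original code.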