How to get a running count for each word in a windowed stream on Flink

Time: 2017-09-11 13:28:16

Tags: scala apache-flink flink-streaming

I want to get a count for each word using a window function.

If I use this code:

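import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object WindowWordCount {
  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Read lines of text from a local socket (e.g. started with `nc -lk 9999`)
    val text = env.socketTextStream("localhost", 9999)

    // Split lines into lowercase words, then sum the counts per word
    // over 5-second tumbling windows
    val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
      .map { (_, 1) }
      .keyBy(0)
      .timeWindow(Time.seconds(5))
      .sum(1)

    counts.print()

    env.execute("Window Stream WordCount")
  }
}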

After 5 seconds (the window time) I get the output:

Input:

first input : hello
second input : hello
third input : word
fifth input : hello
sixth input : word

Output:

first output : hello : 3 | word : 2

But I want an output for each incoming word, with its count so far.

Like this. Input:

first input : hello
second input : hello
third input : word
fifth input : hello
sixth input : word

Output:

first output : hello : 1
second output : hello : 2
third output : word : 1
fifth output : hello : 3
sixth output : word : 2

How can I do that?
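
To pin down the behavior being asked for: every input word should immediately produce that word's running total, rather than one total per word when the window closes. Here is a minimal plain-Scala sketch of that semantics only. It does not use Flink, and `RunningCount` and `runningCounts` are hypothetical names introduced just for illustration.

```scala
object RunningCount {
  // For each input word, in order, return the word paired with the number
  // of times it has been seen so far (including the current occurrence).
  def runningCounts(words: Seq[String]): Seq[(String, Int)] = {
    val seen = scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)
    words.map { w =>
      seen(w) += 1   // update the per-word counter
      (w, seen(w))   // emit the updated count right away
    }
  }
}
```

Calling `RunningCount.runningCounts(Seq("hello", "hello", "word", "hello", "word"))` yields `(hello,1), (hello,2), (word,1), (hello,3), (word,2)`, which matches the desired output listed above.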

1 Answer:

Answer 0: (score: 0)

Isn't the example program for the Kafka Streaming API what you are looking for? (discourse post)

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object WindowWordCount {
  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val text = env.socketTextStream("localhost", 9999)

    val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
      .map { (_, 1) }
      .keyBy(0)
      .timeWindow(Time.seconds(5))
      .sum(1)

    counts.print()

    env.execute("Window Stream WordCount")
  }
}
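
Note that the program above emits one result per word only when each 5-second window closes. One possible way to get an updated count for every incoming word while keeping the window is to replace the window's default trigger with `CountTrigger.of(1)`, which fires the window on each element. This is a sketch and not a tested solution: it assumes the same 2017-era Flink Scala DataStream API as the question (`keyBy(0)` and `timeWindow` are deprecated in later releases), and the object name `IncrementalWindowWordCount` is hypothetical.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

// Hypothetical object name; assumes the Flink Scala streaming API is on the classpath.
object IncrementalWindowWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val text = env.socketTextStream("localhost", 9999)

    val counts = text
      .flatMap { _.toLowerCase.split("\\W+").filter(_.nonEmpty) }
      .map { (_, 1) }
      .keyBy(0)
      .timeWindow(Time.seconds(5))
      // Fire on every element instead of only at the end of the window,
      // so each word produces the window's current sum for its key.
      .trigger(CountTrigger.of[TimeWindow](1))
      .sum(1)

    counts.print()

    env.execute("Incremental Window WordCount")
  }
}
```

With this trigger the counts still start over in each new 5-second window. If the counts should never reset, an alternative is to drop `timeWindow` and the trigger and call `sum(1)` directly on the keyed stream, which emits a rolling total per word.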