Finding the top K elements in a WindowedStream - Flink

Date: 2019-05-02 11:46:49

Tags: scala stream streaming apache-flink flink-streaming

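I'm new to the world of streams and ran into some problems on my first attempt.

What I want to do is find the top K elements in the windows of the WindowedStream below. I tried to implement my own function, but I'm not sure how it actually works.

Nothing seems to be printed.

Do you have any hints?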

val parsedStream: DataStream[(String, Response)] = stream
      .mapWith(_.decodeOption[Response])
      .filter(_.isDefined)
      .map { record =>
        (
          s"${record.get.group.group_country}, ${record.get.group.group_city}",
          record.get
        )
      }

val topLocations = parsedStream
      .keyBy(_._1)
      .timeWindow(Time.days(7))
      .process(new SortByCountFunction)

SortByCountFunction

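import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

// Emits one MeetUpLocationWindow per key and window, with locations ranked by count.
class SortByCountFunction
    extends ProcessWindowFunction[(String, Response), MeetUpLocationWindow, String, TimeWindow] {

    override def process(key: String,
                         context: Context,
                         elements: Iterable[(String, Response)],
                         out: Collector[MeetUpLocationWindow]): Unit = {

      // The stream is keyed by location, so every element here shares `key`
      // and this grouping yields a single entry per invocation.
      val count: Map[String, Iterable[(String, Response)]] = elements.groupBy(_._1)

      val locAndCount: Seq[MeetUpLocation] = count.toList.map { case (location, meetUps) =>
        MeetUpLocation(location, meetUps.size, meetUps.map(_._2).toList)
      }

      // sortBy is ascending by default; reverse the ordering so that
      // take(20) keeps the locations with the highest counts.
      val output: List[MeetUpLocation] =
        locAndCount.sortBy(_.count)(Ordering[Int].reverse).take(20).toList

      // End timestamp of the window that fired.
      val windowEnd = context.window.getEnd

      out.collect(MeetUpLocationWindow(windowEnd, output))
    }
  }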

case class MeetUpLocationWindow(endTs: Long, locations: List[MeetUpLocation])

case class MeetUpLocation(location: String, count: Int, meetUps: List[Response])

1 answer:

Answer 0: (score: 0)

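When a Flink DataStream job fails to produce any output, the usual suspects are:

  • the job never calls execute() on the StreamExecutionEnvironment (e.g. env.execute())
  • the job has no sink (e.g. topLocations.print())
  • the job is meant to use event time, but watermarking is not set up correctly, or an idle source is preventing the watermarks from advancing
  • the job is writing to the taskmanager logs, and nobody noticed
  • the serializer for the output type produces no output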
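
The first three items can be checked against a small self-contained job. The following is a minimal sketch rather than the asker's job: it assumes the Scala DataStream API from around Flink 1.8 (where `timeWindow` and `setStreamTimeCharacteristic` are available), and the object name, the sample tuples, and the job name are made up for illustration. It uses event time with ascending timestamps, attaches a `print()` sink, and calls `env.execute()`.

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical job used only to illustrate the checklist above.
object OutputChecklistSketch {

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Event time: windows fire only when watermarks pass the window end.
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    // Made-up sample data: (location, event timestamp in ms, count of 1).
    val events: DataStream[(String, Long, Int)] = env.fromElements(
      ("de, Berlin", 1000L, 1),
      ("de, Berlin", 2000L, 1),
      ("us, New York", 3000L, 1)
    )

    val counts = events
      // Assigns timestamps and generates watermarks from the second field.
      .assignAscendingTimestamps(_._2)
      .keyBy(_._1)
      .timeWindow(Time.days(7))
      // Sums the third field per key and window.
      .sum(2)

    // Sink: without one, the job has nowhere to write its results.
    counts.print()

    // Nothing runs until execute() is called.
    env.execute("output-checklist-sketch")
  }
}
```

Because the source here is bounded, Flink emits a final watermark when the input ends, which lets the 7-day event-time window fire. With an unbounded source, the same window fires only once the watermark passes the end of the 7-day window, so a missing or stalled watermark looks exactly like "nothing is printed". When the job runs on a cluster rather than in the IDE, the output of `print()` goes to the taskmanagers' stdout files.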

Without more information, it's hard to guess which of these might be the problem in this case.
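
Independently of the missing output, the ranking step of a top-K function can be checked in plain Scala without Flink. This is a minimal sketch, assuming "top K" means the K entries with the largest counts; `TopKSketch`, `LocationCount`, and `topK` are hypothetical names, with `LocationCount` standing in for the question's `MeetUpLocation`. Since `sortBy` orders ascending by default, the ordering is reversed before `take(k)`.

```scala
object TopKSketch {
  // Hypothetical stand-in for MeetUpLocation, reduced to the fields needed for ranking.
  final case class LocationCount(location: String, count: Int)

  // Returns up to k entries with the largest counts, largest first.
  def topK(counts: Seq[LocationCount], k: Int): List[LocationCount] =
    counts.sortBy(_.count)(Ordering[Int].reverse).take(k).toList
}
```

Calling `TopKSketch.topK(counts, 20)` on the per-window counts would then keep the 20 most frequent locations rather than the 20 least frequent ones.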