Why do I get a lot of None.get exceptions when draining a streaming pipeline?

Date: 2019-01-17 23:50:52

Tags: spotify-scio

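I've run into a problem with a streaming Scio pipeline running on Dataflow that deduplicates messages and does some per-key counting. When I try to drain the pipeline, I get a large number of None.get exceptions, apparently thrown in the deduplication step (I'm basing that assumption on the labels I see in the Stackdriver logs).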

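We are currently running Scio version 0.7.0-beta1 and Beam version 2.8.0. I have done my best to guard everything in my code against any potential None, but this seems to happen further down inside the deduplication step.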

The error I get is as follows:

java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at com.spotify.scio.util.Functions$$anon$2.mergeAccumulators(Functions.scala:227)
    at com.spotify.scio.util.Functions$$anon$2.mergeAccumulators(Functions.scala:220)
    at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillCombiningState.getAccum(WindmillStateInternals.java:958)
    at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillCombiningState.read(WindmillStateInternals.java:920)
    at org.apache.beam.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:125)
    at org.apache.beam.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1060)
    at org.apache.beam.runners.core.ReduceFnRunner.onTimers(ReduceFnRunner.java:768)
    at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:95)
    at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
    at org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:135)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:45)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:50)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:202)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:160)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1226)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:141)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:965)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

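As you can see, this never actually enters my code, and I'm not sure how I should go about tracking the problem down. Could it be related to the "LateDataDroppingDoFnRunner"? Our allowed lateness is relatively large (3 days, with windows one hour long).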

val input = PubsubIO.readStrings()
  .fromSubscription(subscription)
  .withTimestampAttribute("ts")
  .withName("Window messages")
  .withFixedWindows(
    duration = windowSize,
    options = WindowOptions(
      trigger = AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
          .plusDelayOf(earlyFiring))
        .withLateFirings(AfterProcessingTime.pastFirstElementInPane()
          .plusDelayOf(lateFiring)),
      accumulationMode = ACCUMULATING_FIRED_PANES,
      allowedLateness = allowedLateness
    )
  )
  .withName("Deduplicate messages")
  .distinctBy[String](f = getId)

...
// I am being overly cautious here because I have been having
// so much trouble debugging this
def getId(message: Map[String, Any]): String = {
  message match {
    case null =>
      logger.warn("message is null when getting id")
      ""
    case message =>
      message.get("id") match {
        case None =>
          logger.warn("id is null in message")
          ""
        case Some(id) => id.toString
      }
  }
}

I'm puzzled as to how I could be getting a None.get here, and why it only happens when I drain.
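One guess, going only by the mergeAccumulators frames in the trace: a reduce-style combiner that calls .get on the result of merging could fail this way if every accumulator it is handed is empty, for example when a timer fires for a pane that has no elements. The sketch below is a hypothetical stand-in to illustrate that failure mode. ReduceLikeCombiner, mergeUnsafe and mergeSafe are names I made up, and this is not Scio's actual Functions.scala code.

```scala
import scala.util.Try

// Hypothetical stand-in for a reduce-based combiner; NOT Scio's real implementation.
// The accumulator is a plain List[T] that gets reduced pairwise with `f`.
final class ReduceLikeCombiner[T](f: (T, T) => T) {
  def createAccumulator(): List[T] = Nil

  def addInput(acc: List[T], input: T): List[T] = input :: acc

  // Unsafe merge: assumes at least one accumulator holds an element.
  // If all accumulators are empty, reduceOption returns None and .get throws.
  def mergeUnsafe(accs: Iterable[List[T]]): List[T] = {
    val partial = accs.flatMap(_.reduceOption(f))
    List(partial.reduceOption(f).get)
  }

  // Defensive merge: empty accumulators simply merge into an empty accumulator.
  def mergeSafe(accs: Iterable[List[T]]): List[T] =
    accs.flatMap(_.reduceOption(f)).reduceOption(f).toList
}

object MergeDemo {
  def main(args: Array[String]): Unit = {
    // Keep the first element of each pair, roughly what a dedup-by-key would do.
    val combiner = new ReduceLikeCombiner[String]((a, _) => a)
    val allEmpty = List(combiner.createAccumulator(), combiner.createAccumulator())

    println(Try(combiner.mergeUnsafe(allEmpty))) // a Failure wrapping NoSuchElementException
    println(combiner.mergeSafe(allEmpty))        // an empty list, no exception
  }
}
```

If something like this is what is happening, no amount of guarding inside getId would help, because the function I pass to distinctBy is never reached when there are no elements to combine.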

How should I go about debugging this error, or where should I look for advice?

0 answers:

No answers yet.