Apache Beam: PubsubReader fails with NPE

Time: 2017-03-23 18:31:45

Tags: apache-beam

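I have a Beam pipeline that reads from PubSub and, after applying some transforms, writes to BigQuery. The pipeline consistently fails with an NPE. I am using Beam SDK version 0.6.0 and running the pipeline with the DirectRunner. Any ideas about what I might be doing wrong?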

    java.lang.NullPointerException
        at org.apache.beam.sdk.io.PubsubUnboundedSource$PubsubReader.ackBatch(PubsubUnboundedSource.java:640)
        at org.apache.beam.sdk.io.PubsubUnboundedSource$PubsubCheckpoint.finalizeCheckpoint(PubsubUnboundedSource.java:313)
        at org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.getReader(UnboundedReadEvaluatorFactory.java:174)
        at org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement(UnboundedReadEvaluatorFactory.java:127)
        at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
        at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 1)

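This problem is caused by a bug in the DirectRunner (BEAM-1656) combined with a precondition in PubsubCheckpoint. The DirectRunner bug has been fixed in pull request 2237, which has been merged into the GitHub master branch, but only after the 0.6.0 release.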

When using the DirectRunner, either updating to a 0.7.0 nightly build or building from GitHub HEAD will resolve this problem.

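To update to the current nightly build, you must add the following repository to your project's pom.xml. The earliest version of the beam-runners-direct-java module that contains this fix is 0.7.0-20170316.070901-9, but not all modules are built with this specific version, so you may have to specify individually compatible versions or use 0.7.0-SNAPSHOT.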

    <repositories>
      <repository>
        <id>apache.snapshots</id>
        <name>Apache Development Snapshot Repository</name>
        <url>https://repository.apache.org/content/repositories/snapshots/</url>
        <releases>
          <enabled>false</enabled>
        </releases>
        <snapshots>
          <enabled>true</enabled>
        </snapshots>
      </repository>
    </repositories>
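
With the snapshot repository in place, the Beam dependencies also need to point at the snapshot version. The fragment below is a minimal sketch of how that might look, not a definitive configuration. The `beam.version` property name is a hypothetical convenience, and the list of modules is an assumption: only beam-runners-direct-java is named above, so add or remove artifacts to match what your pipeline actually depends on.

    <properties>
      <!-- Hypothetical property so every Beam module resolves to the same version -->
      <beam.version>0.7.0-SNAPSHOT</beam.version>
    </properties>

    <dependencies>
      <!-- Assumed: the core SDK, which contains PubsubUnboundedSource in 0.6.0 -->
      <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-core</artifactId>
        <version>${beam.version}</version>
      </dependency>
      <!-- The DirectRunner module that carries the fix -->
      <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-direct-java</artifactId>
        <version>${beam.version}</version>
      </dependency>
    </dependencies>

Keeping the version in a single property makes it harder to end up with a mix of 0.6.0 and snapshot modules on the classpath, and makes it easy to move to the 0.7.0 release once it is available.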