How to fix a NotSerializableException when using Beam's KafkaIO on the FlinkRunner

Asked: 2019-08-14 16:54:21

Tags: java apache-kafka apache-flink apache-beam

I am trying to run an Apache Beam application on a Flink cluster, but it fails while translating the Kafka UnboundedSource with the error "[partitions type:ARRAY pos:0] is not serializable". The application is a word count example that reads from a Kafka topic and publishes to a Kafka topic, and it works fine with Beam's direct runner.

I created the pom.xml by following Beam's Java QuickStart and then added the KafkaIO SDK. I am running a single-node local Flink 1.8.1 cluster and Kafka 2.3.0.

pom.xml snippet:

    <properties>
      <beam.version>2.14.0</beam.version>
      <flink.artifact.name>beam-runners-flink-1.8</flink.artifact.name>
      <flink.version>1.8.1</flink.version>
    </properties>
...
    <profile>
      <id>flink-runner</id>
      <!-- Makes the FlinkRunner available when running a pipeline. -->
      <dependencies>
        <dependency>
          <groupId>org.apache.beam</groupId>
          <!-- Please see the Flink Runner page for an up-to-date list
               of supported Flink versions and their artifact names:
               https://beam.apache.org/documentation/runners/flink/ -->
          <artifactId>${flink.artifact.name}</artifactId>
          <version>${beam.version}</version>
          <scope>runtime</scope>
        </dependency>
        <!-- Tried with and without this flink-avro dependency -->
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-avro</artifactId>
          <version>${flink.version}</version>
        </dependency>
      </dependencies>
    </profile>
...
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-io-kafka</artifactId>
      <version>${beam.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>2.3.0</version>
    </dependency>

KafkaWordCount.java snippet:

    // Create the Pipeline object with the options we defined above.
    Pipeline p = Pipeline.create(options);

    PCollection<KV<String, Long>> counts = p.apply(KafkaIO.<String, String>read()
        .withBootstrapServers(options.getBootstrapServer())
        .withTopics(Collections.singletonList(options.getInputTopic()))
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .updateConsumerProperties(ImmutableMap.of("auto.offset.reset", (Object) "latest"))
        .withoutMetadata() // PCollection<KV<String, String>> instead of KafkaRecord type
    )

The full error message results from submitting the Beam jar to Flink with:

/opt/flink/bin/flink run -c org.apache.beam.examples.KafkaWordCount target/word-count-beam-bundled-0.1.jar --runner=FlinkRunner --bootstrapServer=localhost:9092

Update

It turns out there is an issue in Beam related to running on Flink that appears to be the cause: https://issues.apache.org/jira/browse/BEAM-7478. One of the comments on it specifically mentions that flink run cannot be used with KafkaIO because Avro's Schema.Field is not serializable: https://issues.apache.org/jira/browse/BEAM-7478?focusedCommentId=16902419&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16902419

Update 2

As mentioned in the comments, one workaround is to downgrade Flink to 1.8.0.
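
For reference, a minimal sketch of that workaround against the pom.xml snippet above (this exact property change is my assumption; the local Flink cluster itself would also need to be at 1.8.0):

    <properties>
      <beam.version>2.14.0</beam.version>
      <flink.artifact.name>beam-runners-flink-1.8</flink.artifact.name>
      <!-- Assumed change: downgrade from 1.8.1 to 1.8.0 as a workaround for BEAM-7478 -->
      <flink.version>1.8.0</flink.version>
    </properties>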

0 Answers:

No answers yet.