I am building an Apache Beam pipeline that reads from Kafka using KafkaIO, but I am not sure how to resolve a serialization problem. This is how I use KafkaIO:
this.pipeline
    .apply("ReadFromKafka",
        KafkaIO
            .<byte[], byte[]>read()
            .withConsumerFactoryFn(input -> {
                this.updateKafkaConsumerProperties(this.kafkaConsumerConfig, input);
                return new KafkaConsumer<>(input);
            })
            .withBootstrapServers(kafkaConsumerConfig.getBootstrapServer())
            .withTopic(this.pipelineSourceKafkaConfiguration.getOnboardingTopic())
            .withKeyDeserializer(ByteArrayDeserializer.class)
            .withValueDeserializer(ByteArrayDeserializer.class))
    .apply("WindowTheData", Window.into(FixedWindows.of(Duration.standardSeconds(5))))
    ...
However, my driver fails to start and throws the following:
java.lang.IllegalArgumentException: unable to serialize org.apache.beam.sdk.io.kafka.KafkaUnboundedSource@65bd19bf
at org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:57)
at org.apache.beam.sdk.util.SerializableUtils.clone(SerializableUtils.java:107)
at org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:86)
at org.apache.beam.sdk.io.Read$Unbounded.<init>(Read.java:137)
at org.apache.beam.sdk.io.Read$Unbounded.<init>(Read.java:132)
at org.apache.beam.sdk.io.Read.from(Read.java:55)
at org.apache.beam.sdk.io.kafka.KafkaIO$Read.expand(KafkaIO.java:665)
at org.apache.beam.sdk.io.kafka.KafkaIO$Read.expand(KafkaIO.java:277)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:56)
at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:188)
at com.company.lib.pipelines.DataPersistencePipeline.execute(DataPersistencePipeline.java:64)
at com.company.app.MainApp.registerPipelineEndpoints(MainApp.java:102)
at com.company.app.MainApp.run(MainApp.java:81)
at com.company.app.MainApp.run(MainApp.java:44)
at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:43)
at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:87)
at io.dropwizard.cli.Cli.run(Cli.java:78)
at io.dropwizard.Application.run(Application.java:93)
at com.company.app.MainApp.main(MainApp.java:51)
Caused by: java.io.NotSerializableException: com.company.lib.pipelines.DataPersistencePipeline
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:53)
... 20 more
The exception complains that the org.apache.beam.sdk.io.kafka.KafkaUnboundedSource object is not serializable. That class comes from the Apache Beam SDK and does in fact implement the Serializable interface, so I cannot see where I went wrong.
Answer (score: 0)
The KafkaIO.Read#withConsumerFactoryFn(org.apache.beam.sdk.transforms.SerializableFunction) method requires its argument to be Serializable.
Because the lambda expression you pass as that argument references member variables of the enclosing class (this.kafkaConsumerConfig), the lambda captures `this`, so the enclosing class (here, DataPersistencePipeline) must also be Serializable.
The exception actually points this out: Caused by: java.io.NotSerializableException: com.company.lib.pipelines.DataPersistencePipeline
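The capture behavior can be demonstrated without any Beam dependencies. The sketch below (plain JDK, with a hypothetical `SerFn` interface standing in for Beam's `SerializableFunction`) shows that a lambda reading an instance field drags the enclosing, non-serializable object into serialization, while copying the needed state into a final local variable first avoids capturing `this`:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class CaptureDemo {
    // Stand-in for Beam's SerializableFunction: a serializable lambda target.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    private final String config = "bootstrap.servers=localhost:9092";

    // Captures `this`: the lambda reads the instance field `config`, so
    // serializing it tries to serialize the whole (non-Serializable)
    // CaptureDemo instance and fails with NotSerializableException.
    SerFn<String, String> capturing() {
        return input -> input + config;
    }

    // Captures only a String: the field value is copied into a final local
    // first, so the lambda no longer references `this` at all.
    SerFn<String, String> nonCapturing() {
        final String cfg = config;
        return input -> input + cfg;
    }

    // Returns true if the object survives Java serialization.
    static boolean serializes(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CaptureDemo demo = new CaptureDemo();
        System.out.println(serializes(demo.capturing()));     // false
        System.out.println(serializes(demo.nonCapturing()));  // true
    }
}
```

Applied to the pipeline above, the same idea means either making DataPersistencePipeline implement Serializable, or (usually cleaner) copying the consumer configuration into a final local variable, or moving the factory into a static method, before calling withConsumerFactoryFn, so the lambda no longer captures `this`.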