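I am trying to pass a PipelineOptions interface to a Dataflow DoFn so that the DoFn can configure some non-serializable things it needs to re-instantiate. However, when I have the DoFn hold an instance, Dataflow seems unable to serialize the DoFn together with my PipelineOptions subclass. Do I need to do something to the Options interface for it to serialize correctly?
I know there is the option of writing custom serialization and deserialization code (as in https://gist.github.com/jlewi/f1cd323dc88bd58601ef, "How to fix Dataflow unable to serialize my DoFn?"), but the PipelineOptions class seems to state explicitly that it should be serializable, and I would prefer not to write serialization and deserialization code in every DoFn that uses this options object.
Options class snippet: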
public interface Options
extends BigtableOptions, BigtableScanOptions, OfflineModuleOptions, Serializable {...}
DoFn definition:
public class RunEventGeneratorsDoFn extends DoFn<...,...> {
private OfflinePipelineRunner.Options options;
....
}
The options field is not marked transient. The exception:
Exception in thread "main" java.lang.IllegalArgumentException: unable to serialize [my DoFn]
at com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:54)
at com.google.cloud.dataflow.sdk.util.SerializableUtils.clone(SerializableUtils.java:91)
at com.google.cloud.dataflow.sdk.transforms.ParDo$Bound.<init>(ParDo.java:720)
at com.google.cloud.dataflow.sdk.transforms.ParDo$Unbound.of(ParDo.java:678)
at com.google.cloud.dataflow.sdk.transforms.ParDo$Unbound.access$000(ParDo.java:596)
at com.google.cloud.dataflow.sdk.transforms.ParDo.of(ParDo.java:563)
at com.google.cloud.dataflow.sdk.transforms.ParDo.of(ParDo.java:558)
at [dofn instantiation line]
Caused by: java.io.NotSerializableException: com.google.cloud.dataflow.sdk.options.ProxyInvocationHandler
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:50)
... 7 more
Answer 0 (score: 2)
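The actual pipeline options object should not be held in a field of a particular DoFn or PTransform. Instead, pass in the values of the specific options you want to access.
For more background, see the question "How to get PipelineOptions in composite PTransform in Beam 2.0?".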
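To illustrate the advice above, here is a minimal sketch in plain Java, deliberately not using the Dataflow SDK. EventGeneratorFn is a hypothetical stand-in for a DoFn, TableClient is a hypothetical stand-in for the non-serializable resource, and the two string fields are assumed examples of option values. The serializable object carries only plain values and rebuilds its transient resource lazily after deserialization, which roughly corresponds to what happens when a DoFn is shipped to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a resource that cannot be serialized.
class TableClient {
    private final String projectId;
    private final String tableId;

    TableClient(String projectId, String tableId) {
        this.projectId = projectId;
        this.tableId = tableId;
    }

    String describe() {
        return projectId + "/" + tableId;
    }
}

// Hypothetical stand-in for a DoFn: it stores plain option values,
// not the options object itself.
class EventGeneratorFn implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String projectId;  // assumed option value
    private final String tableId;    // assumed option value
    private transient TableClient client;  // skipped by Java serialization

    EventGeneratorFn(String projectId, String tableId) {
        this.projectId = projectId;
        this.tableId = tableId;
    }

    // Re-create the client on first use; the field is null after deserialization.
    private TableClient client() {
        if (client == null) {
            client = new TableClient(projectId, tableId);
        }
        return client;
    }

    String process(String element) {
        return client().describe() + ":" + element;
    }
}

class SerializationSketch {
    // Serialize and deserialize a value with standard Java serialization.
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        EventGeneratorFn fn = new EventGeneratorFn("my-project", "events");
        EventGeneratorFn copy = roundTrip(fn);
        System.out.println(copy.process("e1"));  // prints "my-project/events:e1"
    }
}
```

In a real pipeline, the constructor arguments would be read from the options object in the main program, through whichever getters the Options interface defines, before the DoFn is constructed.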