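I'm running into a strange problem when running a Dataflow pipeline. I've written my own coder, but swapping in AvroCoder, SerializableCoder, and others for the sake of example produces the same problem.
After trying to launch the pipeline using the Dataflow Service in streaming mode, I get an exception: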
Exception in thread "main" java.lang.RuntimeException: Unable to deserialize Coder: ModelCoder. Check that a suitable constructor is defined. See Coder for details.
at com.google.cloud.dataflow.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:113)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.ensureCoderSerializable(DirectPipelineRunner.java:901)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.ensurePCollectionEncodable(DirectPipelineRunner.java:861)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.setPCollectionValuesWithMetadata(DirectPipelineRunner.java:789)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.setPCollection(DirectPipelineRunner.java:776)
at com.google.cloud.dataflow.sdk.io.TextIO.evaluateReadHelper(TextIO.java:786)
at com.google.cloud.dataflow.sdk.io.TextIO.access$000(TextIO.java:118)
at com.google.cloud.dataflow.sdk.io.TextIO$Read$Bound$1.evaluate(TextIO.java:327)
at com.google.cloud.dataflow.sdk.io.TextIO$Read$Bound$1.evaluate(TextIO.java:323)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:706)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:219)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:102)
at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:252)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:662)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:374)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:87)
at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:174)
at io.momentum.demo.models.pipeline.PlatformPipeline.main(PlatformPipeline.java:96)
Caused by: java.lang.IllegalStateException: Sub-class com.google.cloud.dataflow.sdk.util.CoderUtils$Jackson2Module$Resolver MUST implement `typeFromId(DatabindContext,String)
at com.fasterxml.jackson.databind.jsontype.impl.TypeIdResolverBase.typeFromId(TypeIdResolverBase.java:77)
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:156)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:106)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:91)
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:142)
at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:42)
at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3760)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2042)
at com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:2529)
at com.google.cloud.dataflow.sdk.util.Serializer.deserialize(Serializer.java:98)
at com.google.cloud.dataflow.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:110)
... 18 more
My Coder implementation just wraps AvroCoder and hooks in some of our own code:
public final class ModelCoder<M extends AppModel> extends AtomicCoder<M> {
    public static <T extends AppModel> ModelCoder<T> of(Class<T> clazz) {
        return new ModelCoder<>(clazz);
    }

    @JsonCreator
    @SuppressWarnings("unchecked")
    public static ModelCoder<?> of(@JsonProperty("kind") String classType) throws ClassNotFoundException {
        Class<?> clazz = Class.forName(classType);
        return of((Class<? extends AppModel>) clazz);
    }

    private String kind;

    public ModelCoder(Class<M> type) {
        this.kind = type.getSimpleName();
    }

    @Override
    public void encode(M value, OutputStream outStream, Context context) throws IOException, CoderException {
        CoderInternals.encode(value, outStream, context, new TypeReference<TypedSerializedModel<M>>() { });
    }

    @Override
    public M decode(InputStream inStream, Context context) throws IOException, CoderException {
        return CoderInternals.decode(inStream, context, new TypeReference<TypedSerializedModel<M>>() { });
    }

    @Override
    public CloudObject asCloudObject() {
        CloudObject co = super.asCloudObject();
        co.set("kind", kind);
        return co;
    }
}
The coder works as expected when encode(..) or decode(..) is called with an AppModel, but this exception occurs regardless.
Answer 0 (score: 4)
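You need a static method annotated with @JsonCreator so that the service can instantiate your coder on the workers. You also shouldn't override asCloudObject(); it determines how your coder is serialized and sent to the workers, and your code will only send a serialized AvroCoder.
For an example of a coder that wraps another coder, take a look at NullableCoder.java (https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/NullableCoder.java).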
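A side note on the round trip through the @JsonCreator factory, offered as something to check rather than a confirmed cause of the exception: the question's constructor stores type.getSimpleName() as "kind", while the factory passes that string to Class.forName, which expects a fully qualified name. The sketch below uses only the JDK to show the difference. KindRoundTrip and resolves are hypothetical names, and java.util.ArrayList is just a stand-in for an AppModel subclass.

```java
import java.util.ArrayList;

public final class KindRoundTrip {
    /** Returns true if Class.forName can load a class from the given name. */
    public static boolean resolves(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for an AppModel subclass (assumption for illustration only).
        Class<?> type = ArrayList.class;

        // Simple name: "ArrayList" carries no package, so the lookup fails.
        System.out.println(type.getSimpleName() + " -> " + resolves(type.getSimpleName()));

        // Fully qualified name: "java.util.ArrayList" can be loaded again.
        System.out.println(type.getName() + " -> " + resolves(type.getName()));
    }
}
```

If the string stored for the coder is meant to be turned back into a Class on the worker, storing type.getName() instead of type.getSimpleName() would likely be the safer choice.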