DataFlow AvroCoder java.lang.IllegalArgumentException

Asked: 2016-06-19 02:31:32

Tags: google-cloud-dataflow

I am writing a custom coder factory for a parameterized type, as described here: Dataflow output parameterized type to avro file

p.getCoderRegistry().registerCoder(MyOutput.class, new CoderFactory() {
  @Override
  public Coder<?> create(List<? extends Coder<?>> componentCoders) {
    AvroCoder tCoder = (AvroCoder) componentCoders.get(0);
    AvroCoder sCoder = (AvroCoder) componentCoders.get(1);
    Schema schema = makeMyOutputSchema(tCoder.getSchema(),
      sCoder.getSchema());
    return AvroCoder.of(MyOutput.class, schema);
  }
  @Override
  public List<Object> getInstanceComponents(Object value) {
    MyOutput<Object, Object> myOutput = (MyOutput<Object, Object>) value;
    return Arrays.asList(new Object[] {myOutput.foo, myOutput.bar});
  }
});

The resulting schema looks like this:

{
  "type":"record",
  "name":"MyOutput",
  "namespace":"myNamespace",
  "fields":[
    {"name":"baz","type":"boolean"},
    {
      "name":"foo",
      "type":{
        "type":"record",
        "name":"Foo",
        "namespace":"myNamespace",
        "fields":[{"name":"id","type":"string"}]
      }
    },
    {
      "name":"bar",
      "type":{
        "type":"record",
        "name":"Bar",
        "namespace":"myNamespace",
        "fields":[{"name":"id","type":"string"}]
      }
    }
  ]
}

The schema parses correctly, but when I try to run the pipeline, I get:

com.google.cloud.dataflow.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalArgumentException: Unable to get field id from class null
    at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:186)
    at com.google.cloud.dataflow.sdk.testing.TestPipeline.run(TestPipeline.java:106)
    at mypackage.GenericsTest.testGenerics(GenericsTest.java:116)
Caused by: java.lang.IllegalArgumentException: Unable to get field id from class null
    at com.google.cloud.dataflow.sdk.coders.AvroCoder$AvroDeterminismChecker.getField(AvroCoder.java:710)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder$AvroDeterminismChecker.checkRecord(AvroCoder.java:548)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder$AvroDeterminismChecker.doCheck(AvroCoder.java:477)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder$AvroDeterminismChecker.recurse(AvroCoder.java:453)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder$AvroDeterminismChecker.checkRecord(AvroCoder.java:567)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder$AvroDeterminismChecker.doCheck(AvroCoder.java:477)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder$AvroDeterminismChecker.recurse(AvroCoder.java:453)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder$AvroDeterminismChecker.check(AvroCoder.java:430)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder.<init>(AvroCoder.java:189)
    at com.google.cloud.dataflow.sdk.coders.AvroCoder.of(AvroCoder.java:144)
    at mypackage.GenericsTest$1.create(GenericsTest.java:102)
    at com.google.cloud.dataflow.sdk.coders.CoderRegistry.getDefaultCoderFromFactory(CoderRegistry.java:797)
    at com.google.cloud.dataflow.sdk.coders.CoderRegistry.getDefaultCoder(CoderRegistry.java:748)
    at com.google.cloud.dataflow.sdk.coders.CoderRegistry.getDefaultCoder(CoderRegistry.java:719)
    at com.google.cloud.dataflow.sdk.coders.CoderRegistry.getDefaultCoder(CoderRegistry.java:696)
    at com.google.cloud.dataflow.sdk.coders.CoderRegistry.getDefaultCoder(CoderRegistry.java:178)
    at com.google.cloud.dataflow.sdk.values.TypedPValue.inferCoderOrFail(TypedPValue.java:147)
    at com.google.cloud.dataflow.sdk.values.TypedPValue.getCoder(TypedPValue.java:48)

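The pipeline is just a dummy pipeline that applies a single transform emitting MyOutput objects. It runs correctly when the output is a non-parameterized type.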

public static class MyTransform extends PTransform<
  PCollection<String>,
  PCollection<MyOutput<Foo, Bar>>> {
    @Override
    public PCollection<MyOutput<Foo, Bar>> apply(
        PCollection<String> input) {
      PCollection<MyOutput<Foo, Bar>> output = input.apply(
        ParDo.of(new DoFn<String, MyOutput<Foo, Bar>>() {
          @Override
          public void processElement(ProcessContext c) {
            c.output(new MyOutput<Foo, Bar>(new Foo(), new Bar()));
          }
        }));
      return output;
    }
}

Why am I getting this error?

static class MyOutput<T, S> {
  T foo;
  S bar;
  Boolean baz;
  public MyOutput() {}
  public MyOutput(T foo, S bar) {this.foo=foo; this.bar=bar; this.baz=false;}
}
@DefaultCoder(AvroCoder.class)
static class Bar {
  String id;
  public Bar() {this.id="t";}
}
@DefaultCoder(AvroCoder.class)
static class Foo {
  String id;
  public Foo() {this.id="s";}
}

static Schema makeMyOutputSchema(Schema tSchema, Schema sSchema) {
  Schema schema = new Schema.Parser().parse("{\"type\":\"record\","
    + "\"name\":\"MyOutput\","
    + "\"namespace\":\"impersonation\","
    + "\"fields\":["
    + "  {\"name\":\"baz\", \"type\": \"boolean\"},"
    + "  {\"name\":\"foo\", \"type\": " + tSchema.toString() + "},"
    + "  {\"name\":\"bar\", \"type\": " + sSchema.toString() + "}"
    + "]}");
  LOG.info(schema.toString());
  return schema;
}

1 Answer:

Answer 0 (score: 1)

This is a bug in AvroCoder. See https://issues.apache.org/jira/browse/BEAM-359
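
The stack trace is consistent with Java type erasure being the trigger, though this is an inference from the trace and not something confirmed against the SDK source. AvroCoder.of(MyOutput.class, schema) only sees the raw class, where foo is declared as the type variable T. If the determinism checker then looks up the nested schema field id on the erased type (Object) by walking the superclass chain, the walk ends at null, which would match the "from class null" wording. The sketch below reproduces that behavior with plain reflection only; ErasureDemo, findField, erasedFieldType, and isTypeVariable are hypothetical names for illustration, not Dataflow SDK code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.TypeVariable;

public class ErasureDemo {
  // Simplified stand-ins for the classes in the question.
  static class Foo { String id = "s"; }
  static class MyOutput<T, S> { T foo; S bar; Boolean baz; }

  /**
   * Walks the superclass chain looking for a declared field.
   * Returns null when the chain is exhausted (clazz has become null).
   */
  static Field findField(Class<?> clazz, String name) {
    while (clazz != null) {
      for (Field f : clazz.getDeclaredFields()) {
        if (f.getName().equals(name)) {
          return f;
        }
      }
      clazz = clazz.getSuperclass();
    }
    return null;
  }

  /** Erased (raw) type of a field, as seen through the raw owner class. */
  static Class<?> erasedFieldType(Class<?> owner, String name) {
    Field f = findField(owner, name);
    return f == null ? null : f.getType();
  }

  /** True when the field is declared with a type variable such as T. */
  static boolean isTypeVariable(Class<?> owner, String name) {
    Field f = findField(owner, name);
    return f != null && f.getGenericType() instanceof TypeVariable;
  }

  public static void main(String[] args) {
    // Through the raw class, foo is only known as the type variable T.
    System.out.println(isTypeVariable(MyOutput.class, "foo"));
    // Its erasure is Object, which declares no field named id.
    Class<?> erased = erasedFieldType(MyOutput.class, "foo");
    System.out.println(erased);
    System.out.println(findField(erased, "id"));
    // With the concrete class the same lookup succeeds.
    System.out.println(findField(Foo.class, "id"));
  }
}
```

If this reading is right, the schema built in makeMyOutputSchema is fine and the failure comes from reflecting on MyOutput.class without the concrete type arguments, which fits the report that non-parameterized output types work.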