Coder issues with Apache Beam and CombineFn

Asked: 2017-05-16 14:38:03

Tags: java google-cloud-platform google-cloud-dataflow apache-beam

We are building a pipeline using Apache Beam with the DirectRunner. We are trying out a simple pipeline in which we:

  1. Pull data from Google Cloud Pub/Sub (currently running locally against the emulator)
  2. Deserialize into a Java object
  3. Window the events into fixed one-minute windows
  4. Combine those windows using a custom CombineFn that converts them from events into a list of events

    Pipeline code:

    pipeline
    .apply(PubsubIO.<String>read().topic(options.getTopic()).withCoder(StringUtf8Coder.of()))
    
    .apply("ParseEvent", ParDo.of(new ParseEventFn()))
    
    .apply("WindowOneMinute",Window.<Event>into(FixedWindows.of(Duration.standardMinutes(1))))              
    
    .apply("CombineEvents", Combine.globally(new CombineEventsFn()));
    

    The ParseEvent function:

        static class ParseEventFn extends DoFn<String, Event> {
            // Assumes a Gson instance is defined elsewhere in the class, e.g.:
            // private static final Gson gson = new Gson();

            @ProcessElement
            public void processElement(ProcessContext c) {
                String json = c.element();
                c.output(gson.fromJson(json, Event.class));
            }
        }
    

    The CombineEvents function:

    public static class CombineEventsFn extends CombineFn<Event, CombineEventsFn.Accum, EventListWrapper> {
            public static class Accum {
                EventListWrapper eventListWrapper = new EventListWrapper();
            }
    
            @Override
            public Accum createAccumulator() {
                return new Accum();
            }
    
            @Override
            public Accum addInput(Accum accumulator, Event event) {
                accumulator.eventListWrapper.events.add(event);
                return accumulator;
            }
    
            @Override
            public Accum mergeAccumulators(Iterable<Accum> accumulators) {
                Accum merged = createAccumulator();
                for (Accum accum : accumulators) {
                    merged.eventListWrapper.events.addAll(accum.eventListWrapper.events);
                }
                return merged;
            }
    
            @Override
            public EventListWrapper extractOutput(Accum accumulator) {
                return accumulator.eventListWrapper;
            }
    
        }
    

    When we attempt to run this locally using Maven with the DirectRunner, we get the following error:

    java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.IllegalStateException: Unable to return a default Coder for CombineEvents/Combine.perKey(CombineEvents)/Combine.GroupedValues/ParDo(Anonymous).out [PCollection]. Correct one of the following root causes:
      No Coder has been manually specified;  you may do so using .setCoder().
      Inferring a Coder from the CoderRegistry failed: Unable to provide a default Coder for org.apache.beam.sdk.values.KV<K, OutputT>. Correct one of the following root causes:
      Building a Coder using a registered CoderFactory failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a default Coder for java.lang.Object. Correct one of the following root causes:
      Building a Coder using a registered CoderFactory failed: Cannot provide coder based on value with class java.lang.Object: No CoderFactory has been registered for the class.
      Building a Coder from the @DefaultCoder annotation failed: Class java.lang.Object does not have a @DefaultCoder annotation.
      Building a Coder from the fallback CoderProvider failed: Cannot provide coder for type java.lang.Object: org.apache.beam.sdk.coders.protobuf.ProtoCoder$2@6e610150 could not provide a Coder for type java.lang.Object: Cannot provide ProtoCoder because java.lang.Object is not a subclass of com.google.protobuf.Message; org.apache.beam.sdk.coders.SerializableCoder$1@7adc59c8 could not provide a Coder for type java.lang.Object: Cannot provide SerializableCoder because java.lang.Object does not implement Serializable.
      Building a Coder from the @DefaultCoder annotation failed: Class org.apache.beam.sdk.values.KV does not have a @DefaultCoder annotation.
      Using the default output Coder from the producing PTransform failed: Unable to provide a default Coder for org.apache.beam.sdk.values.KV<K, OutputT>. Correct one of the following root causes:
      Building a Coder using a registered CoderFactory failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a default Coder for java.lang.Object. Correct one of the following root causes:
      Building a Coder using a registered CoderFactory failed: Cannot provide coder based on value with class java.lang.Object: No CoderFactory has been registered for the class.
      Building a Coder from the @DefaultCoder annotation failed: Class java.lang.Object does not have a @DefaultCoder annotation.
      Building a Coder from the fallback CoderProvider failed: Cannot provide coder for type java.lang.Object: org.apache.beam.sdk.coders.protobuf.ProtoCoder$2@6e610150 could not provide a Coder for type java.lang.Object: Cannot provide ProtoCoder because java.lang.Object is not a subclass of com.google.protobuf.Message; org.apache.beam.sdk.coders.SerializableCoder$1@7adc59c8 could not provide a Coder for type java.lang.Object: Cannot provide SerializableCoder because java.lang.Object does not implement Serializable.
      Building a Coder from the @DefaultCoder annotation failed: Class org.apache.beam.sdk.values.KV does not have a @DefaultCoder annotation.
        at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
        at org.apache.beam.sdk.values.TypedPValue.getCoder(TypedPValue.java:51)
        at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:130)
        at org.apache.beam.sdk.values.TypedPValue.finishSpecifying(TypedPValue.java:90)
        at org.apache.beam.sdk.runners.TransformHierarchy.finishSpecifyingInput(TransformHierarchy.java:143)
        at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:418)
        at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
        at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:154)
        at org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1459)
        at org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1336)
        at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
        at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:350)
        at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:167)
    at ***************************.main(***************.java:231)
    ... 6 more
    

    Apologies for the huge code dump - just wanted to provide all the context.

    I'm curious why it complains that neither java.lang.Object nor org.apache.beam.sdk.values.KV<K, OutputT> has a default coder - as far as I can tell, our pipeline only changes types between String, Event, and EventListWrapper, and the latter two classes have default coders set on the classes themselves.

    The error occurs on the line where we apply the CombineFn - we can confirm that without this transform, the pipeline works.

    I suspect we've set up the combine transform incorrectly somehow, but haven't found anything in the Beam documentation to point us in the right direction.

    Any insight would be appreciated - thanks in advance!

2 answers:

Answer 0 (score: 7)

The likely reason you are seeing java.lang.Object is that Beam is trying to infer a coder for an unresolved type variable, which resolves to Object. This may be a bug in how coder inference is done within Combine.

Separately, I would expect the Accum class to also fail coder inference. You can override getAccumulatorCoder in your CombineFn to provide one directly.
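
A minimal sketch of that override, repeating the question's CombineFn with stub Event / EventListWrapper classes so it stands alone. The getAccumulatorCoder signature and SerializableCoder are from the Beam Java SDK, but the exact API may vary between SDK versions, and making Accum implement Serializable here is an assumption (it is required for SerializableCoder to work):

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.beam.sdk.coders.CannotProvideCoderException;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderRegistry;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.transforms.Combine.CombineFn;

// Minimal stand-ins for the question's Event / EventListWrapper classes.
class Event implements Serializable {}

class EventListWrapper implements Serializable {
    List<Event> events = new ArrayList<>();
}

public class CombineEventsFn extends CombineFn<Event, CombineEventsFn.Accum, EventListWrapper> {
    public static class Accum implements Serializable {
        EventListWrapper eventListWrapper = new EventListWrapper();
    }

    @Override
    public Accum createAccumulator() {
        return new Accum();
    }

    @Override
    public Accum addInput(Accum accumulator, Event event) {
        accumulator.eventListWrapper.events.add(event);
        return accumulator;
    }

    @Override
    public Accum mergeAccumulators(Iterable<Accum> accumulators) {
        Accum merged = createAccumulator();
        for (Accum accum : accumulators) {
            merged.eventListWrapper.events.addAll(accum.eventListWrapper.events);
        }
        return merged;
    }

    @Override
    public EventListWrapper extractOutput(Accum accumulator) {
        return accumulator.eventListWrapper;
    }

    // The key addition: tell Beam how to encode the accumulator directly,
    // instead of letting coder inference fail on the Accum class.
    @Override
    public Coder<Accum> getAccumulatorCoder(CoderRegistry registry, Coder<Event> inputCoder)
            throws CannotProvideCoderException {
        return SerializableCoder.of(Accum.class);
    }
}
```

With this in place, Beam no longer needs to infer a coder for Accum, which sidesteps that part of the inference failure.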

Answer 1 (score: 1)

Have you checked whether adding Serializable to your accumulator works directly?

That is, add "implements Serializable" to the Accum class...

public static class Accum implements Serializable {
            EventListWrapper eventListWrapper = new EventListWrapper();
        }
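
The reason "implements Serializable" can help is that it lets Beam fall back to SerializableCoder, which encodes the accumulator via standard Java serialization. A stdlib-only sketch of that round trip, using placeholder classes rather than the real Beam types (the encode/decode helpers below are illustrative, not Beam API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializableAccumDemo {
    // Placeholder versions of the question's classes. Note that everything the
    // accumulator contains must also be Serializable, or encoding fails at runtime.
    static class Event implements Serializable {
        final String payload;
        Event(String payload) { this.payload = payload; }
    }

    static class EventListWrapper implements Serializable {
        List<Event> events = new ArrayList<>();
    }

    static class Accum implements Serializable {
        EventListWrapper eventListWrapper = new EventListWrapper();
    }

    // Encode via Java serialization - roughly what SerializableCoder does.
    static byte[] encode(Accum accum) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(accum);
        }
        return bytes.toByteArray();
    }

    static Accum decode(byte[] data) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Accum) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Accum accum = new Accum();
        accum.eventListWrapper.events.add(new Event("hello"));
        // Round-trip the accumulator, as a runner would between bundles.
        Accum roundTripped = decode(encode(accum));
        System.out.println(roundTripped.eventListWrapper.events.get(0).payload); // prints "hello"
    }
}
```

If any contained class (here, EventListWrapper or Event) is not Serializable, the round trip throws NotSerializableException, which is why the suggestion is applied to the whole accumulator chain.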