Apache Beam Python SDK fails with IllegalArgumentException

Time: 2018-10-09 18:20:53

Tags: google-cloud-platform google-cloud-dataflow apache-beam

java.lang.IllegalArgumentException: FakeKeyedWorkItemCoder only works with KeyedWorkItemCoder or KvCoder; was: class org.apache.beam.sdk.coders.LengthPrefixCoder
    com.google.cloud.dataflow.worker.WindmillKeyedWorkItem$FakeKeyedWorkItemCoder.<init>(WindmillKeyedWorkItem.java:211)
    com.google.cloud.dataflow.sdk.util.TimerOrElement$TimerOrElementCoder.<init>(TimerOrElement.java:53)
    com.google.cloud.dataflow.sdk.util.TimerOrElement$TimerOrElementCoder.of(TimerOrElement.java:57)
    com.google.cloud.dataflow.sdk.util.TimerOrElement$TimerOrElementCloudObjectTranslator.fromCloudObject(TimerOrElement.java:85)
    com.google.cloud.dataflow.sdk.util.TimerOrElement$TimerOrElementCloudObjectTranslator.fromCloudObject(TimerOrElement.java:67)
    org.apache.beam.runners.dataflow.util.CloudObjects.coderFromCloudObject(CloudObjects.java:87)
    org.apache.beam.runners.dataflow.util.CloudObjectTranslators.getComponents(CloudObjectTranslators.java:71)
    org.apache.beam.runners.dataflow.util.CloudObjectTranslators.access$100(CloudObjectTranslators.java:51)
    org.apache.beam.runners.dataflow.util.CloudObjectTranslators$6.fromCloudObject(CloudObjectTranslators.java:248)
    org.apache.beam.runners.dataflow.util.CloudObjectTranslators$6.fromCloudObject(CloudObjectTranslators.java:237)
    org.apache.beam.runners.dataflow.util.CloudObjects.coderFromCloudObject(CloudObjects.java:87)
    com.google.cloud.dataflow.worker.BeamFnMapTaskExecutorFactory$5.typedApply(BeamFnMapTaskExecutorFactory.java:593)
    com.google.cloud.dataflow.worker.BeamFnMapTaskExecutorFactory$5.typedApply(BeamFnMapTaskExecutorFactory.java:587)
    com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
    com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
    com.google.cloud.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
    com.google.cloud.dataflow.worker.BeamFnMapTaskExecutorFactory.create(BeamFnMapTaskExecutorFactory.java:136)
    com.google.cloud.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1143)
    com.google.cloud.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:136)
    com.google.cloud.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:966)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)

The error trace is shown above. The pipeline works fine on my local machine but breaks on the Dataflow runner with this error, and I have not been able to find any related information anywhere.
Details: read from Pub/Sub -> window into 120-second windows -> group by key -> insert into BigQuery
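For reference, a minimal Python SDK sketch of a pipeline with that shape; the subscription, table, key field, and JSON payload format are placeholder assumptions, not taken from the question:

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholders -- the real subscription, table, and key field are not given
    # in the question.
    INPUT_SUBSCRIPTION = "projects/my-project/subscriptions/my-sub"
    OUTPUT_TABLE = "my-project:my_dataset.my_table"

    def to_key_value(message):
        """Decode the Pub/Sub payload and key it (the 'key' field is assumed)."""
        record = json.loads(message.decode("utf-8"))
        return record["key"], record

    def to_rows(keyed_group):
        """Flatten one (key, iterable-of-records) group into BigQuery rows."""
        _, records = keyed_group
        for record in records:
            yield record

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=INPUT_SUBSCRIPTION)
            | "Window120s" >> beam.WindowInto(beam.window.FixedWindows(120))
            | "KeyRecords" >> beam.Map(to_key_value)
            | "GroupByKey" >> beam.GroupByKey()
            | "ToRows" >> beam.FlatMap(to_rows)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                OUTPUT_TABLE,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )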

1 answer:

Answer 0 (score: 0)

The error occurred because the key and value in the PCollection came from different classes. Even though I was returning strings, it still raised the error. The fix was to add a function, in the same class as the pipeline, that accepts the input and emits the same elements immediately before the GroupByKey.
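A hedged sketch of that workaround: a pass-through step that re-emits each element as an explicit (str, bytes) pair so a single, consistent coder is inferred before GroupByKey. The function name, type hints, and the toy in-memory source (standing in for the real windowed Pub/Sub data) are illustrative, not from the original answer:

    import apache_beam as beam
    from apache_beam.typehints import Tuple

    def repackage(element):
        """Pass-through: re-emit the element as an explicit (str, bytes) pair."""
        key, value = element
        return str(key), value

    # Toy in-memory source standing in for the real windowed Pub/Sub input.
    with beam.Pipeline() as p:
        (
            p
            | "Create" >> beam.Create([("a", b"1"), ("a", b"2"), ("b", b"3")])
            | "Repackage" >> beam.Map(repackage).with_output_types(Tuple[str, bytes])
            | "GroupByKey" >> beam.GroupByKey()
            | "Print" >> beam.Map(print)
        )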