When using the DataflowRunner

I have a Dataflow job that runs fine when I launch it locally with the DataflowRunner, but when I try to run it through GCP's Composer/Airflow, it gives me this error:
Exception in thread "main" java.lang.IllegalStateException: Unable to return a default Coder for ConvertToYouTubeMetadata/ParDo(convertToTableRow$1)/ParMultiDo(convertToTableRow$1).output [PCollection]. Correct one of the following root causes:
  No Coder has been manually specified; you may do so using .setCoder().
  Inferring a Coder from the CoderRegistry failed: Unable to provide a Coder for com.google.api.services.bigquery.model.TableRow.
  Building a Coder using a registered CoderProvider failed.
  See suppressed exceptions for detailed failures.
  Using the default output Coder from the producing PTransform failed: PTransform.getOutputCoder called.
    at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
    at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:259)
    at org.apache.beam.sdk.values.PCollection.finishSpecifying(PCollection.java:107)
    at org.apache.beam.sdk.runners.TransformHierarchy.finishSpecifyingInput(TransformHierarchy.java:190)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:536)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
    at MainKt.runMetadataPipeline(main.kt:66)
    at MainKt.main(main.kt:34)
What is different about how the job executes on Composer that would cause this, when it works fine locally?
To write to BigQuery, I'm just using BigQueryIO.writeTableRows().
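
For reference, the first root cause in the error message names the standard workaround: set the coder on the PCollection explicitly instead of relying on inference. A minimal, self-contained Kotlin sketch of that (the DoFn body and field name are hypothetical stand-ins, since the question doesn't show the actual pipeline code):

```kotlin
import com.google.api.services.bigquery.model.TableRow
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.Create
import org.apache.beam.sdk.transforms.DoFn
import org.apache.beam.sdk.transforms.ParDo

fun main() {
    val pipeline = Pipeline.create(PipelineOptionsFactory.create())

    pipeline
        .apply(Create.of("some input"))
        // Hypothetical conversion step standing in for convertToTableRow.
        .apply("ConvertToTableRow", ParDo.of(object : DoFn<String, TableRow>() {
            @ProcessElement
            fun processElement(ctx: ProcessContext) {
                ctx.output(TableRow().set("value", ctx.element()))
            }
        }))
        // Pin the coder explicitly so the CoderRegistry never has to
        // infer one for TableRow.
        .setCoder(TableRowJsonCoder.of())

    pipeline.run().waitUntilFinish()
}
```

This sidesteps the inference failure, though it doesn't explain why inference succeeds locally and fails on Composer.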
Answer (score: 1)
Building my JAR with ShadowJar solved this for me. I think something was going wrong with how the JAR was packaged for Dataflow to execute via the DataFlowJavaOperator. I can't remember exactly what I read on GitHub, but someone mentioned using the Maven Shade plugin to fix this problem, and ShadowJar is the Gradle equivalent.