No translator registered for GroupByKey.GroupByKeyOnly

Date: 2016-03-23 22:48:36

Tags: google-cloud-dataflow

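I ran into this error when trying to execute a CoGroupByKey with Dataflow. At a high level, I want to join a PCollection of type KV<String, self-defined-class> with another PCollection of type KV<String, TableRow>. I'm doing a standard join with TupleTags, KeyedPCollectionTuple and CoGroupByKey, very similar to the example in the official documentation: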

    PCollection<KV<String, TableRow>> pt1 = ...;
    PCollection<KV<String, MyClass>> pt2 = ...;
    final TupleTag<TableRow> t1 = new TupleTag<>();
    final TupleTag<MyClass> t2 = new TupleTag<>();
    PCollection<KV<String, CoGbkResult>> coGbkResultCollection =
    KeyedPCollectionTuple.of(t1, pt1)
                     .and(t2, pt2)
                     .apply(CoGroupByKey.<String>create());
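For reference, the result a CoGroupByKey join like the one above is meant to produce can be modelled in plain Java: for every key that appears in either input, the values from each input are grouped separately. The sketch below is a toy, in-memory model and not the Dataflow SDK; the class `CoGroupSketch` and the `Joined` holder (a hypothetical stand-in for `CoGbkResult` plus the two `TupleTag`s) are illustrative names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy, in-memory model of CoGroupByKey semantics (NOT the Dataflow SDK).
 * For each key seen in either input, the values from each input are kept
 * in separate lists, which is what CoGbkResult exposes per TupleTag.
 */
public class CoGroupSketch {

    /** Hypothetical stand-in for CoGbkResult: one value list per tagged input. */
    static final class Joined<A, B> {
        final List<A> left = new ArrayList<>();
        final List<B> right = new ArrayList<>();
    }

    /** Groups both keyed inputs by key; keys keep first-seen order. */
    static <A, B> Map<String, Joined<A, B>> coGroup(
            List<Map.Entry<String, A>> leftInput,
            List<Map.Entry<String, B>> rightInput) {
        Map<String, Joined<A, B>> out = new LinkedHashMap<>();
        for (Map.Entry<String, A> kv : leftInput) {
            out.computeIfAbsent(kv.getKey(), k -> new Joined<>()).left.add(kv.getValue());
        }
        for (Map.Entry<String, B> kv : rightInput) {
            out.computeIfAbsent(kv.getKey(), k -> new Joined<>()).right.add(kv.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Strings stand in for TableRow values, Integers for MyClass values.
        List<Map.Entry<String, String>> rows = List.of(
                Map.entry("a", "row1"), Map.entry("b", "row2"), Map.entry("a", "row3"));
        List<Map.Entry<String, Integer>> objs = List.of(
                Map.entry("a", 10), Map.entry("c", 30));
        Map<String, Joined<String, Integer>> joined = coGroup(rows, objs);
        for (Map.Entry<String, Joined<String, Integer>> e : joined.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().left + " / " + e.getValue().right);
        }
    }
}
```

Running `main` prints one line per key: `a -> [row1, row3] / [10]`, `b -> [row2] / []`, `c -> [] / [30]`. A key missing from one side simply gets an empty list, which is why an inner or outer join is then decided by how you read the `CoGbkResult`.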

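Mostly, I'm confused about what this error means (after some searching I found that it complains there is no translator to "translate" the pipeline into a job for the Dataflow "service", but I still don't know what that means technically) and what it might indicate (especially since it happens at GroupByKeyOnly), so that I can use it as a hint to debug my code.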

The full stack trace is:

    Exception in thread "main" java.lang.IllegalStateException: no translator registered for GroupByKey.GroupByKeyOnly
        at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator$Translator.visitTransform(DataflowPipelineTranslator.java:500)
        at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:219)
        at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
        at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
        at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
        at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:102)
        at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:259)
        at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:455)
        at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:146)
        at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:325)
        at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.run(BlockingDataflowPipelineRunner.java:95)

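FYI, I'm using the Java library with BlockingDataflowPipelineRunner.

EDIT: Reading the source code, I found that this happens because DataflowPipelineTranslator.java does not register a translator for the GroupByKeyOnly transform used by DataflowPipelineRunner. Does that mean any pipeline run with DataflowPipelineOptions (or any extension of it) would fail because GroupByKeyOnly is not registered...?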

1 answer:

Answer 0 (score: 1)

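GroupByKeyOnly should never appear in the set of transforms applied to a DataflowPipelineRunner graph. This can happen if the pipeline was constructed without the runner being set in its PipelineOptions, and [Blocking]DataflowPipelineRunner.run(pipeline) was then called on it directly. Composite transforms such as GroupByKey are expanded by the runner that is configured when the pipeline is built, so a pipeline built for a different runner can contain primitives (here GroupByKeyOnly) that the Dataflow translator does not handle. The expected pattern is not to call DataflowPipeline / DataflowPipelineRunner methods directly, but to let the Pipeline use the runner from its options, for example: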

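    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.PipelineResult;
    import com.google.cloud.dataflow.sdk.options.PipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

    // Parse the runner (e.g. --runner=BlockingDataflowPipelineRunner) and the
    // other options from the command line; fromArgs() returns a builder.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();

    // Make sure that the runner is set before calling Pipeline.create(options)
    Pipeline p = Pipeline.create(options);

    // Apply all your transforms
    p.apply(... transforms ...);

    PipelineResult result = p.run();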

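With the pattern above, you can swap runners just by changing your application's command-line arguments (for example, --runner=BlockingDataflowPipelineRunner). Using BlockingDataflowPipelineRunner ensures that the job has reached a terminal state before p.run() returns.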
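To make the answer's explanation concrete, here is a toy model of the failure in plain Java. It is not SDK code: the class `TranslatorSketch`, the two expander methods and the `TRANSLATORS` registry are hypothetical simplifications. It assumes, as the answer describes, that composite transforms are expanded by the runner chosen when the pipeline is built, and that the service translator only knows a fixed set of primitives. In this toy, the extra steps a local expansion of GroupByKey adds around GroupByKeyOnly are modelled as plain ParDos.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Toy model (NOT the Dataflow SDK) of "no translator registered for ...":
 * the expander used at construction time decides which primitives end up in
 * the graph, and the translator fails on any primitive it has no entry for.
 * All names are illustrative.
 */
public class TranslatorSketch {

    /** Hypothetical local-runner expansion: GroupByKey becomes lower-level steps. */
    static List<String> expandLocally(String transform) {
        if (transform.equals("GroupByKey")) {
            // Surrounding window-handling steps are modelled as ParDos here.
            return List.of("ParDo", "GroupByKeyOnly", "ParDo");
        }
        return List.of(transform);
    }

    /** Hypothetical service-runner expansion: GroupByKey stays a primitive. */
    static List<String> expandForService(String transform) {
        return List.of(transform);
    }

    /** Hypothetical registry of primitives the service translator understands. */
    static final Map<String, String> TRANSLATORS = Map.of(
            "ParDo", "step:ParallelDo",
            "GroupByKey", "step:GroupByKey",
            "Flatten", "step:Flatten");

    /** Builds the primitive graph using whichever expander the "runner" supplies. */
    static List<String> build(List<String> transforms, Function<String, List<String>> expander) {
        List<String> graph = new ArrayList<>();
        for (String t : transforms) {
            graph.addAll(expander.apply(t));
        }
        return graph;
    }

    /** Translates each primitive, failing on the first one with no registered translator. */
    static List<String> translate(List<String> primitives) {
        List<String> steps = new ArrayList<>();
        for (String p : primitives) {
            String step = TRANSLATORS.get(p);
            if (step == null) {
                throw new IllegalStateException("no translator registered for " + p);
            }
            steps.add(step);
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> userTransforms = List.of("ParDo", "GroupByKey", "ParDo");
        // Runner known at construction time: the graph matches the translator.
        System.out.println(translate(build(userTransforms, TranslatorSketch::expandForService)));
        // Graph built for a different runner, then handed to the service translator.
        try {
            translate(build(userTransforms, TranslatorSketch::expandLocally));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The first call prints `[step:ParallelDo, step:GroupByKey, step:ParallelDo]`; the second prints `no translator registered for GroupByKeyOnly`. The same user-level transforms succeed or fail depending only on which expander built the graph, which mirrors why setting the runner in the options before `Pipeline.create(options)` avoids the error.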