Processing a PCollectionTuple as a Map in a ParDo function

Asked: 2017-11-21 12:38:36

Tags: java google-cloud-dataflow apache-beam

So I have a number of side outputs that I want to process together in a single pipeline. Here is the (simplified) declaration of the tuple tags:

  public static final TupleTag<String> TUPLE_TAG_1 = new TupleTag<String>() {};
  public static final TupleTag<String> TUPLE_TAG_2 = new TupleTag<String>() {};
  public static final TupleTag<String> TUPLE_TAG_3 = new TupleTag<String>() {};

I use these as side outputs in each ParDo step of the pipeline.
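For context, a minimal sketch of how such tagged side outputs are typically produced in Beam 2.x (`withOutputTags` plus the tag-taking `output` overload); the input elements and the `buildSideOutputs` helper are made up for illustration:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class SideOutputSketch {
  // Same shape as the tags declared in the question.
  public static final TupleTag<String> TUPLE_TAG_1 = new TupleTag<String>() {};
  public static final TupleTag<String> TUPLE_TAG_2 = new TupleTag<String>() {};
  public static final TupleTag<String> TUPLE_TAG_3 = new TupleTag<String>() {};

  // Hypothetical helper: emits every element to the main output and two side outputs.
  public static PCollectionTuple buildSideOutputs(Pipeline p) {
    return p
        .apply("Source", Create.of("a", "b"))
        .apply("Fanout", ParDo.of(new DoFn<String, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            c.output(c.element());                      // main output -> TUPLE_TAG_1
            c.output(TUPLE_TAG_2, c.element() + "-2");  // side output -> TUPLE_TAG_2
            c.output(TUPLE_TAG_3, c.element() + "-3");  // side output -> TUPLE_TAG_3
          }
        }).withOutputTags(TUPLE_TAG_1, TupleTagList.of(TUPLE_TAG_2).and(TUPLE_TAG_3)));
  }
}
```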

Currently I bundle them together into a single PCollectionTuple:

PCollectionTuple metadataTags = PCollectionTuple.of(TUPLE_TAG_1, pCollectionTuple1.get(TUPLE_TAG_1))
      .and(TUPLE_TAG_2, pCollectionTuple2.get(TUPLE_TAG_2))
      .and(TUPLE_TAG_3, pCollectionTuple3.get(TUPLE_TAG_3));

Once all of these are populated, I would like to process them as a single Map, keyed by TupleTag (or something else), and have written the following (again, simplified) ParDo function for them:

public class MyDoFn extends DoFn<Map<String, String>, MyValue> {

  @ProcessElement
  public void processElement(ProcessContext processContext) {
    Map<String, String> element = processContext.element();
    MyValue myValue = new MyValue();
    // The map is keyed by String, so look entries up by the tag's id
    // rather than by the TupleTag object itself.
    myValue.set("prop1", element.get(TUPLE_TAG_1.getId()));
    myValue.set("prop2", element.get(TUPLE_TAG_2.getId()));
    myValue.set("prop3", element.get(TUPLE_TAG_3.getId()));
    processContext.output(myValue);
  }

}

This does not work, because apply expects a PTransform&lt;PCollectionTuple, OutputT&gt;:

metadataTags.apply(ParDo.of(new MyDoFn())); // does not compile
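One way to satisfy that signature (a sketch, not from the original post) is to write a custom PTransform over the PCollectionTuple and read each tagged branch through a singleton side input. This assumes each tagged PCollection holds exactly one element; the class name `CombineTags` and the comma-joined output format are made up for illustration:

```java
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.TupleTag;

public class CombineTags extends PTransform<PCollectionTuple, PCollection<String>> {
  // Repeated here so the sketch compiles on its own; in the real code these
  // would be the tags declared in the question.
  public static final TupleTag<String> TUPLE_TAG_1 = new TupleTag<String>() {};
  public static final TupleTag<String> TUPLE_TAG_2 = new TupleTag<String>() {};
  public static final TupleTag<String> TUPLE_TAG_3 = new TupleTag<String>() {};

  @Override
  public PCollection<String> expand(PCollectionTuple tags) {
    // Turn each (single-element) branch into a singleton side-input view.
    final PCollectionView<String> v1 = tags.get(TUPLE_TAG_1).apply("View1", View.<String>asSingleton());
    final PCollectionView<String> v2 = tags.get(TUPLE_TAG_2).apply("View2", View.<String>asSingleton());
    final PCollectionView<String> v3 = tags.get(TUPLE_TAG_3).apply("View3", View.<String>asSingleton());
    return tags.getPipeline()
        // A single dummy element drives one invocation of the DoFn.
        .apply("Trigger", Create.of("go"))
        .apply("Assemble", ParDo.of(new DoFn<String, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            // All three values are available here; build MyValue (or a Map) from them.
            c.output(c.sideInput(v1) + "," + c.sideInput(v2) + "," + c.sideInput(v3));
          }
        }).withSideInputs(v1, v2, v3));
  }
}
```

With this, `metadataTags.apply(new CombineTags())` type-checks, since CombineTags is a PTransform whose input type is PCollectionTuple.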

How can I do this?

0 Answers:

No answers yet