So I have a bunch of side outputs that I want to process together in a single pipeline. Here is the (simplified) declaration of the tuple tags:
public static final TupleTag<String> TUPLE_TAG_1 = new TupleTag<String>() {};
public static final TupleTag<String> TUPLE_TAG_2 = new TupleTag<String>() {};
public static final TupleTag<String> TUPLE_TAG_3 = new TupleTag<String>() {};
I use these as side outputs in each ParDo iteration of my pipeline.
Currently I bundle them together as a single PCollectionTuple:
PCollectionTuple metadataTags =
    PCollectionTuple.of(TUPLE_TAG_1, pCollectionTuple1.get(TUPLE_TAG_1))
        .and(TUPLE_TAG_2, pCollectionTuple2.get(TUPLE_TAG_2))
        .and(TUPLE_TAG_3, pCollectionTuple3.get(TUPLE_TAG_3));
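Once all of these have been populated, I want to process them as a single Map, keyed by TupleTag (or something else), and I wrote the following (again, simplified) ParDo function for them: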
public class MyDoFn extends DoFn<Map<String, String>, MyValue> {
    @ProcessElement
    public void processContext(ProcessContext processContext) {
        Map<String, String> element = processContext.element();
        MyValue myValue = new MyValue();
        myValue.set("prop1", element.get(TUPLE_TAG_1));
        myValue.set("prop2", element.get(TUPLE_TAG_2));
        myValue.set("prop3", element.get(TUPLE_TAG_3));
        processContext.output(myValue);
    }
}
This doesn't work, because the apply function of a PCollectionTuple expects a PTransform<PCollectionTuple, OutputT>:
metadataTags.apply(ParDo.of(new MyDoFn())); // Does not compile
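As far as I understand it, the problem is purely one of generic types: ParDo.of(...) yields a transform whose input is a single PCollection, while the tuple's apply wants a transform whose input is the whole tuple. The stripped-down model below reproduces that mismatch in plain Java. Note that Transform, Coll, CollTuple and parDo are hypothetical stand-ins I made up for illustration; they are not the real Beam classes and only mimic the shape of the signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ApplyMismatchSketch {

    /** Hypothetical stand-in for PTransform<InputT, OutputT>. */
    interface Transform<InputT, OutputT> {
        OutputT expand(InputT input);
    }

    /** Hypothetical stand-in for PCollection<T>. */
    static final class Coll<T> {
        final List<T> elements;

        Coll(List<T> elements) {
            this.elements = elements;
        }

        // Only accepts transforms whose input type is this collection type.
        <OutputT> OutputT apply(Transform<Coll<T>, OutputT> transform) {
            return transform.expand(this);
        }
    }

    /** Hypothetical stand-in for PCollectionTuple. */
    static final class CollTuple {
        final Map<String, Coll<String>> byTag = new LinkedHashMap<>();

        CollTuple and(String tag, Coll<String> coll) {
            byTag.put(tag, coll);
            return this;
        }

        // Only accepts transforms whose input type is the tuple itself.
        <OutputT> OutputT apply(Transform<CollTuple, OutputT> transform) {
            return transform.expand(this);
        }
    }

    /** Element-wise transform, playing the role of ParDo.of(doFn). */
    static <InT, OutT> Transform<Coll<InT>, Coll<OutT>> parDo(Function<InT, OutT> fn) {
        return input -> {
            List<OutT> out = new ArrayList<>();
            for (InT element : input.elements) {
                out.add(fn.apply(element));
            }
            return new Coll<>(out);
        };
    }

    public static List<String> demo() {
        Coll<String> single = new Coll<>(Arrays.asList("a", "b"));
        CollTuple tuple = new CollTuple().and("tag1", single);
        Transform<Coll<String>, Coll<String>> upperCase = parDo(s -> s.toUpperCase());

        // Does not compile: upperCase takes a Coll<String> as input,
        // but CollTuple.apply wants a Transform<CollTuple, ...>.
        // tuple.apply(upperCase);

        // Compiles: the transform's input type matches the receiver.
        return single.apply(upperCase).elements;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the element-wise transform can only be applied to one collection at a time, which is the same wall I am hitting with metadataTags.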
How should I go about this?