I want to pass a side input to a PCollection partitioning step, and based on that input I need to split my PCollection:
PCollectionList<TableRow> part = merged.apply(Partition.of(numPartitions, new PartitionFn<TableRow>() {
    @Override
    public int partitionFor(TableRow row, int numPartitions) {
        return 0; // TODO: decide the partition index based on the side input
    }
}));
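For illustration, a `partitionFor` that actually distributes rows could key off the row's date field. The sketch below is a hypothetical, Beam-free version of that logic, assuming a `yyyymmdd` date string like the `authorized_transaction_date_yyyymmdd` field used later in this question; the class and method names are mine, not part of any API.

```java
// Hypothetical sketch: map a yyyymmdd date string to a partition index
// in [0, numPartitions), instead of always returning 0.
public class DatePartitioner {
    public static int partitionFor(String yyyymmdd, int numPartitions) {
        int month = Integer.parseInt(yyyymmdd.substring(4, 6)); // "01".."12"
        return (month - 1) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("20230115", 4)); // month 01 -> partition 0
        System.out.println(partitionFor("20230415", 4)); // month 04 -> partition 3
    }
}
```

Inside a Beam `PartitionFn<TableRow>`, the same computation would run on `row.get("authorized_transaction_date_yyyymmdd").toString()`.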
Is there any other way I can partition my PCollection?
// Writing to a date-partitioned BigQuery table without dynamic destinations
merged.apply("write into target", BigQueryIO.writeTableRows()
    .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
        @Override
        public TableDestination apply(ValueInSingleWindow<TableRow> value) {
            TableRow row = value.getValue();
            TableReference reference = new TableReference();
            reference.setProjectId("XYZ");
            reference.setDatasetId("ABC");
            String str = row.get("authorized_transaction_date_yyyymmdd").toString();
            LOG.info("date of row " + str);
            // Truncate yyyymmdd to the first day of the month for the partition decorator
            str = str.substring(0, str.length() - 2) + "01";
            LOG.info("str value " + str);
            reference.setTableId("TargetTable$" + str);
            return new TableDestination(reference, null);
        }
    })
    .withFormatFunction(new SerializableFunction<TableRow, TableRow>() {
        @Override
        public TableRow apply(TableRow input) {
            LOG.info("format function: " + input.toString());
            return input;
        }
    })
    .withSchema(schema1)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
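The key step in the code above is rewriting a `yyyymmdd` date to the first day of its month and appending it to the table name as a BigQuery partition decorator (`TargetTable$yyyymm01`). A minimal, standalone sketch of that string manipulation (the class and method names here are illustrative, not from the original code):

```java
// Sketch of the month-truncation logic: replace the day component of a
// yyyymmdd date with "01" and build the partition-decorated table id.
public class PartitionDecorator {
    public static String tableIdFor(String yyyymmdd) {
        String firstOfMonth = yyyymmdd.substring(0, yyyymmdd.length() - 2) + "01";
        return "TargetTable$" + firstOfMonth;
    }

    public static void main(String[] args) {
        System.out.println(tableIdFor("20230415")); // -> TargetTable$20230401
    }
}
```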
Now, instead of this, I have to use DynamicDestinations, and the write must still go to partitioned tables. Any solution?
Answer (score: 1)
Based on the TableRow in your code, I suspect you want to write a PCollection to BigQuery, sending different elements to different BigQuery tables. BigQueryIO.write() already provides a way to do this via BigQueryIO.write().to(DynamicDestinations). See Writing different values to different BigQuery tables in Apache Beam.