I am receiving messages via Pub/Sub, and I want to use a field in the message data to decide which BigQuery table the data gets uploaded to.
I tried the following:
Pipeline pipeline = Pipeline.create(options);
String bigQueryTable;
PCollection<String> input = pipeline
    .apply(PubsubIO.Read.subscription("projects/my-data-analysis/subscriptions/myDataflowSub"));
input.apply(ParDo.of(new DoFn<String, TableRow>() {
    @Override
    public void processElement(DoFn<String, TableRow>.ProcessContext c) throws Exception {
        JSONObject firstJSONObject = new JSONObject(c.element());
        bigQueryTable = firstJSONObject.get("tableName").toString();
        TableRow tableRow = convertJsonToTableRow(firstJSONObject);
        c.output(tableRow);
    }
})).apply(BigQueryIO.Write.to("my-data-analysis:mydataset." + bigQueryTable).withSchema(tableSchema));
Is there a way to do this without writing my own DoFn?
If I do need to implement my own DoFn, how do I implement it so that it uploads to BigQuery?
Answer 0 (score: 1)
This is not directly possible at the moment, but there are workarounds that cover some of the underlying use cases. See these related questions:
Dynamic table name when writing to BQ from dataflow pipelines
Specifying dynamically generated table name based on line contents
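One workaround discussed in the linked questions, when the set of destination tables is known in advance, is to split the stream into one PCollection per table (e.g. with a Partition transform or tagged outputs) and attach a separate BigQueryIO.Write to each. The core of that approach is just routing each element by its embedded table name; below is a minimal, stdlib-only sketch of that routing logic (the "tableName" field and the example table names come from the question and are assumptions, and the string-based JSON extraction stands in for a real JSON parser such as the JSONObject used above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TableRouter {

    // Buckets messages by the "tableName" value embedded in each JSON payload.
    // In a real pipeline this decision would live inside a ParDo/Partition,
    // with one BigQueryIO.Write attached per known destination table.
    static Map<String, List<String>> routeByTable(List<String> messages) {
        Map<String, List<String>> buckets = new HashMap<>();
        for (String msg : messages) {
            String table = extractTableName(msg);
            buckets.computeIfAbsent(table, k -> new ArrayList<>()).add(msg);
        }
        return buckets;
    }

    // Naive extraction of "tableName":"..." from a flat JSON string;
    // a real DoFn should use a proper JSON parser instead.
    static String extractTableName(String json) {
        String key = "\"tableName\":\"";
        int start = json.indexOf(key) + key.length();
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(
            "{\"tableName\":\"events\",\"value\":1}",
            "{\"tableName\":\"errors\",\"value\":2}",
            "{\"tableName\":\"events\",\"value\":3}");
        Map<String, List<String>> buckets = routeByTable(messages);
        System.out.println(buckets.get("events").size()); // 2
        System.out.println(buckets.get("errors").size()); // 1
    }
}
```

The limitation this works around is that the question's original code computes bigQueryTable inside processElement, which runs per element at execution time, while BigQueryIO.Write.to(...) is evaluated once at pipeline construction time, so the table name can never reflect individual messages.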