I have logs that I want to push to Google BigQuery, and I am trying to build the whole pipeline with Google Dataflow. The logs have different structures and can be grouped into four different types. In my pipeline I read a log from PubSub, parse it, and write it to a BigQuery table. The table a log should be written to depends on one parameter in the log. The problem is that I am stuck on how to change the TableName of BigQueryIO.Write at runtime.
Answer 0 (score: 3)
You can use side outputs.
https://cloud.google.com/dataflow/model/par-do#emitting-to-side-outputs-in-your-dofn
The sample code below reads a BigQuery table and splits it into 3 different PCollections. Each PCollection then goes to a different Pub/Sub topic (these could just as well be different BigQuery tables).
Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

// Read the source table from BigQuery.
PCollection<TableRow> weatherData = p.apply(
    BigQueryIO.Read.named("ReadWeatherStations").from("clouddataflow-readonly:samples.weather_stations"));

// One tag per output: the main output plus two side outputs.
final TupleTag<String> readings2010 = new TupleTag<String>() {};
final TupleTag<String> readings2000plus = new TupleTag<String>() {};
final TupleTag<String> readingsOld = new TupleTag<String>() {};

// Route each row to the main output or one of the side outputs, based on its year field.
PCollectionTuple collectionTuple = weatherData.apply(ParDo.named("tablerow2string")
    .withOutputTags(readings2010, TupleTagList.of(readings2000plus).and(readingsOld))
    .of(new DoFn<TableRow, String>() {
        @Override
        public void processElement(DoFn<TableRow, String>.ProcessContext c) throws Exception {
            if (c.element().getF().get(2).getV().equals("2010")) {
                c.output(c.element().toString());
            } else if (Integer.parseInt(c.element().getF().get(2).getV().toString()) > 2000) {
                c.sideOutput(readings2000plus, c.element().toString());
            } else {
                c.sideOutput(readingsOld, c.element().toString());
            }
        }
    }));

// Each PCollection from the tuple gets its own sink.
collectionTuple.get(readings2010)
    .apply(PubsubIO.Write.named("WriteToPubsub1").topic("projects/fh-dataflow/topics/bq2pubsub-topic1"));
collectionTuple.get(readings2000plus)
    .apply(PubsubIO.Write.named("WriteToPubsub2").topic("projects/fh-dataflow/topics/bq2pubsub-topic2"));
collectionTuple.get(readingsOld)
    .apply(PubsubIO.Write.named("WriteToPubsub3").topic("projects/fh-dataflow/topics/bq2pubsub-topic3"));

p.run();
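To land each branch in a different BigQuery table instead of a Pub/Sub topic, you could swap each PubsubIO.Write step for a BigQueryIO.Write. A minimal sketch for one of the branches, assuming a placeholder table name (my-project:my_dataset.readings_2010) and a single-column schema that just stores the raw string; adapt both to your actual log fields:

// Hypothetical schema for illustration -- replace with the real column layout of your logs.
TableSchema schema = new TableSchema().setFields(Arrays.asList(
    new TableFieldSchema().setName("raw").setType("STRING")));

collectionTuple.get(readings2010)
    // Convert each string back into a TableRow before writing.
    .apply(ParDo.named("string2row").of(new DoFn<String, TableRow>() {
        @Override
        public void processElement(DoFn<String, TableRow>.ProcessContext c) throws Exception {
            c.output(new TableRow().set("raw", c.element()));
        }
    }))
    .apply(BigQueryIO.Write.named("WriteTo2010Table")
        .to("my-project:my_dataset.readings_2010")  // placeholder table name
        .withSchema(schema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

Repeat the same pattern for readings2000plus and readingsOld, each with its own table name. In your case you would tag each log with its type inside the DoFn and attach one BigQueryIO.Write per type, which sidesteps the need to change the table name of a single write transform at runtime.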