Google Dataflow - Apache Beam StreamingOptions.streaming has no effect

Time: 2018-05-30 05:53:04

Tags: google-bigquery google-cloud-dataflow apache-beam

We created an Apache Beam pipeline that runs successfully on the direct runner. See the following question:

BigQuery writeTableRows Always writing to buffer

When the same code is pushed to Google Dataflow, the job switches to streaming mode and does not run as a batch job.

    rows.apply("Load", BigQueryIO.writeTableRows()
            .to(table)
            .withSchema(schema)
            //.withTimePartitioning(timePartitioning)
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .withTriggeringFrequency(triggeringFrequency)
            .withNumFileShards(numFileShards)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

We changed the pipeline options as follows:

    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject(projectId);
    options.setTempLocation(tempLocation);
    options.setStreaming(false);
    options.setRunner(DataflowRunner.class);

However, the setStreaming(false) call here appears to be ignored.

What settings are required to run this as a batch job on Google Dataflow?
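One thing that may be relevant, though nothing in this post confirms it: as far as I know, the Dataflow runner turns streaming on by itself whenever the pipeline contains an unbounded PCollection, whatever setStreaming was given. The sketch below models that assumed rule in plain Java. It is not Beam or Dataflow source code, and StreamingModeSketch, Boundedness and effectiveStreaming are hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model only; this is NOT Beam or Dataflow source code.
class StreamingModeSketch {

    // Hypothetical stand-in for Beam's PCollection.IsBounded.
    enum Boundedness { BOUNDED, UNBOUNDED }

    // Assumed rule: any unbounded collection forces streaming,
    // otherwise the requested setting is used unchanged.
    static boolean effectiveStreaming(boolean requestedStreaming,
                                      List<Boundedness> collections) {
        for (Boundedness b : collections) {
            if (b == Boundedness.UNBOUNDED) {
                return true;
            }
        }
        return requestedStreaming;
    }

    public static void main(String[] args) {
        // Streaming requested off, but one input is unbounded: streaming wins.
        System.out.println(effectiveStreaming(false,
                Arrays.asList(Boundedness.BOUNDED, Boundedness.UNBOUNDED)));
        // Streaming requested off and every input is bounded: batch.
        System.out.println(effectiveStreaming(false,
                Arrays.asList(Boundedness.BOUNDED)));
    }
}
```

If that rule applies here, logging rows.isBounded() just before the "Load" step would show whether the input is unbounded. Since withTriggeringFrequency and withNumFileShards are, to my understanding, intended for unbounded input written via FILE_LOADS, the input in this pipeline may well be unbounded, in which case the source rather than the options would have to change to get a batch job.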

0 answers:

No answers