We have created an Apache Beam pipeline that runs successfully on the DirectRunner. Please refer to the following question:
BigQuery writeTableRows Always writing to buffer
When the same code is pushed to Google Dataflow, the job switches to streaming mode and the rows are not loaded in batches.
rows.apply("Load", BigQueryIO.writeTableRows()
        .to(table)
        .withSchema(schema)
        //.withTimePartitioning(timePartitioning)
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)   // use load jobs rather than streaming inserts
        .withTriggeringFrequency(triggeringFrequency)     // how often a load job is triggered
        .withNumFileShards(numFileShards)                 // number of file shards written per load
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
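As far as we understand from the BigQueryIO documentation, withTriggeringFrequency and withNumFileShards only apply when writing an unbounded PCollection, so a purely batch write would presumably drop them. A minimal sketch of what we think the batch form would look like (same rows, table, and schema placeholders as above):

rows.apply("Load", BigQueryIO.writeTableRows()
        .to(table)
        .withSchema(schema)
        // no withTriggeringFrequency / withNumFileShards: per our reading of the
        // docs these are only meaningful for unbounded input
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));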
We changed the pipeline options as follows:
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setProject(projectId);
options.setTempLocation(tempLocation);
options.setStreaming(false);   // attempt to force batch execution
options.setRunner(DataflowRunner.class);
However, setStreaming seems to be largely ignored here.
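To make the question concrete, below is the shape of a self-contained bounded pipeline that we would expect Dataflow to run as a batch job. This is only a sketch; the project, bucket, table, schema, and the trivial line-to-TableRow parsing are all placeholders, not our real code.

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class BatchLoadSketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project");                 // placeholder
    options.setTempLocation("gs://my-bucket/tmp");    // placeholder
    options.setRunner(DataflowRunner.class);
    // setStreaming is left at its default (false); the source below is bounded.

    TableSchema schema = new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("raw").setType("STRING")));

    Pipeline p = Pipeline.create(options);
    p.apply("Read", TextIO.read().from("gs://my-bucket/input/*"))   // bounded source
        .apply("ToTableRow", MapElements
            .into(TypeDescriptor.of(TableRow.class))
            .via((String line) -> new TableRow().set("raw", line))) // placeholder parsing
        .setCoder(TableRowJsonCoder.of())
        .apply("Load", BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")                   // placeholder table
            .withSchema(schema)
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    p.run();
  }
}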
What settings are required to run this as a batch job on Google Dataflow?