We have created an Apache Beam pipeline that runs successfully on the DirectRunner. Please refer to the following question:
BigQuery writeTableRows Always writing to buffer
When the same code is pushed to Google Dataflow, the job switches to streaming mode and the rows are not loaded in batches.
rows.apply("Load", BigQueryIO.writeTableRows()
        .to(table)
        .withSchema(schema)
        //.withTimePartitioning(timePartitioning)
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)   // use load jobs rather than streaming inserts
        .withTriggeringFrequency(triggeringFrequency)     // how often a load job is triggered
        .withNumFileShards(numFileShards)                 // number of file shards written per load
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
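As far as we understand from the BigQueryIO documentation, withTriggeringFrequency and withNumFileShards only apply when writing an unbounded PCollection, so a purely batch write would presumably drop them. A minimal sketch of what we think the batch form would look like (same rows, table, and schema placeholders as above):

rows.apply("Load", BigQueryIO.writeTableRows()
        .to(table)
        .withSchema(schema)
        // no withTriggeringFrequency / withNumFileShards: per our reading of the
        // docs these are only meaningful for unbounded input
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));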
We changed the pipeline options as follows:
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setProject(projectId);
options.setTempLocation(tempLocation);
options.setStreaming(false);   // attempt to force batch execution
options.setRunner(DataflowRunner.class);
However, setStreaming seems to be largely ignored here.
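To make the question concrete, below is the shape of a self-contained bounded pipeline that we would expect Dataflow to run as a batch job. This is only a sketch; the project, bucket, table, schema, and the trivial line-to-TableRow parsing are all placeholders, not our real code.

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class BatchLoadSketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project");                 // placeholder
    options.setTempLocation("gs://my-bucket/tmp");    // placeholder
    options.setRunner(DataflowRunner.class);
    // setStreaming is left at its default (false); the source below is bounded.

    TableSchema schema = new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("raw").setType("STRING")));

    Pipeline p = Pipeline.create(options);
    p.apply("Read", TextIO.read().from("gs://my-bucket/input/*"))   // bounded source
        .apply("ToTableRow", MapElements
            .into(TypeDescriptor.of(TableRow.class))
            .via((String line) -> new TableRow().set("raw", line))) // placeholder parsing
        .setCoder(TableRowJsonCoder.of())
        .apply("Load", BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")                   // placeholder table
            .withSchema(schema)
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    p.run();
  }
}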
What settings are required to run this as a batch job on Google Dataflow?