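For a simple proof of concept, I'm trying to window click data into two-minute windows. From there, all I want to do is write the count for each window, along with the window's boundaries, to BigQuery. When running the pipeline, I keep getting the following error: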
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"windowend","message":"This field is not a record.","reason":"invalid"}],"index":0}]
The pipeline looks like this:
// Creating the pipeline
Pipeline p = Pipeline.create(options);
// Window items
PCollection<TableRow> counts = p
    .apply("ReadFromPubSub", PubsubIO.readStrings().fromTopic(options.getTopic()))
    .apply("AddEventTimestamps", WithTimestamps.of(TotalCountPipeline::ExtractTimeStamp)
        .withAllowedTimestampSkew(Duration.standardDays(10000)))
    .apply("Window", Window.<String>into(
            FixedWindows.of(Duration.standardHours(options.getWindowSize())))
        .triggering(
            AfterWatermark.pastEndOfWindow()
                .withLateFirings(AfterPane.elementCountAtLeast(1)))
        .withAllowedLateness(Duration.standardDays(10000))
        .accumulatingFiredPanes())
    .apply("CalculateSum", Combine.globally(Count.<String>combineFn()).withoutDefaults())
    .apply("BigQueryFormat", ParDo.of(new FormatCountsFn()));
// Writing to BigQuery
counts.apply("WriteToBigQuery", BigQueryIO.writeTableRows()
    .to(options.getOutputTable())
    .withSchema(getSchema())
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
// Execute pipeline
p.run().waitUntilFinish();
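For readers less familiar with fixed windows: each element is assigned to exactly one non-overlapping interval, and the count is computed per interval. The sketch below illustrates that arithmetic for two-minute windows with zero offset, using only java.time and no Beam at all. The class and method names (FixedWindowSketch, windowStartMillis, countPerWindow) are hypothetical, and this is not Beam's internal implementation, just the same idea in plain Java.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

public class FixedWindowSketch {
    /** Start (epoch millis) of the zero-offset fixed window that contains the timestamp. */
    static long windowStartMillis(long timestampMillis, long sizeMillis) {
        // floorMod keeps the result correct for timestamps before the epoch as well
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /** Counts elements per window, keyed by window start in epoch millis (sorted). */
    static Map<Long, Long> countPerWindow(long[] timestampsMillis, Duration size) {
        long sizeMillis = size.toMillis();
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestampsMillis) {
            counts.merge(windowStartMillis(ts, sizeMillis), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] clicks = {
            Instant.parse("2020-01-01T00:00:10Z").toEpochMilli(),
            Instant.parse("2020-01-01T00:01:59Z").toEpochMilli(),
            Instant.parse("2020-01-01T00:02:00Z").toEpochMilli()
        };
        Duration size = Duration.ofMinutes(2);
        // Print each window as [start, end) together with its element count
        countPerWindow(clicks, size).forEach((start, count) ->
            System.out.println(Instant.ofEpochMilli(start) + " - "
                + Instant.ofEpochMilli(start + size.toMillis()) + ": " + count));
    }
}
```

The window end is exclusive, so a click at exactly 00:02:00 starts the next window rather than closing the first one.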
I suspect it has something to do with the BigQuery formatting function, which is implemented as follows:
static class FormatCountsFn extends DoFn<Long, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c, BoundedWindow window) {
        TableRow row =
            new TableRow()
                .set("windowStart", window.maxTimestamp().toDateTime())
                .set("count", c.element().intValue());
        c.output(row);
    }
}
This was inspired by this post. Can anyone shed some light on this? I can't seem to get around it.
Answer 0 (score: 2)
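As it turns out, the problem has nothing to do with Beam windowing; it is purely a BigQuery issue. Writing a DateTime value into a BigQuery row requires a string in the correct yyyy-MM-dd HH:mm:ss format, not the DateTime object I was passing.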
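A minimal sketch of producing that string, assuming the timestamp is available as epoch milliseconds and should be written in UTC. It uses only java.time so it runs without Beam; the class and method names (BigQueryTimestampFormat, toBigQueryTimestamp) are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BigQueryTimestampFormat {
    // "yyyy-MM-dd HH:mm:ss" rendered in UTC (the UTC choice is an assumption of this sketch)
    private static final DateTimeFormatter BQ_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Formats epoch milliseconds as a string suitable for a BigQuery row field. */
    static String toBigQueryTimestamp(long epochMillis) {
        // Sub-second precision is dropped by the pattern
        return BQ_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long millis = Instant.parse("2020-01-01T00:01:59.999Z").toEpochMilli();
        System.out.println(toBigQueryTimestamp(millis));
    }
}
```

In FormatCountsFn, the Joda Instant returned by window.maxTimestamp() exposes getMillis(), so the row could be built with something like `.set("windowStart", BigQueryTimestampFormat.toBigQueryTimestamp(window.maxTimestamp().getMillis()))`. Note that maxTimestamp() is the last millisecond of the window, i.e. just before its end; if the actual window start is wanted, casting to IntervalWindow and calling start() gives it.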