I'm trying to use Beam in a replay process that reads from a BigQuery table and writes the data to a PubSub topic, but my Dataflow job keeps failing with an uncaught exception. From the logs I can see this error:
exception:
"com.google.api.client.googleapis.json.GoogleJsonResponseException:
400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Request payload size exceeds the limit: 10485760 bytes.",
"reason" : "badRequest"
} ],
"message" : "Request payload size exceeds the limit: 10485760
bytes.",
"status" : "INVALID_ARGUMENT"
}
I'm using Apache Beam 2.1.0's PubsubIO.writeMessages() with a PCollection of PubsubMessages that I build from an XML payload and an attribute map containing a single attribute. These messages are well under 10 MB each, and they were already delivered through Pubsub once before they landed in the raw BQ table.
Example:
PCollection<TableRow> input = pipeline.apply(BigQueryIO.read().fromQuery(queryBuilder.toString()).usingStandardSql());
PCollection<PubsubMessage> replayMessages = input.apply("Create pubsub messages" ,ParDo.of(new DoFn<TableRow, PubsubMessage>() {
private static final long serialVersionUID = -123;
@DoFn.ProcessElement
public void processElement(ProcessContext c) throws Exception {
PubsubMessage pubsubMessage = new PubsubMessage(c.element().get("rawXml").toString().getBytes(),
ImmutableMap.<String, String>of("myMessageId", c.element().get("myMessageId").toString()));
c.output(pubsubMessage);
}
}));
replayMessages.apply("Write messages to topic",PubsubIO.writeMessages().to(topicName));
This only seems to work for 95 messages or fewer. Whenever I try to publish more than 95, I get the payload-size-exceeds-the-limit error. Is this a limitation of Beam's PubsubIO?