Apache Beam PubsubIO write exceeds payload size limit

Date: 2017-10-30 15:50:14

Tags: google-cloud-platform google-cloud-dataflow apache-beam

I'm trying to use Beam in a replay process that reads from a BigQuery table and writes the data to a Pub/Sub topic, but my Dataflow job keeps failing with an uncaught exception. From the logs I can see this error:

exception:
"com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}

I'm using PubsubIO.writeMessages() from Apache Beam 2.1.0 with a PCollection of PubsubMessages that I build from an XML payload plus an attribute map containing a single attribute. These messages are well under 10 MB each, and they were originally delivered through Pub/Sub before they landed in the raw BQ table.

Example:

PCollection<TableRow> input = pipeline.apply(
        BigQueryIO.read().fromQuery(queryBuilder.toString()).usingStandardSql());

PCollection<PubsubMessage> replayMessages = input.apply("Create pubsub messages",
        ParDo.of(new DoFn<TableRow, PubsubMessage>() {
            private static final long serialVersionUID = -123;

            @DoFn.ProcessElement
            public void processElement(ProcessContext c) throws Exception {
                PubsubMessage pubsubMessage = new PubsubMessage(
                        c.element().get("rawXml").toString().getBytes(),
                        ImmutableMap.<String, String>of("myMessageId",
                                c.element().get("myMessageId").toString()));
                c.output(pubsubMessage);
            }
        }));

replayMessages.apply("Write messages to topic", PubsubIO.writeMessages().to(topicName));
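For reference, this is how individual payload sizes could be sanity-checked before the write step (a sketch; the threshold check and exception message are mine, not part of the failing job):

    // A sketch of verifying that no single message exceeds the 10 MB limit
    // before it reaches PubsubIO (the threshold and error message are mine).
    PCollection<PubsubMessage> checkedMessages = replayMessages.apply("Check message sizes",
            ParDo.of(new DoFn<PubsubMessage, PubsubMessage>() {
                private static final long serialVersionUID = -124;

                @DoFn.ProcessElement
                public void processElement(ProcessContext c) {
                    int payloadBytes = c.element().getPayload().length;
                    if (payloadBytes > 10 * 1024 * 1024) {
                        throw new IllegalStateException(
                                "Single message exceeds 10 MB: " + payloadBytes + " bytes");
                    }
                    c.output(c.element());
                }
            }));

The write could then consume checkedMessages instead of replayMessages; none of my individual messages come anywhere near that size.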

This only seems to work for 95 messages or fewer. Whenever I try to publish more than 95, I get the payload-size-exceeded error. Is this a limit of Beam's PubsubIO?
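My working theory is that PubsubIO batches multiple messages into a single publish request, so the 10485760-byte limit applies to the whole batch rather than to each message. If that's the case, something like the following might keep each request under the limit; I'm assuming withMaxBatchSize / withMaxBatchBytesSize are available, which I have not verified for the 2.1.0 SDK:

    // A sketch of capping the publish batch so each request stays under 10 MB.
    // withMaxBatchSize / withMaxBatchBytesSize are an assumption on my part;
    // I have not confirmed they exist in Beam 2.1.0.
    replayMessages.apply("Write messages to topic",
            PubsubIO.writeMessages()
                    .to(topicName)
                    .withMaxBatchSize(50)                      // at most 50 messages per publish request
                    .withMaxBatchBytesSize(5 * 1024 * 1024));  // at most ~5 MB of payload per request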

0 Answers:

No answers yet.