DatastoreException: A non-transactional commit may not contain multiple mutations affecting the same entity

Asked: 2018-07-24 06:24:48

Tags: google-cloud-datastore google-cloud-dataflow apache-beam google-cloud-pubsub

I have an Apache Beam pipeline that processes streaming data from a Google Pub/Sub topic and writes it to Google Cloud Datastore. For the past few days it has been failing with the error message below, which blocks the pipeline and has caused us to lose data.

com.google.datastore.v1.client.DatastoreException: A non-transactional commit may not contain multiple mutations affecting the same entity., code=INVALID_ARGUMENT
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:126)
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:169)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:89)
at com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
at org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:1326)
at org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$DatastoreWriterFn.finishBundle(DatastoreV1.java:1291)

The pipeline runs in streaming mode and does not batch or window the data in any way, so I don't see how it could be writing duplicate records at the same time. I would like to know:

  1. Can a duplicate check be added before writing to Datastore in a streaming context? (I've put a rough sketch of one idea after the pipeline code below.)
  2. How can I attach logging around the write, so I can tell which source records produced the failing data and troubleshoot from there? (See the pass-through logging sketch right after this list.)
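For (2), the best I've come up with so far is a pass-through ParDo that logs each entity's Datastore key just before the write, so that when a commit fails I can correlate the failing batch with recent log lines and spot repeated keys. A minimal sketch (LogEntityKeyFn is my own name, not part of the Beam SDK; it assumes SLF4J is on the classpath, which it is for Beam):

import com.google.datastore.v1.Entity;
import org.apache.beam.sdk.transforms.DoFn;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Pass-through step inserted between EnrichedDataToEntityFn and the Datastore write.
public class LogEntityKeyFn extends DoFn<Entity, Entity> {
    private static final Logger LOG = LoggerFactory.getLogger(LogEntityKeyFn.class);

    @ProcessElement
    public void processElement(ProcessContext c) {
        Entity entity = c.element();
        // Logging the full key path makes duplicate keys show up as repeated lines.
        LOG.info("Writing entity with key: {}", entity.getKey());
        c.output(entity); // pass the element through unchanged
    }
}

This only tells me the keys after the fact, though; I still can't attach the source record to the DatastoreException itself.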

The Beam pipeline code is as follows:

public class JobPipeline {
    private final Pipeline pipeline;
    private final JobOptions options;

    JobPipeline(JobOptions options) {
        this.options = options;
        this.pipeline = Pipeline.create(options);
    }

    void run() throws IOException {

        PTransform<PBegin, PCollection<String>> input = getInput([pubsub topic]);

        PCollection<KV<String, EnrichedData>> enrichedData = new EnrichmentPipeline(options, input).apply(pipeline);

        pipeline.run();
    }
}

public class EnrichmentPipeline {

    private final JobOptions options;
    private final PTransform<PBegin, PCollection<String>> input;

    public EnrichmentPipeline(JobOptions options,
                          PTransform<PBegin, PCollection<String>> input) {
        this.options = options;
        this.input = input;
    }

    public PCollection<KV<String, EnrichedData>> apply(final Pipeline pipeline) throws IOException {

        PCollection<KV<String, EnrichedData>> enrichedData = pipeline.apply("Reading Data", input)
                                                                 .apply("Transforming Json to Data", ParDo.of(new JsonToData()))
                                                                 .apply("Enrichment", ParDo.of(new EnrichmentFn(options.getProjectId(), options.getReferenceKind())));

        writeIntoDataStore(options.getProjectId(), enrichedData, new EnrichedDataToEntityFn(options.getDataKind()));

        return enrichedData;
    }
}
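For (1), my current idea is to key the entities by their Datastore key and keep only one element per key within a short fixed window before the write, since the error is only about duplicates within a single non-transactional commit batch. A rough sketch of what I mean (DedupeByDatastoreKey and dedupeByKey are my own hypothetical names, the 10-second window is an arbitrary choice, and keying by key.toString() is just the simplest serializable representation I could think of):

import com.google.datastore.v1.Entity;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

public class DedupeByDatastoreKey {

    // Keeps at most one entity per Datastore key within each 10-second window,
    // so a single commit batch should never contain two mutations for the same key.
    static PCollection<Entity> dedupeByKey(PCollection<Entity> entities) {
        return entities
            .apply("WindowForDedup",
                Window.<Entity>into(FixedWindows.of(Duration.standardSeconds(10))))
            .apply("KeyByDatastoreKey",
                WithKeys.of((Entity e) -> e.getKey().toString())
                        .withKeyType(TypeDescriptors.strings()))
            .apply("GroupByDatastoreKey", GroupByKey.<String, Entity>create())
            .apply("TakeOnePerKey",
                ParDo.of(new DoFn<KV<String, Iterable<Entity>>, Entity>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        // Arbitrarily keep the first entity seen for each key.
                        c.output(c.element().getValue().iterator().next());
                    }
                }));
    }
}

I would call this from writeIntoDataStore before handing the entities to the Datastore sink, but I'm not sure whether this is the idiomatic way to deduplicate in streaming mode, or whether the windowing would interfere with DatastoreIO's own batching.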

0 Answers:

There are no answers yet.