I have an Apache Beam pipeline that processes streaming data from a Google Pub/Sub topic and writes it to Google Datastore. For the past few days it has been failing with the error below, which blocks the pipeline and causes us to lose data.
com.google.datastore.v1.client.DatastoreException: A non-transactional commit may not contain multiple mutations affecting the same entity., code=INVALID_ARGUMENT
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:126)
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:169)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:89)
at com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
at org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:1326)
at org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$DatastoreWriterFn.finishBundle(DatastoreV1.java:1291)
The pipeline runs in streaming mode and does not batch or window the data in any way, so I don't see how it could be writing duplicate records at the same time. I'd like to check what is going on.
The Beam pipeline code is as follows:
public class JobPipeline {
  private final Pipeline pipeline;
  private final JobOptions options;

  JobPipeline(JobOptions options) {
    this.options = options;
    this.pipeline = Pipeline.create(options);
  }

  void run() throws IOException {
    PTransform<PBegin, PCollection<String>> input = getInput([pubsub topic]);
    PCollection<KV<String, EnrichedData>> enrichedData =
        new EnrichmentPipeline(options, input).apply(pipeline);
    pipeline.run();
  }
}
public class EnrichmentPipeline {
  private final JobOptions options;
  private final PTransform<PBegin, PCollection<String>> input;

  public EnrichmentPipeline(JobOptions options,
      PTransform<PBegin, PCollection<String>> input) {
    this.options = options;
    this.input = input;
  }

  public PCollection<KV<String, EnrichedData>> apply(final Pipeline pipeline) throws IOException {
    PCollection<KV<String, EnrichedData>> enrichedData = pipeline.apply("Reading Data", input)
        .apply("Transforming Json to Data", ParDo.of(new JsonToData()))
        .apply("Enrichment", ParDo.of(new EnrichmentFn(options.getProjectId(), options.getReferenceKind())));
    writeIntoDataStore(options.getProjectId(), enrichedData, new EnrichedDataToEntityFn(options.getDataKind()));
    return enrichedData;
  }
}
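For context on what the error means: Datastore rejects a non-transactional commit when the batch contains two or more mutations targeting the same entity key, and Beam's DatastoreWriterFn batches mutations per bundle before committing (the flushBatch frame in the trace). So if two elements in the same bundle map to the same entity key, the commit fails even without explicit batching in my pipeline. A minimal sketch of the dedup idea in plain Java (not Beam or Datastore APIs; the Mutation record and dedupByKey are hypothetical stand-ins), collapsing a batch so each key appears at most once, last write wins:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupBatch {
  // Hypothetical stand-in for a Datastore mutation: an entity key plus a payload.
  record Mutation(String entityKey, String payload) {}

  // Collapse a batch so each entity key appears at most once,
  // keeping the last mutation seen for that key (last-write-wins).
  static List<Mutation> dedupByKey(List<Mutation> batch) {
    Map<String, Mutation> byKey = new LinkedHashMap<>();
    for (Mutation m : batch) {
      byKey.put(m.entityKey(), m); // a later mutation replaces an earlier one for the same key
    }
    return new ArrayList<>(byKey.values());
  }

  public static void main(String[] args) {
    List<Mutation> batch = List.of(
        new Mutation("key-1", "v1"),
        new Mutation("key-2", "v1"),
        new Mutation("key-1", "v2")); // duplicate key -> would break the commit
    List<Mutation> deduped = dedupByKey(batch);
    System.out.println(deduped.size());           // 2
    System.out.println(deduped.get(0).payload()); // v2 (last write for key-1)
  }
}
```

In the Beam pipeline itself, the equivalent would presumably be keying enrichedData by the entity key and keeping one element per key (for example with a windowed GroupByKey or Distinct step) before writeIntoDataStore, though whether that is acceptable depends on which mutation should win.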