Google Cloud Dataflow: Shutting down JVM after 8 consecutive periods of measured GC thrashing

Time: 2019-04-03 16:25:52

Tags: google-cloud-dataflow

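I am using Google Cloud Dataflow to perform some transformations.

I read about 3 million records from GBQ, apply a transformation, and write the transformed results to GCS.

While doing this, the Dataflow job fails with the error: Shutting down JVM after 8 consecutive periods of measured GC thrashing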

Workflow failed. Causes: S20:Read GBQ/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Read+Read GBQ/Reshuffle.Values/Map+Read GBQ/ReadFiles+Read GBQ/PassThroughThenCleanup/ParMultiDo(Identity)+Read GBQ/PassThroughThenCleanup/View.AsIterable/ParDo(ToIsmRecordForGlobalWindow)+transform+Split results/ParMultiDo(Partition)+Write errors/WriteFiles/RewindowIntoGlobal/Window.Assign+Write errors/WriteFiles/WriteShardedBundlesToTempFiles/ApplyShardingKey+Write errors/WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards/Reify+Write errors/WriteFiles/WriteShardedBundlesToTempFiles/[...]/WriteShardedBundlesToTempFiles/GroupIntoShards/Reify+Write entities Gzip/WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards/Write failed. The work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on:

    DataConverterOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
            .as(DataConverterOptions.class);
    Pipeline p = Pipeline.create(options);

    EntityCreatorFn entityCreatorFn = EntityCreatorFn.newWithGCSMapping(options.getMapping(),
            options.getWithUri(), options.getLineNumberToResult(), options.getIsPartialUpdate(), options.getQuery() != null);
    PCollectionList<String> resultByType =
            p.apply("Read GBQ", BigQueryIO.read(
                    (SchemaAndRecord elem) -> elem.getRecord().get("lineNumber") + "|" + elem.getRecord().get("sourceData"))
                    .fromQuery(options.getQuery()).withoutValidation()
                    .withCoder(StringUtf8Coder.of()).withTemplateCompatibility()).apply("transform",ParDo.of(entityCreatorFn))
                    .apply("Split results",Partition.of(2, (Partition.PartitionFn<String>) (elem, numPartitions) -> {
                        if (elem.startsWith(PREFIX_ERROR)) {
                            return PARTITION_ERROR;
                        }
                        return PARTITION_SUCCESS;
                    }));
    FileIO.Sink sink = TextIO.sink();
    resultByType.get(0).apply("Write entities Gzip", FileIO.write().to(options.getOutput()).withCompression(Compression.GZIP).withNumShards(options.getShards()).via(sink));
    resultByType.get(1).apply("Write errors", TextIO.write().to(options.getErrorOutput()).withoutSharding());
    p.run();

Shutting down JVM after 8 consecutive periods of measured GC thrashing. Memory is used/total/max = 109/301/2507 MB, GC last/max = 54.00/54.00 %, #pushbacks=0, gc thrashing=true.

1 Answer:

Answer 0 (score: 0)

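Does `EntityCreatorFn.newWithGCSMapping` cache elements in memory? It looks like one of the steps in your pipeline is consuming too much memory (note that Dataflow cannot parallelize the processing of a single element within a DoFn). I suggest adjusting your pipeline or trying highmem machines. If the problem persists, consider contacting Google Cloud Support and providing the relevant job ID and other details.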
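To illustrate the caching question above: if `EntityCreatorFn` does hold on to per-element data (the question does not show its body, so this is only a guess), one common way to keep that state from growing without limit is a size-bounded LRU map. The sketch below uses only `java.util`; the class name `BoundedLookupCache`, its `loader` function, and the capacity are hypothetical and not taken from the asker's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper, not part of the question's pipeline.
 * Keeps at most maxEntries values and evicts the least recently used one.
 * Not thread-safe; intended for use from a single thread at a time.
 */
class BoundedLookupCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;

    BoundedLookupCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Invoked after each insertion; returning true drops the eldest entry.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for key, computing and storing it on a miss. */
    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    /** Reports whether key is currently cached, without changing the LRU order. */
    boolean contains(K key) {
        return entries.containsKey(key);
    }

    int size() {
        return entries.size();
    }
}
```

Inside a DoFn, a cache like this would typically be created in a `@Setup` method and sized well below the worker heap, so that memory use stays flat no matter how many of the 3 million records pass through. The highmem suggestion, by contrast, needs no code change: on Dataflow it is normally applied through the `workerMachineType` pipeline option, for example `--workerMachineType=n1-highmem-4`.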