How do I debug an out-of-memory failure in Beam WriteToText?

Time: 2019-03-28 12:31:09

Tags: python google-cloud-dataflow apache-beam

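During a WriteToText operation, a GroupBy fails because it runs out of memory, and this kills my Dataflow job. When I run the job locally, I run out of memory as well.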

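Based on the WriteToText source code, it looks to me as though specifying the number of shards should help with the problem. I'm not sure how to choose the number of shards, though. Can someone explain the process for choosing it?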
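To make the question concrete, here is a minimal sketch of what passing an explicit shard count to WriteToText might look like. The `num_shards` parameter is part of WriteToText (its default of 0 lets the runner decide), but the `estimate_num_shards` helper, its 256 MiB target shard size, and the `write_named_entities` wrapper are assumptions made up for illustration, not a recommended rule.

```python
import math


def estimate_num_shards(total_output_bytes, target_shard_bytes=256 * 1024 * 1024):
    """Hypothetical heuristic: one shard per target_shard_bytes of output, at least 1."""
    if total_output_bytes <= 0:
        return 1
    return max(1, math.ceil(total_output_bytes / target_shard_bytes))


def write_named_entities(named_entities, output_prefix, num_shards):
    """Apply WriteToText with an explicit shard count instead of the default (0)."""
    # Imported lazily so the helper above can be used without Beam installed.
    from apache_beam.io import WriteToText

    return named_entities | 'WriteNamedEntitiesToGS' >> WriteToText(
        output_prefix,
        file_name_suffix='_named_ent.json.gz',
        num_shards=num_shards,  # 0 (the default) leaves the choice to the runner
    )
```

For example, an assumed 10 GiB of output with the assumed 256 MiB target gives `estimate_num_shards(10 * 1024**3) == 40`. Whether a fixed shard count actually avoids the out-of-memory failure is exactly what I am unsure about.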

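I would expect a better sharding approach to make the pipeline less efficient, perhaps, but not to crash it. More generally, I'm not sure how to make Dataflow pipelines more robust against failures caused by large outliers.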

For more context, the error message on Dataflow looks like this:


Workflow failed. Causes: S31:ReadData/Read+BaseNLP+SplitBaseDoc+WriteJSONBaseNLPToGS/Write/WriteImpl/WriteBundles/WriteBundles+SplitSentences+NormalisedNESplitSentences+NamedEntitiesSplit+LinkedEntitiesSplit+ExtractMetadata+ExtractSentCoOcc+ExtractDocCoOcc+WriteJSONDocumentToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteDocCoOccToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteJSONDocumentToGS/Write/WriteImpl/Pair+WriteNamedEntitiesToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteNormalisedSentenceNEToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteNormalisedSentenceNEToGS/Write/WriteImpl/Pair+WriteNormalisedSentenceNEToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteNormalisedSentenceNEToGS/Write/WriteImpl/GroupByKey/Reify+WriteNormalisedSentenceNEToGS/Write/WriteImpl/GroupByKey/Write+WriteJSONDocToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteJSONDocToGS/Write/WriteImpl/Pair+WriteJSONDocToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteJSONDocToGS/Write/WriteImpl/GroupByKey/Reify+WriteJSONDocToGS/Write/WriteImpl/GroupByKey/Write+WriteDocCoOccToGS/Write/WriteImpl/Pair+WriteSentCoOccToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteSentCoOccToGS/Write/WriteImpl/Pair+WriteSentCoOccToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteSentCoOccToGS/Write/WriteImpl/GroupByKey/Reify+WriteSentCoOccToGS/Write/WriteImpl/GroupByKey/Write+WriteSentenceToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteSentenceToGS/Write/WriteImpl/Pair+WriteSentenceToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteSentenceToGS/Write/WriteImpl/GroupByKey/Reify+WriteSentenceToGS/Write/WriteImpl/GroupByKey/Write+WriteMetadataToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteMetadataToGS/Write/WriteImpl/Pair+WriteMetadataToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteMetadataToGS/Write/WriteImpl/GroupByKey/Reify+WriteMetadataToGS/Write/WriteImpl/GroupByKey/Write+WriteDocCoOccToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteDocCoOccToGS/Write/WriteImpl/GroupByKey/Reify+WriteDocCoOccToGS/Write/WriteImpl/GroupByKey/Write+WriteLinkedEntitiesToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteLinkedEntitiesToGS/Write/WriteImpl/Pair+WriteLinkedEntitiesToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteLinkedEntitiesToGS/Write/WriteImpl/GroupByKey/Reify+WriteLinkedEntitiesToGS/Write/WriteImpl/GroupByKey/Write+WriteJSONDocumentToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteJSONDocumentToGS/Write/WriteImpl/GroupByKey/Reify+WriteJSONDocumentToGS/Write/WriteImpl/GroupByKey/Write+WriteJSONBaseNLPToGS/Write/WriteImpl/Pair+WriteJSONBaseNLPToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteJSONBaseNLPToGS/Write/WriteImpl/GroupByKey/Reify+WriteJSONBaseNLPToGS/Write/WriteImpl/GroupByKey/Write+WriteNamedEntitiesToGS/Write/WriteImpl/Pair+WriteNamedEntitiesToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteNamedEntitiesToGS/Write/WriteImpl/GroupByKey/Reify+WriteNamedEntitiesToGS/Write/WriteImpl/GroupByKey/Write failed., A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on: 
  datachunk7-03250851-9xfb-harness-jgn7,
  datachunk7-03250851-9xfb-harness-jgn7,
  datachunk7-03250851-9xfb-harness-3hl5,
  datachunk7-03250851-9xfb-harness-g6m5

So this suggests that WriteNamedEntitiesToGS failed, which is defined as follows:

named_entities | 'WriteNamedEntitiesToGS' >> WriteToText(known_args.output, file_name_suffix='_named_ent.json.gz')

So this suggests that the problem lies in WriteToText, linked above.

0 answers:

No answers yet