Question

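I am running a PySpark job on an AWS EMR cluster with the following configuration: one master instance (m5.2xlarge) and five worker instances (m5.2xlarge: 8 vCores, 32 GiB memory, EBS-only storage, 200 GiB EBS).

After I submit the PySpark job, it fails with the following error.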

ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 24.1 GB of 24 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.

Below is the spark-submit command.

spark-submit  --deploy-mode cluster --master yarn --num-executors 2 --executor-cores 5 --executor-memory 21g --driver-memory 10g --conf spark.yarn.executor.memoryOverhead=3g --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.maxAppAttempts=100 --conf spark.executor.extraJavaOptions=-Xss3m  --conf spark.driver.maxResultSize=3g --conf spark.dynamicAllocation.enabled=false

Please suggest better values for the number of executors, executor memory, and executor cores.

Answer 1

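One of your executor JVMs is running out of memory. As the error message suggests, consider raising spark.yarn.executor.memoryOverhead from 3g to a more suitable value.

You can also increase --executor-memory to a larger value if your application needs it.

See the Spark properties here: https://spark.apache.org/docs/2.4.0/running-on-yarn.html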

Answer 2

I could not increase --executor-memory or spark.yarn.executor.memoryOverhead, because the total would exceed the maximum threshold (24576 MB).
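The arithmetic behind that ceiling can be checked directly. The sketch below assumes that YARN sizes each executor container as executor memory plus memory overhead, and takes 24576 MB as the per-container maximum quoted above; the function and variable names are illustrative only.

```python
# Sketch: why the original settings already sit at the YARN container limit.
# Assumption: container request = executor memory + memory overhead.

YARN_MAX_CONTAINER_MB = 24576  # maximum threshold quoted in this answer


def container_request_mb(executor_memory_gb: int, overhead_gb: int) -> int:
    """Return the memory (in MB) requested from YARN for one executor."""
    return (executor_memory_gb + overhead_gb) * 1024


# Values from the original command: --executor-memory 21g, memoryOverhead=3g
requested = container_request_mb(executor_memory_gb=21, overhead_gb=3)
headroom = YARN_MAX_CONTAINER_MB - requested

print(f"requested: {requested} MB, limit: {YARN_MAX_CONTAINER_MB} MB, headroom: {headroom} MB")
# prints: requested: 24576 MB, limit: 24576 MB, headroom: 0 MB
```

With zero headroom, raising either setting on its own would push the request past the limit, so the remaining options are to trade heap for overhead or to spread the work across more executors.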

After increasing --num-executors to 5, the problem was resolved.
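One plausible reason this helps, assuming the input is partitioned roughly evenly, is that each executor then handles a smaller share of the data while keeping the same container size. The sketch below compares the two configurations; the helper name and the even-split assumption are illustrative and are not taken from the original post.

```python
# Sketch: aggregate resources for 2 vs 5 executors with unchanged per-executor settings.
# Assumption: work is split evenly across executors (real jobs may be skewed).

def cluster_totals(num_executors: int, cores_per_executor: int = 5,
                   executor_memory_gb: int = 21) -> dict:
    """Summarise total task slots, total heap, and each executor's share of the data."""
    return {
        "task_slots": num_executors * cores_per_executor,
        "total_heap_gb": num_executors * executor_memory_gb,
        "data_share_per_executor": 1 / num_executors,
    }


before = cluster_totals(num_executors=2)  # original command
after = cluster_totals(num_executors=5)   # setting reported to fix the job

print("before:", before)
print("after: ", after)
```

Under these assumptions, each executor's share of the data drops from one half to one fifth, which lowers memory pressure per container without touching --executor-memory or the overhead setting.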
