PySpark job fails on AWS EMR with OutOfMemoryError

Date: 2020-10-08 07:13:07

Tags: python amazon-web-services pyspark out-of-memory amazon-emr

I submitted a PySpark job, but after running for a while it fails with the following error:

20/10/08 06:49:30 ERROR Client: Application diagnostics message: Application application_1602138886042_0001 failed 2 times due to AM Container for appattempt_1602138886042_0001_000002 exited with  exitCode: -104
Failing this attempt.Diagnostics: Container [pid=16756,containerID=container_1602138886042_0001_02_000001] is running beyond physical memory limits. Current usage: 1.6 GB of 1.5 GB physical memory used; 4.4 GB of 7.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1602138886042_0001_02_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 16756 16754 16756 16756 (bash) 0 0 115871744 704 /bin/bash -c LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1602138886042_0001/container_1602138886042_0001_02_000001/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -
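
For reference, my understanding is that in cluster mode the YARN ApplicationMaster container that holds the driver is sized as spark.driver.memory plus spark.driver.memoryOverhead. A rough sketch of that arithmetic with the values visible in the log (the 1024 MB figure is an assumption read off the -Xmx1024m flag in the process dump above):

# AM container size in YARN cluster mode (as I understand it):
#   spark.driver.memory + spark.driver.memoryOverhead
driver_memory_mb = 1024    # assumed from the -Xmx1024m in the process dump above
driver_overhead_mb = 512   # spark.driver.memoryOverhead set in the submit command
print((driver_memory_mb + driver_overhead_mb) / 1024, "GB")  # 1.5 GB, the limit reported in the error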

To resolve this memory issue, I tried changing the driver and executor memory settings, but the job still fails. Below is the spark-submit command:

'Args': ['spark-submit',
         '--deploy-mode', 'cluster',
         '--master', 'yarn',
         '--executor-memory', conf['emr_step_executor_memory'],
         '--executor-cores', conf['emr_step_executor_cores'],
         '--conf', 'spark.yarn.submit.waitAppCompletion=true',
         '--conf', 'spark.rpc.message.maxSize=1024',
         '--conf', 'spark.driver.memoryOverhead=512',
         '--conf', 'spark.executor.memoryOverhead=512',
         '--conf', 'spark.driver.memory =2g',
         '--conf', 'spark.driver.cores=2']
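
For context, this Args list is part of an EMR step definition. A minimal sketch of how such a list is submitted with boto3's add_job_flow_steps; the cluster ID, step name, and S3 script path are hypothetical placeholders, and the flag list is abbreviated to the entries shown above:

import boto3

# Hypothetical placeholders -- only the structure of the step matters here.
spark_submit_args = [
    'spark-submit',
    '--deploy-mode', 'cluster',
    '--master', 'yarn',
    # ... the same --executor-* and --conf flags as in the Args list above ...
    's3://my-bucket/jobs/job.py',            # hypothetical script location
]

emr = boto3.client('emr', region_name='us-east-1')
emr.add_job_flow_steps(
    JobFlowId='j-XXXXXXXXXXXXX',             # hypothetical cluster ID
    Steps=[{
        'Name': 'pyspark-step',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',     # runs spark-submit on the master node
            'Args': spark_submit_args,
        },
    }],
)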

Master machine on AWS: c4.2xlarge. Core machines on AWS: c4.4xlarge.
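
For reference, the published memory for those instance types, compared with the 1.5 GB cap from the error (so the node-level memory does not look like the constraint here; the limit appears to be per container):

# AWS published specs for the instance types in the cluster.
instance_memory_gib = {
    'c4.2xlarge': 15,   # master node
    'c4.4xlarge': 30,   # core nodes
}
am_container_limit_gb = 1.5   # the per-container cap reported in the error
for itype, mem_gib in instance_memory_gib.items():
    print(f"{itype}: {mem_gib} GiB total vs {am_container_limit_gb} GB container cap")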

One important thing: the data is not large at all; it is less than 50 MB.

0 Answers:

There are no answers yet.