How do I correctly configure memory usage for small jobs in pyspark?

Asked: 2017-04-19 19:26:44

Tags: apache-spark pyspark hdfs yarn

I'm hitting OOMs on a fairly small workload when running PySpark on YARN.

The job gets picked up by Spark and YARN, but then fails with an OOM, Exit status: 52:

    ERROR cluster.YarnScheduler: Lost executor 1 on <ip>: Container marked as failed: containerID 2 on host: <ip>. Exit status: 52. Diagnostics: Exception from container-launch.

When I check the YARN log files for this application, I see the following:

    date 19:03:19 INFO hadoop.ColumnChunkPageWriteStore: written 10,852B for [user] BINARY: 11,052 values, 11,764B raw, 10,786B comp, 1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY, RLE], dic { 3,607 entries, 78,442B raw, 3,607B comp}
    date 19:03:20 INFO mapred.SparkHadoopMapRedUtil: attempt_date1903_0006_m_000013_0: Committed
    date 19:03:20 INFO executor.Executor: Finished task 13.0 in stage 6.0 (TID 1058). 2077 bytes result sent to driver
    date time:07 INFO executor.Executor: Executor is trying to kill task 70.0 in stage 6.0 (TID 1115)
    date time:07 INFO executor.Executor: Executor is trying to kill task 127.0 in stage 6.0 (TID 1170)
    Traceback (most recent call last):
      File "/usr/lib/spark/python/pyspark/daemon.py", line 157, in manager
        code = worker(sock)
      File "/usr/lib/spark/python/pyspark/daemon.py", line 61, in worker
        worker_main(infile, outfile)
      File "/usr/lib/spark/python/pyspark/worker.py", line 136, in main
        if read_int(infile) == SpecialLengths.END_OF_STREAM:
      File "/usr/lib/spark/python/pyspark/serializers.py", line 545, in read_int
        raise EOFError
    EOFError
    date time:07 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown

I'm not sure what is causing this, since it looks like a problem with the files, but the Spark driver memory is set to 8 GB.

The machines look like this:

21 nodes, 64GB each, 8 cores each

My spark-defaults.conf is:

    spark.executor.memory=30928mb
    spark.driver.memory 8g
    spark.executor.instances=30
    spark.executor.cores = 7
    spark.yarn.executor.memoryOverhead = 19647
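
If I understand the YARN sizing right, each executor container asks YARN for the JVM heap plus the off-heap overhead. A rough sketch of that arithmetic in plain Python (ignoring rounding up to yarn.scheduler.minimum-allocation-mb increments):

    # What each executor container should request from YARN, as I understand it:
    executor_memory_mb = 30928      # spark.executor.memory
    memory_overhead_mb = 19647      # spark.yarn.executor.memoryOverhead

    container_request_mb = executor_memory_mb + memory_overhead_mb
    print(container_request_mb)     # 50575 MB per executor container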

My yarn-site.xml is:

    yarn.scheduler.maximum-allocation-vcores = 1024
    yarn.scheduler.maximum-allocation-mb = 61430
    yarn.nodemanager.resource.memory-mb = 61430
    yarn.nodemanager.resource.cpu-vcores = 7 (left one for the driver)
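
Based on those NodeManager limits, my rough back-of-the-envelope check for how many executors fit on a node (assuming YARN enforces both the memory and the vcore limits, which may depend on the resource calculator) is:

    # Executors per node implied by the settings above (rough sketch):
    node_memory_mb = 61430                 # yarn.nodemanager.resource.memory-mb
    node_vcores = 7                        # yarn.nodemanager.resource.cpu-vcores

    container_request_mb = 30928 + 19647   # executor memory + memoryOverhead
    executor_cores = 7                     # spark.executor.cores

    executors_by_memory = node_memory_mb // container_request_mb   # 1
    executors_by_cores = node_vcores // executor_cores             # 1
    print(min(executors_by_memory, executors_by_cores))            # 1 per node

If that arithmetic is right, only one executor fits per node, so at most 21 of the 30 requested executors could ever be scheduled across the 21 nodes.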

Am I misconfiguring something here?

Also, the Spark UI shows:

    URL: spark://ip:7077
    REST URL: spark://ip:6066
    Alive Workers: 30
    Cores in use: 240 Total, 0 Used
    Memory in use: 1769.7 GB Total, 0.0 B Used
    Applications: 0 Running, 0 Completed
    Drivers: 0 Running, 0 Completed
    Status: ALIVE

What would cause the containers to need more memory? I'm not sure whether I should increase or decrease the memory settings.

0 Answers:

No answers yet.