I created a Spark job in Python in which I retrieve data from Redshift and then apply many transformations: join, filter, withColumn, agg, ... There are about 30K records in the DataFrame. I execute all the transformations, and when I try to write the AVRO file, the Spark job fails.
My spark-submit:

. /usr/bin/spark-submit --packages="com.databricks:spark-avro_2.11:3.2.0" --jars RedshiftJDBC42-1.2.1.1001.jar --deploy-mode client --master yarn --num-executors 10 --executor-cores 3 --executor-memory 10G --driver-memory 14g --conf spark.sql.broadcastTimeout=3600 --conf spark.network.timeout=10000000 --py-files dependencies.zip iface_extractions.py 2016-10-01 > output.log

I'm using --executor-memory 10G --driver-memory 14g, on 6 Amazon machines with 8 cores and 15 GB of RAM each. Why am I getting an out-of-memory error?

This is the error returned, at the end of the Spark log:
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 196608 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/hadoop/hs_err_pid13688.log
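The message above is a native-memory failure (the JVM could not mmap more memory from the OS), not a Java heap OutOfMemoryError. One plausible reading of the numbers, sketched below: I'm assuming YARN's default executor overhead of max(384 MB, 10% of executor memory) from Spark's spark.yarn.executor.memoryOverhead setting, so the per-node budget on 15 GB machines is very tight, and in client mode the 14 GB driver heap alone nearly exhausts the submitting node.

```python
# Rough per-node memory budget for the spark-submit flags above.
# Assumption: YARN adds max(384 MB, 10% of executor memory) as
# spark.yarn.executor.memoryOverhead on top of --executor-memory.

node_ram_gb = 15          # each Amazon machine has 15 GB RAM
executor_heap_gb = 10     # --executor-memory 10G
driver_heap_gb = 14       # --driver-memory 14g (client mode: runs on the submitting node)

# YARN default overhead: 384 MB floor or 10% of the executor heap.
overhead_gb = max(0.384, 0.10 * executor_heap_gb)
executor_container_gb = executor_heap_gb + overhead_gb

# On the driver node, a 14 GB heap leaves about 1 GB for the OS, YARN
# daemons, and the JVM's own native allocations, which is where an
# mmap of reserved memory can fail.
left_on_driver_node_gb = node_ram_gb - driver_heap_gb

print(executor_container_gb)   # GB requested per executor container
print(left_on_driver_node_gb)  # GB left on the driver node beside the heap
```

So each executor container asks YARN for about 11 GB out of 15 GB per node, and the driver node keeps only about 1 GB of headroom, which would explain a native allocation failure rather than a heap one.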