I created a Spark job in Python in which I retrieve data from Redshift and then apply many transformations: join, filter, withColumn, agg, ... There are about 30K records in the DataFrame. I execute all the transformations, and when I try to write the AVRO file, the Spark job fails.
My spark-submit:

. /usr/bin/spark-submit --packages="com.databricks:spark-avro_2.11:3.2.0" --jars RedshiftJDBC42-1.2.1.1001.jar --deploy-mode client --master yarn --num-executors 10 --executor-cores 3 --executor-memory 10G --driver-memory 14g --conf spark.sql.broadcastTimeout=3600 --conf spark.network.timeout=10000000 --py-files dependencies.zip iface_extractions.py 2016-10-01 > output.log

I'm using --executor-memory 10G --driver-memory 14g, on 6 Amazon machines with 8 cores and 15 GB of RAM each. Why am I getting an out-of-memory error?

This is the error returned, at the end of the Spark log:
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 196608 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/hadoop/hs_err_pid13688.log
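The message above is a native-memory failure (the JVM could not mmap more memory from the OS), not a Java heap OutOfMemoryError. One plausible reading of the numbers, sketched below: I'm assuming YARN's default executor overhead of max(384 MB, 10% of executor memory) from Spark's spark.yarn.executor.memoryOverhead setting, so the per-node budget on 15 GB machines is very tight, and in client mode the 14 GB driver heap alone nearly exhausts the submitting node.

```python
# Rough per-node memory budget for the spark-submit flags above.
# Assumption: YARN adds max(384 MB, 10% of executor memory) as
# spark.yarn.executor.memoryOverhead on top of --executor-memory.

node_ram_gb = 15          # each Amazon machine has 15 GB RAM
executor_heap_gb = 10     # --executor-memory 10G
driver_heap_gb = 14       # --driver-memory 14g (client mode: runs on the submitting node)

# YARN default overhead: 384 MB floor or 10% of the executor heap.
overhead_gb = max(0.384, 0.10 * executor_heap_gb)
executor_container_gb = executor_heap_gb + overhead_gb

# On the driver node, a 14 GB heap leaves about 1 GB for the OS, YARN
# daemons, and the JVM's own native allocations, which is where an
# mmap of reserved memory can fail.
left_on_driver_node_gb = node_ram_gb - driver_heap_gb

print(executor_container_gb)   # GB requested per executor container
print(left_on_driver_node_gb)  # GB left on the driver node beside the heap
```

So each executor container asks YARN for about 11 GB out of 15 GB per node, and the driver node keeps only about 1 GB of headroom, which would explain a native allocation failure rather than a heap one.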