Question

My EMR cluster has the following configuration.

Data Nodes : 6
RAM per Node : 56 GB
Cores per Node: 32
Instance Type: M4*4xLarge

I am running 5 Hive scripts in parallel with spark-sql.

spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive1.hql &
spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive2.hql &
spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive3.hql &
spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive4.hql &
spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive5.hql

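But YARN is using 270 GB of memory.

According to the parameters in the commands above, each Spark job should use only 24 GB of RAM, so the five jobs together should use only 120 GB.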

1 * 20 + 4 = 24 GB RAM

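5 jobs = 5 * 24 = 120 GB

So why is YARN using 270 GB of RAM? (No other Hadoop jobs are running on the cluster.)

Do I need to include any extra parameters to limit YARN resource utilization?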

Answer 1

Set "spark.dynamicAllocation.enabled" to false in spark-defaults.conf (../../spark/spark-x.x.x/conf/spark-defaults.conf).

This should help you limit or avoid dynamic allocation of resources.
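As a rough sketch of that change, the helper below sets a property in a spark-defaults.conf style file, replacing the line if the key is already there and appending it otherwise. The function name set_spark_conf is made up for illustration, and the path in the commented-out call is an assumption (EMR usually keeps the file under /etc/spark/conf, but check your own installation).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): set "key value" in a
# spark-defaults.conf style file.
set_spark_conf() {
  local file="$1" key="$2" value="$3"
  # Escape dots so the key is matched literally in the regular expressions.
  local pattern="${key//./\\.}"
  touch "$file"
  if grep -qE "^[[:space:]]*${pattern}[[:space:]=]" "$file"; then
    # Key already present: rewrite that line, keeping a .bak copy of the file.
    sed -i.bak -E "s|^[[:space:]]*${pattern}[[:space:]=].*|${key} ${value}|" "$file"
  else
    # Key not present: append it.
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Assumed location of the file; adjust before uncommenting.
# set_spark_conf /etc/spark/conf/spark-defaults.conf spark.dynamicAllocation.enabled false
```

Jobs that are already running will not pick up the new value; it only applies to applications submitted afterwards.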

Answer 2

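Even though the executor memory is set in the command, Spark allocates resources dynamically when they are available in the cluster: with dynamic allocation enabled it can request additional executors, each with its own 20 GB, so total memory use grows beyond what the flags suggest. To limit memory usage to the executors you asked for, set the dynamic allocation parameter, spark.dynamicAllocation.enabled, to false.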
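To put numbers on this, here is a back-of-the-envelope sketch. It assumes Spark's default memory overhead on YARN, max(384 MiB, 10% of executor memory), and it ignores the driver, the YARN application master, and any rounding YARN applies to container sizes, so treat the results as estimates only. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Estimate the YARN container size for one executor, in MiB.
# Assumption: default overhead of max(384 MiB, 10% of executor memory).
executor_container_mib() {
  local mem_mib="$1"
  local overhead=$(( mem_mib / 10 ))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo $(( mem_mib + overhead ))
}

# --executor-memory 20G from the question
per_executor=$(executor_container_mib $(( 20 * 1024 )))

echo "container per executor: ${per_executor} MiB"
echo "five executors (one per job): $(( 5 * per_executor / 1024 )) GiB"
echo "executor containers that fit in 270 GiB: $(( 270 * 1024 / per_executor ))"
```

Under these assumptions one executor needs about 22 GiB, five of them about 110 GiB, and 270 GiB corresponds to roughly 12 executor containers. That is consistent with more executors running than the five requested, which is what dynamic allocation would produce.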

You can set spark.dynamicAllocation.enabled either directly in the Spark configuration file or by passing it to the command with --conf.

spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G --conf spark.dynamicAllocation.enabled=false -f hive1.hql
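To apply the same flag to all five scripts from the question, a loop along these lines might do. It is only a sketch: the function name run_hive_script and the DRY_RUN variable are made up for illustration, and by default the script only prints the commands so you can check them before launching anything.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run (or just print) spark-sql for one script, using the
# flags from the question plus the dynamic-allocation switch.
run_hive_script() {
  local script="$1"
  local cmd=(spark-sql --master yarn --num-executors 1 --executor-memory 20G
             --executor-cores 20 --driver-memory 4G
             --conf spark.dynamicAllocation.enabled=false -f "$script")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default here): print the command instead of executing it.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Start all five scripts in the background, then wait for them to finish.
for i in 1 2 3 4 5; do
  run_hive_script "hive${i}.hql" &
done
wait
```

Run it with DRY_RUN=0 to actually submit the jobs. Because they start in the background, the printed lines or job logs may appear in any order.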

Spark over-utilizing YARN resources
