Spark in cluster mode refuses to run more than two jobs concurrently

Time: 2017-08-29 19:24:30

Tags: hadoop apache-spark

My Spark cluster refuses to run more than two jobs concurrently. The third one always stays stuck in the ACCEPTED state.

Hardware

4 data nodes with Spark clients, 24 GB RAM, 4 processors

The cluster metrics show that there should be enough cores:

Apps Submitted    3
Apps Pending    1
Apps Running    2
Apps Completed    0
Containers Running   4
Memory Used    8GB
Memory Total  32GB
Memory Reserved  0B
VCores Used    4
VCores Total    8
VCores Reserved    0
Active Nodes    2
Decommissioned Nodes    0
Lost Nodes    0
Unhealthy Nodes   0
Rebooted Nodes    0

In the application manager you can see that the only way to get the third application to run is to kill one of the running applications:

application_1504018580976_0002 adm com.x.app1 SPARK default 0 [date] N/A RUNNING UNDEFINED 2 2 5120 25.0 25.0 
application_1500031233020_0090 adm com.x.app2 SPARK default 0 [date] N/A RUNNING UNDEFINED 2 2 3072 25.0 25.0 
application_1504024737012_0001 adm com.x.app3 SPARK default 0 [date] N/A ACCEPTED UNDEFINED 0 0 0 0.0 0.0

Each running application has 2 containers and 2 allocated vcores, which is 25% of the queue and 25% of the cluster.
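As a quick sanity check on those figures, here is a minimal Python sketch that recomputes the per-application share and the remaining headroom. The numbers are copied from the metrics and application list above; the variable names are only illustrative.

```python
# Figures copied from the cluster metrics and application list in the question.
vcores_total = 8
vcores_used = 4
vcores_per_running_app = 2
memory_total_gb = 32
memory_used_gb = 8


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


share_per_app = percent(vcores_per_running_app, vcores_total)
free_vcores = vcores_total - vcores_used
free_memory_gb = memory_total_gb - memory_used_gb

print(f"each running app holds {share_per_app:.1f}% of the vcores")
print(f"free vcores: {free_vcores}, free memory: {free_memory_gb} GB")
```

Half of the vcores and three quarters of the memory are still free, so raw capacity alone does not explain why the third application stays in ACCEPTED.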

The deploy command used for all 3 applications:

/usr/hdp/current/spark2-client/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-cores 1 \
--driver-memory 512m \
--num-executors 1 \
--executor-cores 1 \
--executor-memory 1G \
--class com.x.appx ../lib/foo.jar

Capacity scheduler

yarn.scheduler.capacity.default.minimum-user-limit-percent = 100
yarn.scheduler.capacity.maximum-am-resource-percent = 0.2
yarn.scheduler.capacity.maximum-applications = 10000
yarn.scheduler.capacity.node-locality-delay = 40
yarn.scheduler.capacity.root.accessible-node-labels = *
yarn.scheduler.capacity.root.acl_administer_queue = *
yarn.scheduler.capacity.root.capacity = 100
yarn.scheduler.capacity.root.default.acl_administer_jobs = *
yarn.scheduler.capacity.root.default.acl_submit_applications = *
yarn.scheduler.capacity.root.default.capacity = 100
yarn.scheduler.capacity.root.default.maximum-capacity = 100
yarn.scheduler.capacity.root.default.state = RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor = 1
yarn.scheduler.capacity.root.queues = default

2 answers:

Answer 0 (score: 1)

Your setting:

yarn.scheduler.capacity.maximum-am-resource-percent = 0.2

means:

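total vcores (8) x maximum-am-resource-percent (0.2) = 1.6
Since a fractional vcore makes no sense, 1.6 is rounded up to 2. This means you can only have 2 application masters at a time, which is why you can only run 2 jobs at a time.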

Solution: raise yarn.scheduler.capacity.maximum-am-resource-percent to a higher value, for example 0.5.
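The arithmetic in this answer can be sketched in a few lines of Python. This follows the answer's simplified vcore-based reasoning only; the function name `max_concurrent_ams` is hypothetical, and the real scheduler's limit may be computed differently (for example from memory, depending on the configured resource calculator).

```python
import math


def max_concurrent_ams(total_vcores, max_am_resource_percent):
    """AM limit as reasoned in the answer: vcores x percent, rounded up."""
    return math.ceil(total_vcores * max_am_resource_percent)


TOTAL_VCORES = 8  # "VCores Total" from the cluster metrics in the question

for pct in (0.2, 0.5):
    limit = max_concurrent_ams(TOTAL_VCORES, pct)
    print(f"maximum-am-resource-percent={pct}: up to {limit} application masters")
```

Under this reasoning, 0.2 allows 2 application masters and 0.5 allows 4, which would be enough for the three applications in the question.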

Answer 1 (score: 0)

The following parameters control parallel execution:


https://spark.apache.org/docs/latest/submitting-applications.html