My Spark cluster refuses to run more than two jobs at the same time. One of them stays stuck in the ACCEPTED state.
4 DataNodes with Spark clients, 24 GB RAM, 4 processors
Apps Submitted 3
Apps Pending 1
Apps Running 2
Apps Completed 0
Containers Running 4
Memory Used 8GB
Memory Total 32GB
Memory Reserved 0B
VCores Used 4
VCores Total 8
VCores Reserved 0
Active Nodes 2
Decommissioned Nodes 0
Lost Nodes 0
Unhealthy Nodes 0
Rebooted Nodes 0
ID | User | Name | Type | Queue | Priority | StartTime | FinishTime | State | FinalStatus | Containers | VCores | Memory (MB) | % of Queue | % of Cluster
application_1504018580976_0002 | adm | com.x.app1 | SPARK | default | 0 | [date] | N/A | RUNNING | UNDEFINED | 2 | 2 | 5120 | 25.0 | 25.0
application_1500031233020_0090 | adm | com.x.app2 | SPARK | default | 0 | [date] | N/A | RUNNING | UNDEFINED | 2 | 2 | 3072 | 25.0 | 25.0
application_1504024737012_0001 | adm | com.x.app3 | SPARK | default | 0 | [date] | N/A | ACCEPTED | UNDEFINED | 0 | 0 | 0 | 0.0 | 0.0
Each of the running applications has 2 containers and 2 allocated vcores, using 25% of the queue and 25% of the cluster.
/usr/hdp/current/spark2-client/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --driver-cores 1 \
    --driver-memory 512m \
    --num-executors 1 \
    --executor-cores 1 \
    --executor-memory 1G \
    --class com.x.appx ../lib/foo.jar
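
For reference, a rough breakdown (assuming default YARN container sizing on top of the values above): with --deploy-mode cluster the driver runs inside the YARN ApplicationMaster container, so each submission should occupy

    1 AM/driver container : 1 vcore, 512m driver memory plus YARN overhead
    1 executor container  : 1 vcore, 1G executor memory plus YARN overhead
    ----------------------------------------------------------------------
    per application       : 2 containers, 2 vcores

which matches the 2 containers and 2 allocated vcores reported for each RUNNING application.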
yarn.scheduler.capacity.default.minimum-user-limit-percent = 100
yarn.scheduler.capacity.maximum-am-resource-percent = 0.2
yarn.scheduler.capacity.maximum-applications = 10000
yarn.scheduler.capacity.node-locality-delay = 40
yarn.scheduler.capacity.root.accessible-node-labels = *
yarn.scheduler.capacity.root.acl_administer_queue = *
yarn.scheduler.capacity.root.capacity = 100
yarn.scheduler.capacity.root.default.acl_administer_jobs = *
yarn.scheduler.capacity.root.default.acl_submit_applications = *
yarn.scheduler.capacity.root.default.capacity = 100
yarn.scheduler.capacity.root.default.maximum-capacity = 100
yarn.scheduler.capacity.root.default.state = RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor = 1
yarn.scheduler.capacity.root.queues = default
Answer 0 (score: 1)
Your setting:
yarn.scheduler.capacity.maximum-am-resource-percent = 0.2
means:

    total vcores (8) x maximum-am-resource-percent (0.2) = 1.6

Since fractional vcores make no sense, 1.6 rounds up to 2. That means you can have at most 2 Application Masters at a time, which is why you can only run 2 jobs at once.
The solution is to raise yarn.scheduler.capacity.maximum-am-resource-percent to a higher value, e.g. 0.5.
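
A minimal sketch of that change, assuming the queue configuration lives in /etc/hadoop/conf/capacity-scheduler.xml (on HDP it is usually managed through Ambari, which regenerates this file):

    <property>
      <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
      <value>0.5</value>
    </property>

The Capacity Scheduler can then pick up the new value without restarting the ResourceManager:

    yarn rmadmin -refreshQueues

With 0.5, the same arithmetic gives total vcores (8) x 0.5 = 4, so up to 4 Application Masters, and therefore 4 concurrent jobs, should be admitted.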
Answer 1 (score: 0)