Question

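I have deployed various Spark versions (1.6, 2.0, 2.1) on YARN (Hadoop 2.6.0 / CDH 5.5). I am trying to guarantee that one particular application will never be starved of resources on our YARN cluster, regardless of what else is running there.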

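I have enabled the shuffle service and set up some Fair Scheduler Pools as described in the Spark documentation. I created a separate pool for the high-priority application, which I want never to be starved of resources, and gave it a minShare: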

<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
  <pool name="high_priority">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>24</minShare>
  </pool>
</allocations>

When I run a Spark application on the YARN cluster, I can see that the pools I configured are recognized:

17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool default, schedulingMode: FAIR, minShare: 0, weight: 1
17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool high_priority, schedulingMode: FAIR, minShare: 24, weight: 1

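However, I don't see my application using the new high_priority pool, even though I set spark.scheduler.pool when calling spark-submit. This means that when the cluster is busy with regular activity, my high-priority application does not get the resources it needs: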

17/04/04 11:39:49 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
17/04/04 11:39:50 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_0 tasks to pool default
17/04/04 11:39:50 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/04/04 11:40:05 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

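What am I missing here? My colleagues and I tried enabling preemption in YARN, but it did nothing. Then we realized that YARN has a concept very similar to Spark scheduler pools, called YARN queues. So now we're not sure whether the two concepts conflict in some way.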

How do we get our high-priority pool to work as expected? Is there some kind of conflict between Spark scheduler pools and YARN queues?

Answer 1

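Someone over on the spark-users list clarified why I wasn't getting what I expected: Spark scheduler pools are for managing resources within an application, while YARN queues are for managing resources across applications. I needed the latter and was mistakenly using the former.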

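This is explained in the Spark documentation under Job Scheduling. I was simply tripped up by careless reading and by confusing "job" in Spark's technical sense (i.e., actions within a Spark application) with "job" as my colleagues and I usually use it, meaning an application submitted to the cluster.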
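Since the guarantee has to be enforced across applications, it belongs in YARN rather than in Spark's allocation file. A minimal sketch, assuming the cluster runs the YARN Fair Scheduler: a queue defined in the Fair Scheduler allocation file (commonly fair-scheduler.xml) with a minResources guarantee. The queue name high_priority and the memory/vcore figures are illustrative assumptions, not values from the original post.

```xml
<?xml version="1.0"?>
<!-- Hypothetical YARN Fair Scheduler allocation file (fair-scheduler.xml).
     Queue names and resource amounts are illustrative assumptions. -->
<allocations>
  <queue name="default">
    <weight>1</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queue name="high_priority">
    <!-- Share YARN tries to guarantee this queue across all applications -->
    <minResources>49152 mb,24vcores</minResources>
    <weight>1</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <!-- Seconds the queue may sit below its min share before YARN preempts
         containers from other queues (only if preemption is enabled) -->
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </queue>
</allocations>
```

The Spark application would then be submitted into that queue, for example with spark-submit --queue high_priority (or the equivalent spark.yarn.queue property). For YARN to actually reclaim containers from other queues, Fair Scheduler preemption must also be turned on by setting yarn.scheduler.fair.preemption to true in yarn-site.xml; without it, a queue below its minimum share is only favored when containers free up on their own.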

How do Spark scheduler pools work when running on YARN?
