(Spark Scheduler) What is the difference between FAIR and FIFO in Spark job pools?

Date: 2018-02-06 23:01:53

Tags: apache-spark bigdata job-scheduling

I know that in Spark we can set the scheduler to FAIR or FIFO, and the behavior differs between the two. However, in fairscheduler.xml we can also set each individual pool to FAIR or FIFO, and I have tested this several times: the behavior seems to be the same in both cases. Then I looked at the Spark source code, where SchedulingAlgorithm is defined like this:

/**
 * An interface for sort algorithm
 * FIFO: FIFO algorithm between TaskSetManagers
 * FS: FS algorithm between Pools, and FIFO or FS within Pools
 */
private[spark] trait SchedulingAlgorithm {
  def comparator(s1: Schedulable, s2: Schedulable): Boolean
}

private[spark] class FIFOSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val priority1 = s1.priority
    val priority2 = s2.priority
    var res = math.signum(priority1 - priority2)
    if (res == 0) {
      val stageId1 = s1.stageId
      val stageId2 = s2.stageId
      res = math.signum(stageId1 - stageId2)
    }
    res < 0
  }
}

private[spark] class FairSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val minShare1 = s1.minShare
    val minShare2 = s2.minShare
    val runningTasks1 = s1.runningTasks
    val runningTasks2 = s2.runningTasks
    val s1Needy = runningTasks1 < minShare1
    val s2Needy = runningTasks2 < minShare2
    val minShareRatio1 = runningTasks1.toDouble / math.max(minShare1, 1.0)
    val minShareRatio2 = runningTasks2.toDouble / math.max(minShare2, 1.0)
    val taskToWeightRatio1 = runningTasks1.toDouble / s1.weight.toDouble
    val taskToWeightRatio2 = runningTasks2.toDouble / s2.weight.toDouble

    var compare = 0
    if (s1Needy && !s2Needy) {
      return true
    } else if (!s1Needy && s2Needy) {
      return false
    } else if (s1Needy && s2Needy) {
      compare = minShareRatio1.compareTo(minShareRatio2)
    } else {
      compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)
    }
    if (compare < 0) {
      true
    } else if (compare > 0) {
      false
    } else {
      s1.name < s2.name
    }
  }
}
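
To check what this comparator actually does for two Schedulables inside the same pool, I reproduced its logic in a standalone sketch (Sched and the sample values are my own stand-ins, since Spark's Schedulable is package-private; as far as I can tell, every TaskSetManager inside a pool carries minShare = 0 and weight = 1):

// Sched is a hypothetical stand-in for Spark's package-private Schedulable.
case class Sched(name: String, minShare: Int, weight: Int, runningTasks: Int)

// Same decision logic as FairSchedulingAlgorithm.comparator above.
def fairComparator(s1: Sched, s2: Sched): Boolean = {
  val s1Needy = s1.runningTasks < s1.minShare
  val s2Needy = s2.runningTasks < s2.minShare
  val minShareRatio1 = s1.runningTasks.toDouble / math.max(s1.minShare, 1.0)
  val minShareRatio2 = s2.runningTasks.toDouble / math.max(s2.minShare, 1.0)
  val taskToWeightRatio1 = s1.runningTasks.toDouble / s1.weight
  val taskToWeightRatio2 = s2.runningTasks.toDouble / s2.weight

  var compare = 0
  if (s1Needy && !s2Needy) return true
  else if (!s1Needy && s2Needy) return false
  else if (s1Needy && s2Needy) compare = minShareRatio1.compareTo(minShareRatio2)
  else compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)

  if (compare < 0) true
  else if (compare > 0) false
  else s1.name < s2.name
}

// Two TaskSetManagers in the same pool: minShare = 0 and weight = 1 for both,
// so neither side is "needy" and compare is decided by runningTasks alone.
val ts1 = Sched("TaskSet_1.0", minShare = 0, weight = 1, runningTasks = 4)
val ts2 = Sched("TaskSet_2.0", minShare = 0, weight = 1, runningTasks = 4)

// Equal runningTasks => compare == 0, so the decision falls through to the
// name comparison, which for TaskSet names follows submission order.
println(fairComparator(ts1, ts2)) // prints: true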

In FairSchedulingAlgorithm, if s1 and s2 come from the same pool, then minShare, runningTasks, and weight should all hold the same values, so we always get false as the return value. So within a pool the scheduling is not FAIR but FIFO. My fairscheduler.xml looks like this:

<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>3</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="cubepublishing">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>

And spark.scheduler.mode is set to:

# job scheduler
spark.scheduler.mode              FAIR
spark.scheduler.allocation.file   conf/fairscheduler.xml

Thanks for your help!

1 Answer:

Answer 0 (score: 0)

When you submit a job to the cluster with spark-submit (or by any other means), it is handed to the Spark scheduler, which is responsible for materializing the job's logical plan. In Spark there are two scheduling modes.

1. FIFO. By default, Spark's scheduler runs jobs in FIFO fashion. Each job is divided into stages (e.g., map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, and so on. If the jobs at the head of the queue don't need the whole cluster, later jobs can start running right away; but if the jobs at the head of the queue are large, later jobs may be delayed significantly.
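
Since FIFO is the default, no configuration is strictly needed; a minimal sketch of setting it explicitly (the app name and master below are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// FIFO is already the default scheduling mode; set here only for clarity.
val conf = new SparkConf()
  .setAppName("fifo-demo")  // placeholder app name
  .setMaster("local[*]")    // placeholder master
  .set("spark.scheduler.mode", "FIFO")
val sc = new SparkContext(conf)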

2. FAIR. The fair scheduler also supports grouping jobs into pools and setting different scheduling options (e.g., weight) for each pool. This can be useful, for example, to create a high-priority pool for more important jobs, or to group each user's jobs together and give users equal shares regardless of how many concurrent jobs they have, instead of giving jobs equal shares. This approach is modeled after the Hadoop Fair Scheduler.
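
The two properties from your configuration can also be set programmatically when the SparkContext is created; a sketch (the app name is a placeholder, and the pool definitions still come from your fairscheduler.xml):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fair-demo")  // placeholder app name
  .set("spark.scheduler.mode", "FAIR")
  // Pool definitions (schedulingMode, weight, minShare) are read from here.
  .set("spark.scheduler.allocation.file", "conf/fairscheduler.xml")
val sc = new SparkContext(conf)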

Without any intervention, newly submitted jobs go into a default pool, but a job's pool can be set by adding the spark.scheduler.pool "local property" to the SparkContext in the thread that submits it.
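
For example, using the pool names from your fairscheduler.xml (a sketch, assuming sc is an existing SparkContext):

// Jobs submitted from this thread now go to the "cubepublishing" pool.
sc.setLocalProperty("spark.scheduler.pool", "cubepublishing")
sc.parallelize(1 to 100000).count()

// Setting the property to null sends subsequent jobs back to the default pool.
sc.setLocalProperty("spark.scheduler.pool", null)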