Question

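I am trying to define several Yarn queues so that the cluster is always fully used, and so that as soon as a new job is submitted to a different queue the resources are split (some of the workers in the first queue get preempted).

For this I use the FairScheduler, relying on the documentation: Hadoop-FairScheduler and Cloudera-FairScheduler.

I run Yarn and Spark from Ambari. The relevant settings are:

In yarn-site.xml: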

yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.preemption=true

I define the queues in fair-scheduler.xml:

<?xml version="1.0"?>
<allocations>
<defaultMinSharePreemptionTimeout>1</defaultMinSharePreemptionTimeout>
<defaultFairSharePreemptionTimeout>1</defaultFairSharePreemptionTimeout>
<defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
<defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
<queue name="team1" type="parent">
<minResources>20000 mb,2vcores</minResources>
<weight>1.0</weight>
</queue>
<queue name="team2" type="parent">
<minResources>20000 mb,2vcores</minResources>
</queue>
<queue name="team3" type="parent">
<minResources>20000 mb,2vcores</minResources>
<fairSharePreemptionThreshold>1.0</fairSharePreemptionThreshold>
<weight>10.0</weight>
</queue>

<queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
<queueMaxResourcesDefault>40000 mb,0vcores</queueMaxResourcesDefault>

<!-- Queue 'secondary_group_queue' is a parent queue and may have
   user queues under it -->

<user name="sample_user">
<maxRunningApps>30</maxRunningApps>
</user>
<userMaxAppsDefault>5</userMaxAppsDefault>

<queuePlacementPolicy>
<rule name="specified" />
<rule name="primaryGroup" create="false" />
<rule name="nestedUserQueue">
    <rule name="secondaryGroupExistingQueue" create="false" />
</rule>
<rule name="default" queue="team5"/>
</queuePlacementPolicy>
</allocations>
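
To make the effect of the weights in this file concrete, here is a minimal sketch of a weight-proportional split among active queues. It assumes a simplified model: only queues with running apps take part, shares are proportional to `weight` (1.0 where the file sets none, which is the Fair Scheduler default), and `minResources` and the max-resource limits are ignored. The class and method names are hypothetical and not part of the Hadoop API; the real FairScheduler computation is more involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: simplified weight-proportional fair share.
class FairShareSketch {

    // Splits `total` resource units among the active queues in proportion to their weights.
    static Map<String, Double> fairShares(Map<String, Double> activeWeights, double total) {
        double weightSum = 0.0;
        for (double w : activeWeights.values()) {
            weightSum += w;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : activeWeights.entrySet()) {
            shares.put(e.getKey(), total * e.getValue() / weightSum);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Weights from the allocation file; team2 has no <weight>, so 1.0 is assumed.
        Map<String, Double> active = new LinkedHashMap<>();
        active.put("team1", 1.0);
        active.put("team2", 1.0);
        // Only team1 and team2 are active: equal weights, so equal shares.
        System.out.println(fairShares(active, 100.0));

        // If team3 (weight 10.0) becomes active too, it is entitled to 10/12 of the total.
        active.put("team3", 10.0);
        System.out.println(fairShares(active, 100.0));
    }
}
```

Under this simplified model, with only team1 and team2 running, each queue would be entitled to half of the cluster, which is why the second application would be expected to receive resources.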

The application I run is a simple variant of the calcPi example (a plain loop that keeps computing pi):

    while (true) {
        // getOrCreate() returns the already running session after the first iteration
        SparkSession spark = SparkSession
                .builder()
                .appName("JavaPipelineExample")
                .getOrCreate();

        // Build the list of sample indices on the driver
        List<Integer> l = new ArrayList<>(NUM_SAMPLES);
        for (int i = 0; i < NUM_SAMPLES; i++) {
            l.add(i);
        }

        // Distribute the samples across 100 partitions
        JavaRDD<Integer> inputRDD = new JavaSparkContext(spark.sparkContext()).parallelize(l).coalesce(100).repartition(100);
        System.out.println(String.format("Data split to %s partitions", inputRDD.partitions().size()));

        // Count the random points that fall inside the unit quarter circle
        long count = inputRDD.filter(i -> {
            double x = Math.random();
            double y = Math.random();
            return x * x + y * y < 1;
        }).count();
        System.out.println("Pi is roughly " + 4.0 * count / NUM_SAMPLES);
    }
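
The filter in this job is a Monte Carlo estimate: a point drawn uniformly from the unit square lands inside the quarter circle with probability π/4, so 4 × (hits / samples) approximates π. A minimal plain-Java sketch of the same idea, without Spark, might look like this. The class name, the fixed seed, and the sample count are assumptions for illustration only.

```java
import java.util.Random;

// Hypothetical stand-alone version of the estimate; not the asker's Spark job.
class MonteCarloPiSketch {

    // Draws `samples` points in the unit square and returns 4 * (fraction inside the quarter circle).
    static double estimatePi(int samples, long seed) {
        Random rng = new Random(seed); // seeded so repeated runs give the same estimate
        long hits = 0;
        for (int i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y < 1) {
                hits++;
            }
        }
        return 4.0 * hits / samples;
    }

    public static void main(String[] args) {
        System.out.println("Pi is roughly " + estimatePi(1_000_000, 42L));
    }
}
```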

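To run it I open two terminals. I start the first application (in queue 1), check that it has taken all the resources, and then start the second application (in queue 2). I expected the scheduler to preempt the application in queue 1 and give part of the resources to queue 2, but that does not happen:

Running application 1: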

/usr/hdp/current/spark2-client/bin/spark-submit --master yarn --class com.comp.CalculatePi --num-executors 25 --executor-cores 6 --queue team2.aa /root/calcpi-1.0-SNAPSHOT.jar

Once it is running, the Yarn admin panel shows 37/54 vcores in use. I then run:

/usr/hdp/current/spark2-client/bin/spark-submit --master yarn --class com.comp.CalculatePi --num-executors 10 --executor-cores 4 --queue team1.aa /root/calcpi-1.0-SNAPSHOT.jar

Now I see 38/54 vcores in use. The application is submitted successfully, but it never starts, and I get this message:

[Timer-0] WARN org.apache.spark.scheduler.cluster.YarnScheduler - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
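
One way to read the vcore counts reported here is sketched below. It assumes each application's ApplicationMaster occupies one vcore, which the question does not state. Under that assumption, 37 vcores would correspond to six 6-core executors plus one AM, and the step to 38 would be the second application's AM starting with no executors behind it. The class and method names are hypothetical.

```java
// Hypothetical arithmetic helper for the numbers quoted in the question.
class VcoreAccountingSketch {
    static final int CLUSTER_VCORES = 54; // total shown in the Yarn admin panel

    // vcores held by one application: its executors plus its ApplicationMaster.
    static int vcoresFor(int executors, int coresPerExecutor, int amVcores) {
        return executors * coresPerExecutor + amVcores;
    }

    public static void main(String[] args) {
        int amVcores = 1; // ASSUMPTION: one vcore per ApplicationMaster

        int requestedApp1 = vcoresFor(25, 6, amVcores); // what the first spark-submit asks for
        int observedApp1 = vcoresFor(6, 6, amVcores);   // matches the 37 vcores seen in the panel
        int afterApp2 = observedApp1 + amVcores;        // matches the 38 vcores seen afterwards
        int free = CLUSTER_VCORES - afterApp2;

        System.out.println("app 1 requested vcores: " + requestedApp1);
        System.out.println("app 1 observed vcores:  " + observedApp1);
        System.out.println("after app 2 submitted:  " + afterApp2);
        System.out.println("vcores still free:      " + free);
    }
}
```

If this reading is right, some vcores were still free when the second application was waiting, so the vcore count alone does not explain why its executors never started. Memory usage is not shown in the question.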

What is the problem here? Why doesn't Yarn preempt the first application and run both applications at the same time?

Answer 1

You are mixing Capacity Scheduler properties with Fair Scheduler properties.

yarn.resourcemanager.scheduler.monitor.enable=true is for the Capacity Scheduler. yarn.scheduler.fair.preemption is for the Fair Scheduler.
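
The distinction can be expressed as a small check. The sketch below is a hypothetical helper (not part of Hadoop or Ambari) that takes yarn-site.xml properties as a map and flags two cases when the Fair Scheduler is selected: the Fair Scheduler preemption switch is missing, or the Capacity Scheduler monitor property is set. The property names are the ones quoted in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sanity check for preemption-related yarn-site.xml properties.
class PreemptionConfigCheck {
    static final String SCHEDULER_CLASS = "yarn.resourcemanager.scheduler.class";
    static final String MONITOR_ENABLE = "yarn.resourcemanager.scheduler.monitor.enable";
    static final String FAIR_PREEMPTION = "yarn.scheduler.fair.preemption";

    // Returns human-readable warnings; an empty list means nothing was flagged.
    static List<String> check(Map<String, String> props) {
        List<String> warnings = new ArrayList<>();
        boolean fair = props.getOrDefault(SCHEDULER_CLASS, "").endsWith(".FairScheduler");
        if (fair && !"true".equals(props.get(FAIR_PREEMPTION))) {
            warnings.add(FAIR_PREEMPTION + " is not set to true, so Fair Scheduler preemption is off");
        }
        if (fair && "true".equals(props.get(MONITOR_ENABLE))) {
            warnings.add(MONITOR_ENABLE + " is a Capacity Scheduler setting and does not drive Fair Scheduler preemption");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // The three properties from the question.
        Map<String, String> props = new HashMap<>();
        props.put(MONITOR_ENABLE, "true");
        props.put(SCHEDULER_CLASS,
                "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
        props.put(FAIR_PREEMPTION, "true");
        check(props).forEach(System.out::println);
    }
}
```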

According to the upstream Fair Scheduler documentation, you need:

