Question

I run a Spark job with the spark-submit command, using the following parameters:

spark-submit --master yarn-cluster --driver-cores 2 \
 --driver-memory 2G --num-executors 10 \
 --executor-cores 5 --executor-memory 2G \
 --class com.spark.sql.jdbc.SparkDFtoOracle2 \
 Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

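Now I want to run the same program using Spark's dynamic resource allocation. Could you help me use dynamic resource allocation when running the Spark program?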

Answer 1

To use dynamic allocation in Spark, spark.dynamicAllocation.enabled must be set to true, because it defaults to false.

Dynamic allocation also requires spark.shuffle.service.enabled to be set to true, because the Spark application runs on YARN. See this link to start the shuffle service on each NodeManager in YARN.

The following configurations are also relevant:

spark.dynamicAllocation.minExecutors, 
spark.dynamicAllocation.maxExecutors, and 
spark.dynamicAllocation.initialExecutors

These options can be configured for a Spark application in 3 ways:

1. From spark-submit, with --conf <prop_name>=<prop_value>

spark-submit --master yarn-cluster \
    --driver-cores 2 \
    --driver-memory 2G \
    --num-executors 10 \
    --executor-cores 5 \
    --executor-memory 2G \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=5 \
    --conf spark.dynamicAllocation.maxExecutors=30 \
    --conf spark.dynamicAllocation.initialExecutors=10 \
    --class com.spark.sql.jdbc.SparkDFtoOracle2 \
    Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

2. Inside the Spark program, using SparkConf

Set the properties in a SparkConf, then use it to create the SparkSession or SparkContext:

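import org.apache.spark.SparkConf

// Executor bounds for dynamic allocation; set them before the context is created
val conf: SparkConf = new SparkConf()
conf.set("spark.dynamicAllocation.minExecutors", "5")
conf.set("spark.dynamicAllocation.maxExecutors", "30")
conf.set("spark.dynamicAllocation.initialExecutors", "10")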
.....
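
As a fuller sketch of this approach, the snippet below collects the settings named in this answer, including the two enabling flags, into one SparkConf. The object name DynamicAllocationConf, the helper buildConf, and the app name are hypothetical and only for illustration; the property values are the ones used in the spark-submit example.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper object; the names are illustrative, not from the original post.
object DynamicAllocationConf {

  // Builds a SparkConf with dynamic allocation switched on.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("SparkDFtoOracle2") // assumed app name
      .set("spark.dynamicAllocation.enabled", "true")
      // Requires the external shuffle service to be running on each NodeManager.
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "5")
      .set("spark.dynamicAllocation.maxExecutors", "30")
      .set("spark.dynamicAllocation.initialExecutors", "10")

  def main(args: Array[String]): Unit = {
    // Print the resulting key=value pairs for inspection.
    println(buildConf().toDebugString)
  }
}
```

The resulting conf can be passed to SparkSession.builder().config(conf).getOrCreate() or to new SparkContext(conf). The settings have to be in place before the context is created.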

3. In spark-defaults.conf, usually located in $SPARK_HOME/conf/

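If the configuration is not passed from the command line or from the code, put the same settings in spark-defaults.conf so that they apply to all Spark applications.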
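
As a sketch, a spark-defaults.conf carrying the settings discussed in this answer might contain the lines below. The values are the ones from the spark-submit example, not recommendations; each line is a property name and its value separated by whitespace.

```properties
# $SPARK_HOME/conf/spark-defaults.conf
spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10
```

Values set in SparkConf or passed with --conf take precedence over this file, so it works best for cluster-wide defaults.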

Spark - Dynamic Allocation Confs

Answer 2

I just made a small demo using Spark's dynamic resource allocation. The code is on my GitHub. Specifically, the demo is in this release.

How do I run a Spark program with dynamic resource allocation?
