Where to specify Spark configuration when running a Spark application on an EMR cluster

Asked: 2018-05-01 18:24:19

Tags: apache-spark hadoop emr

When I run a Spark application on EMR, what is the difference between adding configuration settings to the spark/conf/spark-defaults.conf file and adding them when running spark-submit?

For example, if I add this to my conf/spark-defaults.conf:

spark.master         yarn
spark.executor.instances            4
spark.executor.memory               29G
spark.executor.cores                3
spark.yarn.executor.memoryOverhead  4096
spark.yarn.driver.memoryOverhead    2048
spark.driver.memory                 12G
spark.driver.cores                  1
spark.default.parallelism           48

Is that the same as adding it to the command-line arguments:

Arguments: /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster --conf spark.driver.memory=12G --conf spark.executor.memory=29G --conf spark.executor.cores=3 --conf spark.executor.instances=4 --conf spark.yarn.executor.memoryOverhead=4096 --conf spark.yarn.driver.memoryOverhead=2048 --conf spark.driver.cores=1 --conf spark.default.parallelism=48 --class com.emr.spark.MyApp s3n://mybucket/application/spark/MeSparkApplication.jar

And would it be the same if I add it in my Java code, for example:

SparkConf sparkConf = new SparkConf().setAppName(applicationName);
sparkConf.set("spark.executor.instances", "4");
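
(The in-code settings only take effect once that SparkConf is passed to the context constructor; a minimal sketch continuing the snippet above, assuming org.apache.spark.api.java.JavaSparkContext is imported:)

// Properties set on sparkConf are applied when the context is created from it.
JavaSparkContext sc = new JavaSparkContext(sparkConf);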

1 Answer:

Answer 0 (score: 0)

The difference is precedence. According to the Spark documentation:

Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file.
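
To illustrate that ordering, here is a minimal sketch (the 8G and 16G values below are hypothetical, not taken from the question): suppose spark-defaults.conf sets spark.executor.memory to 29G and spark-submit is invoked with --conf spark.executor.memory=16G, while the application sets the same property again on its SparkConf. The value set directly on the SparkConf is the one that takes effect:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class PrecedenceDemo {
    public static void main(String[] args) {
        // Hypothetical scenario: spark-defaults.conf sets spark.executor.memory to 29G
        // and spark-submit is invoked with --conf spark.executor.memory=16G.
        SparkConf sparkConf = new SparkConf().setAppName("PrecedenceDemo");
        sparkConf.set("spark.executor.memory", "8G"); // set directly on SparkConf: highest precedence

        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        // Prints 8G; without the set() call above it would be 16G (the spark-submit flag),
        // and without the flag it would fall back to 29G from spark-defaults.conf.
        System.out.println(sc.getConf().get("spark.executor.memory"));
        sc.stop();
    }
}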