When I run my Spark application on EMR, what is the difference between adding the configurations to the spark/conf/spark-defaults.conf file and adding them when running spark-submit?
For example, if I add this to my conf/spark-defaults.conf:
spark.master yarn
spark.executor.instances 4
spark.executor.memory 29G
spark.executor.cores 3
spark.yarn.executor.memoryOverhead 4096
spark.yarn.driver.memoryOverhead 2048
spark.driver.memory 12G
spark.driver.cores 1
spark.default.parallelism 48
Would that be the same as adding them as command-line arguments:
Arguments: /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster --conf spark.driver.memory=12G --conf spark.executor.memory=29G --conf spark.executor.cores=3 --conf spark.executor.instances=4 --conf spark.yarn.executor.memoryOverhead=4096 --conf spark.yarn.driver.memoryOverhead=2048 --conf spark.driver.cores=1 --conf spark.default.parallelism=48 --class com.emr.spark.MyApp s3n://mybucket/application/spark/MeSparkApplication.jar
And would it be the same if I set it in my Java code, for example:
SparkConf sparkConf = new SparkConf().setAppName(applicationName);
sparkConf.set("spark.executor.instances", "4");
Answer 0 (score: 0)
The difference is precedence. According to the Spark configuration documentation:
Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file.
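To illustrate that order, here is a minimal sketch; the property values and class name are hypothetical and only show which source wins when the same key is set in all three places:
```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class PrecedenceExample {
    public static void main(String[] args) {
        // Suppose spark-defaults.conf contains:  spark.executor.memory 29G
        // and spark-submit was invoked with:     --conf spark.executor.memory=20G

        SparkConf sparkConf = new SparkConf().setAppName("PrecedenceExample");

        // A value set directly on SparkConf has the highest precedence,
        // so executors are requested with 10g, not 20G or 29G.
        sparkConf.set("spark.executor.memory", "10g");

        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // The effective value can be checked at runtime:
        System.out.println(sc.getConf().get("spark.executor.memory")); // prints "10g"

        sc.stop();
    }
}
```
In other words, the three approaches configure the same properties; they only differ in which value wins when the same key is defined in more than one place.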