Is there a way to override core-site property values when running spark-submit?
I can point the HADOOP_CONF_DIR variable in spark-env.sh at a new core-site file, but I only want to override a few values so that each Spark job can use a different storage account.
Answer (score: 1)
Found the answer to my own question.
Hadoop-related configuration can be overridden by prefixing the property key with "spark.hadoop." and passing it to spark-submit with --conf.
e.g. spark-submit --conf spark.hadoop.io.file.buffer.size=12952
See source code here: https://github.com/apache/spark/commit/b6cf1348170951396a6a5d8a65fb670382304f5b
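For the per-job storage account case from the question, here is a minimal sketch. The account name, container, class, jar, and key variable are hypothetical, and the fs.azure.* property assumes the hadoop-azure connector is available on the cluster:

  # Hypothetical example: override the storage account key for this submission only,
  # instead of editing core-site.xml or switching HADOOP_CONF_DIR.
  spark-submit \
    --conf spark.hadoop.fs.azure.account.key.myaccount.blob.core.windows.net=$STORAGE_ACCOUNT_KEY \
    --class com.example.MyJob \
    my-job.jar wasbs://mycontainer@myaccount.blob.core.windows.net/input

Each job can then be submitted with a different account key override while sharing the same core-site.xml defaults.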