This is how I invoke my pyspark job:
./spark-submit --master yarn ~/workspace/really_big_sparktask.py --deploy-mode cluster
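(In case option ordering matters here, which I'm not sure about, an alternative I could try is putting the flags before the script; this is just a sketch of that ordering, not something I've verified:)

./spark-submit --master yarn --deploy-mode cluster ~/workspace/really_big_sparktask.py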
Inside the pyspark job itself I set the following configuration:
import pyspark as ps
from pyspark import SparkContext
from pyspark.sql import SparkSession

if __name__ == "__main__":
    conf = ps.SparkConf().setAll([
        (u'spark.app.name', u'Magic maker'),
        ('spark.executor.memory', '16g'),
        ('spark.driver.memory', '8g'),
        ('spark.executor.cores', '3'),
        ('spark.dynamicAllocation.maxExecutors', 50),
        ('spark.dynamicAllocation.initialExecutors', 45)])
    sc = SparkContext("yarn", "Magic", conf=conf)

    # Print the configuration the context actually ended up with
    from pprint import pprint
    pprint(sorted(sc.getConf().getAll()))

    spark = SparkSession(sc)
I've noticed that none of my configs are being picked up; the printed configuration shows, for example:
(u'spark.dynamicAllocation.enabled', u'true'),
(u'spark.dynamicAllocation.maxExecutors', u'2'),
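As a sanity check, I'm also considering passing the same settings as --conf flags on the command line instead of setting them in code; a sketch of that invocation (same script path as above, not yet verified on my cluster):

./spark-submit --master yarn --deploy-mode cluster \
    --conf spark.executor.memory=16g \
    --conf spark.driver.memory=8g \
    --conf spark.executor.cores=3 \
    --conf spark.dynamicAllocation.maxExecutors=50 \
    --conf spark.dynamicAllocation.initialExecutors=45 \
    ~/workspace/really_big_sparktask.py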