I am submitting my Python job with the bin/spark-submit script, and I want to configure the parameters from within the Python code via the SparkConf and SparkContext classes. I tried setting appName, master, spark.cores.max, and spark.scheduler.mode on a SparkConf object and passing it to SparkContext as the conf argument. It turns out the job is never sent to the master of the standalone cluster I set up (it just runs locally, as the localhost:4040/environment/ page of the web UI confirms). From the print statements below it is clear that the SparkConf I pass to SparkContext contains all four settings, yet the conf inside the SparkContext object has none of my updates, only the defaults.
As alternatives, I tried using
conf/spark-defaults.conf
or --properties-file my-spark.conf --master spark://spark-master:7077
on the command line. Both approaches work, as sketched below.
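For reference, this is roughly what the properties-file variant looks like; the file name my-spark.conf is mine, and the property values simply mirror what I set in code:

# my-spark.conf
spark.app.name        word-count
spark.scheduler.mode  FAIR
spark.cores.max       4

submitted with:

bin/spark-submit --properties-file my-spark.conf --master spark://spark-master:7077 /mnt/scratch/yidi/docker-volume/test.py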
I also tried setting the master on its own through the master argument:
sc = pyspark.SparkContext(master="spark://spark-master:7077", conf=conf)
That works too:
SparkContextConf: [
('spark.app.name', 'test.py'), ('spark.rdd.compress', 'True'),
('spark.driver.port', '41147'),
('spark.app.id', 'app-20170403234627-0001'),
('spark.master', 'spark://spark-master:7077'),
('spark.serializer.objectStreamReset', '100'),
('spark.executor.id', 'driver'),
('spark.submit.deployMode', 'client'),
('spark.files', 'file:/mnt/scratch/yidi/docker-volume/test.py'),
('spark.driver.host', '10.0.0.5')
]
So it seems that the conf argument on its own is the only mechanism that SparkContext does not consume correctly.
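To make the comparison concrete, here is a minimal sketch of the two variants side by side (the master URL is my cluster's; the values in the comments are what I observe when submitting with bin/spark-submit):

import pyspark

conf = pyspark.SparkConf().setMaster("spark://spark-master:7077")

# Variant 1: master set only on the SparkConf object
sc = pyspark.SparkContext(conf=conf)
print(sc.master)  # prints 'local[*]' in my runs, not the standalone master
sc.stop()

# Variant 2: master also passed as an explicit keyword argument
sc = pyspark.SparkContext(master="spark://spark-master:7077", conf=conf)
print(sc.master)  # prints 'spark://spark-master:7077' as expected
sc.stop()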
Spark job code:
import operator
import pyspark


def main():
    '''Program entry point'''
    import socket
    print(socket.gethostbyname(socket.gethostname()))
    print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
    num_cores = 4
    conf = pyspark.SparkConf()
    conf.setAppName("word-count").setMaster("spark://spark-master:7077")  # .setExecutorEnv("CLASSPATH", path)
    conf.set("spark.scheduler.mode", "FAIR")
    conf.set("spark.cores.max", num_cores)
    print("Conf: {}".format(conf.getAll()))
    sc = pyspark.SparkContext(conf=conf)
    print("SparkContextConf: {}".format(sc.getConf().getAll()))
    # Initialize a Spark context
    # with pyspark.SparkContext(conf=conf) as sc:
    #     # Get an RDD containing lines from this script file
    #     lines = sc.textFile("/mnt/scratch/yidi/docker-volume/war_and_peace.txt")
    #     # Split each line into words and assign a frequency of 1 to each word
    #     words = lines.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1))
    #     # Count the frequency of each word
    #     counts = words.reduceByKey(operator.add)
    #     # Sort the counts in descending order of word frequency
    #     sorted_counts = counts.sortBy(lambda x: x[1], False)
    #     # Iterate over the counts to print each word and its frequency
    #     for word, count in sorted_counts.toLocalIterator():
    #         print("{} --> {}".format(word.encode('utf-8'), count))
    # print("Spark conf: {}".format(conf.getAll()))


if __name__ == "__main__":
    main()
Console output of the job run:
root@bdd1ba99bf4f:/usr/spark-2.1.0# bin/spark-submit /mnt/scratch/yidi/docker-volume/test.py
10.0.0.5
Conf: dict_items([('spark.scheduler.mode', 'FAIR'), ('spark.app.name', 'word-count'), ('spark.master', 'spark://spark-master:7077'), ('spark.cores.max', '4')])
17/04/03 23:24:27 INFO spark.SparkContext: Running Spark version 2.1.0
17/04/03 23:24:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/03 23:24:27 INFO spark.SecurityManager: Changing view acls to: root
17/04/03 23:24:27 INFO spark.SecurityManager: Changing modify acls to: root
17/04/03 23:24:27 INFO spark.SecurityManager: Changing view acls groups to:
17/04/03 23:24:27 INFO spark.SecurityManager: Changing modify acls groups to:
17/04/03 23:24:27 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/04/03 23:24:27 INFO util.Utils: Successfully started service 'sparkDriver' on port 38193.
17/04/03 23:24:27 INFO spark.SparkEnv: Registering MapOutputTracker
17/04/03 23:24:27 INFO spark.SparkEnv: Registering BlockManagerMaster
17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/04/03 23:24:27 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-7b545c86-9344-4f73-a3db-e891c562714d
17/04/03 23:24:27 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
17/04/03 23:24:27 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/04/03 23:24:27 INFO util.log: Logging initialized @1090ms
17/04/03 23:24:27 INFO server.Server: jetty-9.2.z-SNAPSHOT
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3950ddbb{/jobs,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@286430d5{/jobs/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@404eb034{/jobs/job,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@79e3feec{/jobs/job/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4661596e{/stages,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4f8a3bef{/stages/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a70fd3a{/stages/stage,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c826806{/stages/stage/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50e4e8d1{/stages/pool,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e2fe461{/stages/pool/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33cb59b3{/storage,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c06c594{/storage/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4b5300a5{/storage/rdd,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a6ef942{/storage/rdd/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1301217d{/environment,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@19217cec{/environment/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a241145{/executors,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@470d45aa{/executors/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5d9d8eff{/executors/threadDump,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4fc94fbc{/executors/threadDump/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@258dd139{/static,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@880f037{/,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@395b7dae{/api,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3cea7196{/jobs/job/kill,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77257b2b{/stages/stage/kill,null,AVAILABLE}
17/04/03 23:24:27 INFO server.ServerConnector: Started ServerConnector@788dbc45{HTTP/1.1}{0.0.0.0:4040}
17/04/03 23:24:27 INFO server.Server: Started @1154ms
17/04/03 23:24:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/04/03 23:24:27 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://localhost:4040
17/04/03 23:24:27 INFO spark.SparkContext: Added file file:/mnt/scratch/yidi/docker-volume/test.py at file:/mnt/scratch/yidi/docker-volume/test.py with timestamp 1491261867734
17/04/03 23:24:27 INFO util.Utils: Copying /mnt/scratch/yidi/docker-volume/test.py to /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6/userFiles-ed0a6692-b96a-4b3d-be16-215a21cf836b/test.py
17/04/03 23:24:27 INFO executor.Executor: Starting executor ID driver on host localhost
17/04/03 23:24:27 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44223.
17/04/03 23:24:27 INFO netty.NettyBlockTransferService: Server created on 10.0.0.5:44223
17/04/03 23:24:27 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/04/03 23:24:27 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.5, 44223, None)
17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.0.5:44223 with 366.3 MB RAM, BlockManagerId(driver, 10.0.0.5, 44223, None)
17/04/03 23:24:27 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.5, 44223, None)
17/04/03 23:24:27 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.5, 44223, None)
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c06133a{/metrics/json,null,AVAILABLE}
SparkContextConf: [('spark.app.name', 'test.py'), ('spark.rdd.compress', 'True'), ('spark.serializer.objectStreamReset', '100'), ('spark.master', 'local[*]'), ('spark.executor.id', 'driver'), ('spark.submit.deployMode', 'client'), ('spark.files', 'file:/mnt/scratch/yidi/docker-volume/test.py'), ('spark.driver.host', '10.0.0.5'), ('spark.app.id', 'local-1491261867756'), ('spark.driver.port', '38193')]
17/04/03 23:24:27 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/04/03 23:24:27 INFO server.ServerConnector: Stopped ServerConnector@788dbc45{HTTP/1.1}{0.0.0.0:4040}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@77257b2b{/stages/stage/kill,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3cea7196{/jobs/job/kill,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@395b7dae{/api,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@880f037{/,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@258dd139{/static,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4fc94fbc{/executors/threadDump/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5d9d8eff{/executors/threadDump,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@470d45aa{/executors/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4a241145{/executors,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@19217cec{/environment/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1301217d{/environment,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7a6ef942{/storage/rdd/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4b5300a5{/storage/rdd,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3c06c594{/storage/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@33cb59b3{/storage,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4e2fe461{/stages/pool/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@50e4e8d1{/stages/pool,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1c826806{/stages/stage/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7a70fd3a{/stages/stage,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4f8a3bef{/stages/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4661596e{/stages,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@79e3feec{/jobs/job/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@404eb034{/jobs/job,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@286430d5{/jobs/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3950ddbb{/jobs,null,UNAVAILABLE}
17/04/03 23:24:27 INFO ui.SparkUI: Stopped Spark web UI at http://localhost:4040
17/04/03 23:24:27 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/04/03 23:24:27 INFO memory.MemoryStore: MemoryStore cleared
17/04/03 23:24:27 INFO storage.BlockManager: BlockManager stopped
17/04/03 23:24:27 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/04/03 23:24:27 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/04/03 23:24:27 INFO spark.SparkContext: Successfully stopped SparkContext
17/04/03 23:24:27 INFO util.ShutdownHookManager: Shutdown hook called
17/04/03 23:24:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6/pyspark-50d75419-36b8-4dd2-a7c6-7d500c7c5bca
17/04/03 23:24:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6