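I am trying to run a Spark job from a Python (.py) file with the following command:
$SPARK_HOME/bin/spark-submit ~/Project/SparkTest.py --py-files ~/Project/SparkTest.py
The job fails with the exception "Could not parse Master URL: '<pyspark.conf.SparkConf object at 0x7fb6b70e3898>'".
After some debugging, I found that when the job starts, spark.master is set to '<pyspark.conf.SparkConf object at 0x7fb6b70e3898>' rather than "spark://10.0.0.5:31016", which is the master IP and port I configured in spark-defaults.conf.
Here is the complete output after submitting the Spark job: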
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/19 22:25:43 INFO SparkContext: Running Spark version 2.2.0
17/11/19 22:25:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/19 22:25:44 INFO SparkContext: Submitted application: SparkTest.py
17/11/19 22:25:44 INFO SparkContext: Spark configuration:
spark.app.name=SparkTest.py
spark.driver.cores=2
spark.driver.memory=3g
spark.eventLog.dir=hdfs://10.0.0.5:31001/spark_log
spark.eventLog.enabled=true
spark.executor.memory=3g
spark.files=file:/home/admin/Project/SparkTest.py
spark.kryoserializer.buffer.max=1536m
spark.logConf=true
spark.master=<pyspark.conf.SparkConf object at 0x7fb6b70e3898>
spark.rdd.compress=True
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset=100
spark.submit.deployMode=client
17/11/19 22:25:44 INFO SecurityManager: Changing view acls to: admin
17/11/19 22:25:44 INFO SecurityManager: Changing modify acls to: admin
17/11/19 22:25:44 INFO SecurityManager: Changing view acls groups to:
17/11/19 22:25:44 INFO SecurityManager: Changing modify acls groups to:
17/11/19 22:25:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(admin); groups with view permissions: Set(); users with modify permissions: Set(admin); groups with modify permissions: Set()
17/11/19 22:25:44 INFO Utils: Successfully started service 'sparkDriver' on port 41829.
17/11/19 22:25:44 INFO SparkEnv: Registering MapOutputTracker
17/11/19 22:25:44 INFO SparkEnv: Registering BlockManagerMaster
17/11/19 22:25:44 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/19 22:25:44 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/19 22:25:44 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-4007fc95-6531-4447-a095-0730713d7758
17/11/19 22:25:44 INFO MemoryStore: MemoryStore started with capacity 1458.6 MB
17/11/19 22:25:44 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/19 22:25:44 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/19 22:25:44 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.0.5:4040
17/11/19 22:25:44 INFO SparkContext: Added file file:/home/admin/Project/SparkTest.py at spark://10.0.0.5:41829/files/SparkTest.py with timestamp 1511130344827
17/11/19 22:25:44 INFO Utils: Copying /home/admin/Project/SparkTest.py to /tmp/spark-940a6faa-cf59-4d47-87c6-b3f39296c19d/userFiles-d3c17550-6141-496d-aacd-0f83f813a3a0/SparkTest.py
17/11/19 22:25:44 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: '<pyspark.conf.SparkConf object at 0x7fb6b70e3898>'
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2760)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:236)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
17/11/19 22:25:44 INFO SparkUI: Stopped Spark web UI at http://10.0.0.5:4040
17/11/19 22:25:44 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/11/19 22:25:44 INFO MemoryStore: MemoryStore cleared
17/11/19 22:25:44 INFO BlockManager: BlockManager stopped
17/11/19 22:25:44 INFO BlockManagerMaster: BlockManagerMaster stopped
17/11/19 22:25:44 WARN MetricsSystem: Stopping a MetricsSystem that is not running
17/11/19 22:25:44 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/11/19 22:25:44 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/home/admin/Project/SparkTest.py", line 21, in <module>
    sc = SparkContext(conf)
  File "/home/admin/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/home/admin/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
  File "/home/admin/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 273, in _initialize_context
  File "/home/admin/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
  File "/home/admin/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Could not parse Master URL: '<pyspark.conf.SparkConf object at 0x7fb6b70e3898>'
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2760)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:236)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
17/11/19 22:25:44 INFO ShutdownHookManager: Shutdown hook called
17/11/19 22:25:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-940a6faa-cf59-4d47-87c6-b3f39296c19d
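Answer 0 (score: 3)
I found the solution right after posting. When instantiating SparkContext, I was passing conf positionally, without the parameter name. The first positional parameter of SparkContext is master, so the SparkConf object itself was taken as the master URL. Changing the call to SparkContext(conf=conf) solved the problem.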
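
To make the positional-versus-keyword difference concrete, here is a minimal sketch that runs without Spark installed. StandInConf and effective_master are hypothetical stand-ins written for this illustration, not part of pyspark; they only mimic the parameter order (master first, conf later) and assume, as a simplification, that an explicit master argument overrides the value held by conf.

```python
class StandInConf:
    """Hypothetical stand-in for pyspark.SparkConf (illustration only)."""

    def __init__(self, master):
        self.master = master


def effective_master(master=None, appName=None, conf=None):
    """Simplified model of how the master URL is chosen.

    Assumption: an explicit `master` argument wins over the one in `conf`,
    and whatever was passed is turned into a string.
    """
    if master:
        return str(master)
    return conf.master if conf is not None else None


conf = StandInConf("spark://10.0.0.5:31016")

# Positional call: conf lands in the `master` slot, so its default
# object representation ends up being used as the master URL.
wrong = effective_master(conf)

# Keyword call: conf reaches the `conf` parameter and the configured URL is used.
right = effective_master(conf=conf)

print(wrong)
print(right)
```

In the real script, the corresponding change is the one described in the answer: replace sc = SparkContext(conf) with sc = SparkContext(conf=conf).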