I'm trying to launch a Spark application with this command:
time spark-submit --master "local[4]" optimize-spark.py
but I get these errors:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/27 15:43:32 INFO SparkContext: Running Spark version 1.6.0
16/01/27 15:43:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/27 15:43:32 INFO SecurityManager: Changing view acls to: DamianFox
16/01/27 15:43:32 INFO SecurityManager: Changing modify acls to: DamianFox
16/01/27 15:43:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(DamianFox); users with modify permissions: Set(DamianFox)
16/01/27 15:43:33 INFO Utils: Successfully started service 'sparkDriver' on port 51613.
16/01/27 15:43:33 INFO Slf4jLogger: Slf4jLogger started
16/01/27 15:43:33 INFO Remoting: Starting remoting
16/01/27 15:43:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.102:51614]
16/01/27 15:43:33 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 51614.
16/01/27 15:43:33 INFO SparkEnv: Registering MapOutputTracker
16/01/27 15:43:33 INFO SparkEnv: Registering BlockManagerMaster
16/01/27 15:43:33 INFO DiskBlockManager: Created local directory at /private/var/folders/8m/h5qcvjrn1bs6pv0c0_nyqrlm0000gn/T/blockmgr-defb91b0-50f9-45a7-8e92-6d15041c01bc
16/01/27 15:43:33 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/01/27 15:43:33 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/27 15:43:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/27 15:43:33 INFO SparkUI: Started SparkUI at http://192.168.0.102:4040
16/01/27 15:43:33 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/Project/MinimumFunction/optimize-spark.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
16/01/27 15:43:34 INFO SparkUI: Stopped Spark web UI at http://192.168.0.102:4040
16/01/27 15:43:34 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/27 15:43:34 INFO MemoryStore: MemoryStore cleared
16/01/27 15:43:34 INFO BlockManager: BlockManager stopped
16/01/27 15:43:34 INFO BlockManagerMaster: BlockManagerMaster stopped
16/01/27 15:43:34 WARN MetricsSystem: Stopping a MetricsSystem that is not running
16/01/27 15:43:34 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/27 15:43:34 INFO SparkContext: Successfully stopped SparkContext
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
ERROR - failed to write data to stream: <open file '<stdout>', mode 'w' at 0x10bb6e150>
16/01/27 15:43:34 INFO ShutdownHookManager: Shutdown hook called
16/01/27 15:43:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/8m/h5qcvjrn1bs6pv0c0_nyqrlm0000gn/T/spark-c00170ca-0e05-4ece-a962-f9303bce4f9f
spark-submit --master "local[4]" optimize-spark.py 6.12s user 0.52s system 187% cpu 3.539 total
How can I fix this? Is there something wrong with the variables? I've searched for a long time but can't find a solution. Thanks!
Answer 0 (score: 4)
I moved the project folder to the Desktop folder and now it works.
It probably wasn't working before because the project was in a folder whose name contained a space, so the command most likely couldn't find the file.
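As an illustration (assuming the old location really did contain a space, e.g. a folder called "Spark Project", which is only a guess), quoting the path at least makes the shell pass it to spark-submit as a single argument, though some Spark 1.x versions may also mishandle spaces inside file paths, so moving the project remains the simplest fix:
# hypothetical path with a space; the quotes keep it as one argument to spark-submit
time spark-submit --master "local[4]" "/Users/DamianFox/Spark Project/MinimumFunction/optimize-spark.py"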
Answer 1 (score: 0)
Apologies for the confusion. --py-files is for supplying additional Python files that your program depends on, so that they get placed on the PYTHONPATH.
I tried again on Windows with Spark 1.6, and the following command works:
bin\spark-submit --master "local[4]" testingpyfiles.py
testingpyfiles.py is a simple Python script that prints some data to the console and sits in the same directory from which I run the command above. Here is testingpyfiles.py:
from pyspark import SparkContext, SparkConf

# Create the SparkContext for this small test application
conf = SparkConf().setAppName("Python App")
sc = SparkContext(conf=conf)

# Distribute a small list as an RDD
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)

print("Now it will print the data")
# Note: this prints the RDD object itself; use distData.collect() to see the elements
print(distData)
In your case, it looks like either the path is incorrect or there is a permissions problem with the file. Also make sure optimize-spark.py is in the same directory from which you run spark-submit.
Answer 2 (score: 0)
You can fix this in two ways:
You can pass the file as an argument to --py-files, like this (where filepath is a path on the local file system):
spark-submit --master "local[4]" --py-files="<filepath>/optimize-spark.py" optimize-spark.py
Or you can put the optimize-spark.py file on HDFS and add it from your code:
sc.addFile("hdfs:<filepath_on_hdfs>/optimize-spark.py")