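I can run a simple Hello World program with Spark on a standalone machine. But when I write a word count program that uses SparkContext and run it with pyspark, I get the error below: SparkContext fails to initialize with java.io.FileNotFoundException, saying the added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist. I'm on Mac OS X and installed Spark with brew install apache-spark. Any ideas what is going wrong?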
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/19 23:18:45 INFO SparkContext: Running Spark version 1.6.2
16/07/19 23:18:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/19 23:18:45 INFO SecurityManager: Changing view acls to: tanyagupta
16/07/19 23:18:45 INFO SecurityManager: Changing modify acls to: tanyagupta
16/07/19 23:18:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tanyagupta); users with modify permissions: Set(tanyagupta)
16/07/19 23:18:46 INFO Utils: Successfully started service 'sparkDriver' on port 59226.
16/07/19 23:18:46 INFO Slf4jLogger: Slf4jLogger started
16/07/19 23:18:46 INFO Remoting: Starting remoting
16/07/19 23:18:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.5:59227]
16/07/19 23:18:46 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 59227.
16/07/19 23:18:46 INFO SparkEnv: Registering MapOutputTracker
16/07/19 23:18:46 INFO SparkEnv: Registering BlockManagerMaster
16/07/19 23:18:46 INFO DiskBlockManager: Created local directory at /private/var/folders/2f/fltslxd54f5961xsc2wg1w680000gn/T/blockmgr-812de6f9-3e3d-4885-a7de-fc9c2e181c64
16/07/19 23:18:46 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/07/19 23:18:46 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/19 23:18:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/19 23:18:46 INFO SparkUI: Started SparkUI at http://192.168.0.5:4040
16/07/19 23:18:46 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
16/07/19 23:18:47 INFO SparkUI: Stopped Spark web UI at http://192.168.0.5:4040
16/07/19 23:18:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/19 23:18:47 INFO MemoryStore: MemoryStore cleared
16/07/19 23:18:47 INFO BlockManager: BlockManager stopped
16/07/19 23:18:47 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/19 23:18:47 WARN MetricsSystem: Stopping a MetricsSystem that is not running
16/07/19 23:18:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/19 23:18:47 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
File "/Users/tanyagupta/Documents/Internship/Zyudly Labs/Tanya-Programs/word_count.py", line 7, in <module>
sc=SparkContext(appName="WordCount_Tanya")
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 172, in _do_init
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/07/19 23:18:47 INFO ShutdownHookManager: Shutdown hook called
16/07/19 23:18:47 INFO ShutdownHookManager: Deleting directory /private/var/folders/2f/fltslxd54f5961xsc2wg1w680000gn/T/spark-f69e5dfc-6561-4677-9ec0-03594eabc991
Answer 0 (score: 1)
Adding an __init__.py file to my folder worked for me!
Thanks!
Answer 1 (score: 0)
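I ran into this because of the space in the path. After removing the space from the path, I was able to fix the problem. Hope this helps.
Remove the space in /Zyudly%20Labs/ (that is, rename the "Zyudly Labs" folder so its name has no space) and try again.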
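The %20 in the error message is how the space in "Zyudly Labs" looks once the path is written as a file URI. One plausible reading of the log (an assumption, not a description of Spark's actual internal code) is that the existence check ended up looking for the still-encoded path, which does not exist on disk. The sketch below illustrates that idea using only the Python standard library; to_file_uri, exists_without_decoding, exists_with_decoding and components_with_whitespace are hypothetical helpers written for this illustration, not PySpark APIs.

```python
import os
import tempfile
from pathlib import PurePosixPath
from urllib.parse import quote, unquote

SCHEME = "file://"

def to_file_uri(path):
    # Percent-encode the path as file URIs do: "/" is kept, a space becomes "%20".
    return SCHEME + quote(path)

def exists_without_decoding(uri):
    # Hypothetical buggy check: strips the scheme but never decodes "%20",
    # so it looks for a folder literally named "Zyudly%20Labs".
    return os.path.exists(uri[len(SCHEME):])

def exists_with_decoding(uri):
    # Correct check: decode percent-escapes before touching the filesystem.
    return os.path.exists(unquote(uri[len(SCHEME):]))

def components_with_whitespace(path):
    # Names in a POSIX path that contain whitespace, i.e. the ones to rename.
    return [part for part in PurePosixPath(path).parts
            if any(ch.isspace() for ch in part)]

def demo():
    # Recreate the question's layout (a folder with a space in its name)
    # inside a throwaway temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        folder = os.path.join(tmp, "Zyudly Labs")
        os.makedirs(folder)
        script = os.path.join(folder, "word_count.py")
        open(script, "w").close()  # create an empty placeholder script
        uri = to_file_uri(script)
        return uri, exists_without_decoding(uri), exists_with_decoding(uri)

uri, naive, decoded = demo()
print(uri)             # on macOS/Linux, ends with "/Zyudly%20Labs/word_count.py"
print(naive, decoded)  # False True
print(components_with_whitespace(
    "/Users/tanyagupta/Documents/Internship/Zyudly Labs/Tanya-Programs/word_count.py"))
# ['Zyudly Labs']
```

Renaming the folder reported by components_with_whitespace (here "Zyudly Labs") to something without a space, as Answer 1 suggests, makes the encoded and decoded paths identical, so the mismatch cannot occur.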