Unable to read a file with "spark-submit" from the terminal

Time: 2018-11-09 01:11:43

Tags: python apache-spark pyspark

I'm trying to run a .py file from the terminal using spark-submit file.py, but it doesn't work. However, if I run it with python file.py, it works.

This is the error:

2018-11-08 17:06:51 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-11-08 17:06:52 INFO  SparkContext:54 - Running Spark version 2.3.1
2018-11-08 17:06:52 INFO  SparkContext:54 - Submitted application: hw3
2018-11-08 17:06:52 INFO  SecurityManager:54 - Changing view acls to: dummy
2018-11-08 17:06:52 INFO  SecurityManager:54 - Changing modify acls to: dummy
2018-11-08 17:06:52 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-11-08 17:06:52 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-11-08 17:06:52 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(vivianamarquez); groups with view permissions: Set(); users  with modify permissions: Set(vivianamarquez); groups with modify permissions: Set()
2018-11-08 17:06:52 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 57575.
2018-11-08 17:06:52 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-11-08 17:06:52 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-11-08 17:06:52 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-11-08 17:06:52 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-11-08 17:06:52 INFO  DiskBlockManager:54 - Created local directory at /private/var/folders/n7/q93jwpcs6jndz6qqvj4mhtcm0000gn/T/blockmgr-bc531d91-4ca0-4c93-afc2-5cf5c3389b86
2018-11-08 17:06:52 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-11-08 17:06:52 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-11-08 17:06:52 INFO  log:192 - Logging initialized @1912ms
2018-11-08 17:06:52 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-11-08 17:06:52 INFO  Server:414 - Started @1978ms
2018-11-08 17:06:52 INFO  AbstractConnector:278 - Started ServerConnector@7f04b8eb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-11-08 17:06:52 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4871d3cc{/jobs,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3697e88c{/jobs/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23ff21a8{/jobs/job,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@20c20340{/jobs/job/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@29985c5c{/stages,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7330daa6{/stages/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5febd2c2{/stages/stage,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7182c6b2{/stages/stage/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@70fe7782{/stages/pool,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7998b03{/stages/pool/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1552fba5{/storage,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@728208eb{/storage/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7143335e{/storage/rdd,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a496fe6{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@38c424d9{/environment,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5ae3a67a{/environment/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3252b7bb{/executors,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4395d848{/executors/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5aeece0f{/executors/threadDump,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1d79635e{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a31e025{/static,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4d098d91{/,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@680392d9{/api,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bae8a18{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7e5f6ce6{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-11-08 17:06:52 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.1.152.221:4040
2018-11-08 17:06:52 ERROR SparkContext:91 - Error initializing SparkContext.
java.io.FileNotFoundException: File file:/Users/dummy/Desktop/hw.py does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1499)
	at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
	at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
2018-11-08 17:06:52 INFO  AbstractConnector:318 - Stopped Spark@7f04b8eb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-11-08 17:06:52 INFO  SparkUI:54 - Stopped Spark web UI at http://10.1.152.221:4040
2018-11-08 17:06:52 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-11-08 17:06:52 INFO  MemoryStore:54 - MemoryStore cleared
2018-11-08 17:06:52 INFO  BlockManager:54 - BlockManager stopped
2018-11-08 17:06:52 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-11-08 17:06:52 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
2018-11-08 17:06:52 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-11-08 17:06:52 INFO  SparkContext:54 - Successfully stopped SparkContext
Traceback (most recent call last):
  File "/Users/dummy/Desktop/hw.py", line 6, in <module>
    sc = SparkContext(appName=app_name);
  File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
  File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 282, in _initialize_context
  File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
  File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: File file:/Users/dummy/Desktop/hw.py does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1499)
	at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
	at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

2018-11-08 17:06:52 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-11-08 17:06:52 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/n7/q93jwpcs6jndz6qqvj4mhtcm0000gn/T/spark-36742eed-5188-4642-a9db-29cb8efd0514
2018-11-08 17:06:52 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/n7/q93jwpcs6jndz6qqvj4mhtcm0000gn/T/spark-1b0c4122-4c22-46ba-840d-b1326bc0e840

Why is this happening? Any help would be greatly appreciated!

1 answer:

Answer 0: (score: 0)

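You need to add every relevant external file to the job; otherwise the executor containers cannot find them (unless you read them from HDFS). You can add a file with --files:

spark-submit --files hw.py file.py

However, using --py-files also adds the file to the containers' PYTHONPATH, so you might prefer:

spark-submit --py-files hw.py file.py

When you run the script with plain python, the driver and the executor are the same.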
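Since the log above fails with a FileNotFoundException on a local path, it may help to check the paths before submitting. The following is only a sketch: the helper name build_submit_command is hypothetical, and it assumes the script and its dependencies are local files. It assembles the spark-submit arguments with absolute paths and fails early if any file is missing.

```python
import os
import shlex


def build_submit_command(script, py_files=(), files=()):
    """Return a spark-submit argument list (hypothetical helper).

    Raises FileNotFoundError before anything is submitted if a local
    path does not exist.
    """
    paths = [script, *py_files, *files]
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("missing local files: " + ", ".join(missing))

    cmd = ["spark-submit"]
    if py_files:
        # --py-files takes a comma-separated list; entries go on PYTHONPATH.
        cmd += ["--py-files", ",".join(os.path.abspath(p) for p in py_files)]
    if files:
        # --files takes a comma-separated list of files shipped with the job.
        cmd += ["--files", ",".join(os.path.abspath(p) for p in files)]
    # Use an absolute path so the result does not depend on the current directory.
    cmd.append(os.path.abspath(script))
    return cmd


if __name__ == "__main__":
    try:
        argv = build_submit_command("file.py", py_files=["hw.py"])
        print(" ".join(shlex.quote(a) for a in argv))
    except FileNotFoundError as exc:
        print(exc)
```

The returned list can be passed to subprocess.run as is. Resolving to absolute paths is a design choice here so that the command behaves the same regardless of the directory it is launched from.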
