ERROR SparkContext: Error initializing SparkContext

Time: 2015-10-22 08:39:16

Tags: python pyspark

I am using spark-1.5.1-bin-hadoop2.6 on Windows and am trying to run the Self-Contained Applications (Python) example from this URL:

http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications

The command is:

spark-submit --master local[1] D:\spark-1.5.1-bin-hadoop2.6\examples\src\main\python\SimpleApp.py
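
For reference, a minimal sketch of what the quick-start SimpleApp.py does (paraphrased from the quick-start page linked above; the `SparkContext("local", "Simple App")` call on line 5 matches the Python traceback below, and `YOUR_SPARK_HOME` is the placeholder used by the docs):

```python
"""SimpleApp.py"""
from pyspark import SparkContext

logFile = "YOUR_SPARK_HOME/README.md"  # should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()

# Count lines containing the letters 'a' and 'b'
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
```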

It fails with the following error:

> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 15/10/22 11:48:37 INFO SparkContext: Running Spark version 1.5.1
> 15/10/22 11:48:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/10/22 11:48:38 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
> java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
>         at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
>         at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
>         at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
>         at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
>         at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
>         at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
>         at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
>         at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
>         at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
>         at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
>         at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
>         at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
>         at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
>         at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
>         at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2084)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:311)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:214)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:722)
> 15/10/22 11:48:38 INFO SecurityManager: Changing view acls to: Ashitha.K
> 15/10/22 11:48:38 INFO SecurityManager: Changing modify acls to: Ashitha.K
> 15/10/22 11:48:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Ashitha.K); users with modify permissions: Set(Ashitha.K)
> 15/10/22 11:48:39 INFO Slf4jLogger: Slf4jLogger started
> 15/10/22 11:48:39 INFO Remoting: Starting remoting
> 15/10/22 11:48:39 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.200.101:54648]
> 15/10/22 11:48:39 INFO Utils: Successfully started service 'sparkDriver' on port 54648.
> 15/10/22 11:48:39 INFO SparkEnv: Registering MapOutputTracker
> 15/10/22 11:48:39 INFO SparkEnv: Registering BlockManagerMaster
> 15/10/22 11:48:40 INFO DiskBlockManager: Created local directory at C:\Users\Ashitha.k\AppData\Local\Temp\blockmgr-f24c756c-f7d8-433a-b57e-900c6017c515
> 15/10/22 11:48:40 INFO MemoryStore: MemoryStore started with capacity 529.9 MB
> 15/10/22 11:48:40 INFO HttpFileServer: HTTP File server directory is C:\Users\Ashitha.k\AppData\Local\Temp\spark-156dc076-c867-447a-b90c-b02ef4fdef02\httpd-9ffd376e-2f81-4355-8716-06967dbd827a
> 15/10/22 11:48:40 INFO HttpServer: Starting HTTP Server
> 15/10/22 11:48:40 INFO Utils: Successfully started service 'HTTP file server' on port 54649.
> 15/10/22 11:48:40 INFO SparkEnv: Registering OutputCommitCoordinator
> 15/10/22 11:48:40 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 15/10/22 11:48:40 INFO SparkUI: Started SparkUI at http://192.168.200.101:4040
> 15/10/22 11:48:42 WARN : Your hostname, LT1A077 resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:c865%34, but we couldn't find any external IP address!
> 15/10/22 11:48:42 INFO Utils: Copying D:\spark-1.5.1-bin-hadoop2.6\examples\src\main\python\SimpleApp.py to C:\Users\Ashitha.k\AppData\Local\Temp\spark-156dc076-c867-447a-b90c-b02ef4fdef02\userFiles-eb0a2db3-8539-427e-becd-51f51de58346\SimpleApp.py
> 15/10/22 11:48:42 ERROR SparkContext: Error initializing SparkContext.
> java.lang.NullPointerException
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
>         at org.apache.hadoop.util.Shell.run(Shell.java:455)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>         at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
>         at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
>         at org.apache.spark.util.Utils$.fetchFile(Utils.scala:381)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1387)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1341)
>         at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:484)
>         at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:484)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:484)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:214)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:722)
> 15/10/22 11:48:42 INFO SparkUI: Stopped Spark web UI at http://192.168.200.101:4040
> 15/10/22 11:48:42 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 15/10/22 11:48:42 ERROR Utils: Uncaught exception in thread Thread-3
> java.lang.NullPointerException
>         at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
>         at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1228)
>         at org.apache.spark.SparkEnv.stop(SparkEnv.scala:100)
>         at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1749)
>         at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
>         at org.apache.spark.SparkContext.stop(SparkContext.scala:1748)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:593)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:214)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:722)
> 15/10/22 11:48:42 INFO SparkContext: Successfully stopped SparkContext
> Traceback (most recent call last):
>   File "D:/spark-1.5.1-bin-hadoop2.6/examples/src/main/python/SimpleApp.py", line 5, in <module>
>     sc = SparkContext("local", "Simple App")
>   File "D:\spark-1.5.1-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\context.py", line 113, in __init__
>   File "D:\spark-1.5.1-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\context.py", line 170, in _do_init
>   File "D:\spark-1.5.1-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\context.py", line 224, in _initialize_context
>   File "D:\spark-1.5.1-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 701, in __call__
>   File "D:\spark-1.5.1-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NullPointerException
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
>         at org.apache.hadoop.util.Shell.run(Shell.java:455)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>         at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
>         at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
>         at org.apache.spark.util.Utils$.fetchFile(Utils.scala:381)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1387)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1341)
>         at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:484)
>         at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:484)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:484)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:214)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:722)
>
> 15/10/22 11:48:42 INFO DiskBlockManager: Shutdown hook called
> 15/10/22 11:48:42 INFO ShutdownHookManager: Shutdown hook called
> 15/10/22 11:48:43 INFO ShutdownHookManager: Deleting directory C:\Users\Ashitha.k\AppData\Local\Temp\spark-156dc076-c867-447a-b90c-b02ef4fdef02\userFiles-eb0a2db3-8539-427e-becd-51f51de58346
> 15/10/22 11:48:43 INFO ShutdownHookManager: Deleting directory C:\Users\Ashitha.k\AppData\Local\Temp\spark-156dc076-c867-447a-b90c-b02ef4fdef02

1 Answer:

Answer 0 (score: 0)

What solved a similar issue for me was to keep the dataset files/folders in the same directory as the script. For example, if the script is located at ~/scripts/script.py, keep the data at ~/scripts/data/README. Basically, keep the data relative to your script and provide the path accordingly in your code. Hope this helps :)
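
A minimal sketch of that idea, assuming a hypothetical data/README file placed next to the script (the layout and file name here are illustrative, not from the question):

```python
import os

from pyspark import SparkContext

# Resolve the data path relative to this script's directory rather than the
# current working directory. "data/README" is a hypothetical layout used
# only for illustration.
base_dir = os.path.dirname(os.path.abspath(__file__))
log_file = os.path.join(base_dir, "data", "README")

sc = SparkContext("local", "Simple App")
log_data = sc.textFile(log_file).cache()
print("Number of lines: %i" % log_data.count())
sc.stop()
```

This way the script finds its data no matter which directory spark-submit is launched from.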