pyspark: despite adding winutils to HADOOP_HOME, getting error: Could not locate executable null\bin\winutils.exe in the Hadoop binaries

Asked: 2017-02-03 07:04:00

Tags: apache-spark pyspark

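I have set the HADOOP_HOME environment variable to the path of winutils.exe. I have also added the other paths (Python, Spark, Java, and pyspark) to the PATH variable. When I run pyspark from the command prompt, I still get this error: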

ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2327)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:365)
        at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Unknown Source)
        at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
        at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
        at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
        at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
        at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Unknown Source)
.
.
.
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"

How do I get rid of this error?

1 answer:

Answer 0 (score: 1)

The HADOOP_HOME variable should not point directly to winutils.exe. It should point to the folder that contains bin\winutils.exe.

For example:

If you have C:\hadoop\bin\winutils.exe, set HADOOP_HOME to C:\hadoop
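
As a rough illustration of that rule, here is a minimal Python sketch that takes whatever HADOOP_HOME currently holds and works out the folder it should name instead. The helper name `resolve_hadoop_home` is hypothetical (it is not part of Spark or Hadoop), and the sketch assumes the usual layout where winutils.exe sits in a `bin` subfolder.

```python
import os
from pathlib import PureWindowsPath


def resolve_hadoop_home(value):
    """Return the folder HADOOP_HOME should name.

    Hypothetical helper: strips a trailing winutils.exe and/or bin
    component, so the result is the folder that contains bin\\winutils.exe.
    """
    path = PureWindowsPath(value)
    if path.name.lower() == "winutils.exe":
        path = path.parent  # drop the file name, leaving ...\bin
    if path.name.lower() == "bin":
        path = path.parent  # drop bin, leaving the Hadoop home folder
    return str(path)


# Report what the current setting would be corrected to, if it is set at all.
current = os.environ.get("HADOOP_HOME")
if current is None:
    print("HADOOP_HOME is not set")
else:
    print("HADOOP_HOME should be:", resolve_hadoop_home(current))
```

The sketch only computes the corrected value; the environment variable itself still has to be changed in the Windows settings (or in the shell) before pyspark is started, and a new command prompt opened so the change is picked up.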