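Here are the relevant lines of code. I can include more, but I suspect the error is caused by my environment rather than by the code. I am following this tutorial, but with different data and a different version of Spark.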
from pyspark.mllib.clustering import LDA
from pyspark.mllib.linalg import Vectors

# spark, result_tfidf and vocabArray are defined earlier (not shown).

def topic_render(topic, vocabArray):
    # topic is a (term indices, term weights) pair; keep the first 5 terms
    terms = topic[0]
    result = []
    for i in range(0, 5):
        term = vocabArray[terms[i]]
        result.append(term)
    return result

lda_model = LDA.train(result_tfidf[['index','features']]
                      .rdd.mapValues(Vectors.fromML)
                      .map(list), k=10, maxIterations=100)
topicIndices = spark.sparkContext.parallelize(lda_model.describeTopics(maxTermsPerTopic = 5))
# The above line passes
topics_final = topicIndices.map(lambda topic: topic_render(topic, vocabArray)).collect()
# Crashes on this line; error log incomprehensible
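For reference, here is a small Spark-free sketch of what topic_render is expected to do with the entries returned by describeTopics. The vocabulary and topic values are made up for illustration, and the `max_terms` parameter is an addition that is not in the code above. The only thing carried over is the assumption that each topic is a (term indices, term weights) pair.

```python
def topic_render(topic, vocab_array, max_terms=5):
    """Map the term indices of one topic to vocabulary words."""
    terms = topic[0]  # assumed layout: (term_indices, term_weights)
    return [vocab_array[i] for i in terms[:max_terms]]


# Hypothetical vocabulary and topics, made up for illustration only.
vocab_array = ["spark", "hadoop", "python", "topic", "model", "vector"]
topics = [
    ([0, 2, 4, 3, 5], [0.30, 0.25, 0.20, 0.15, 0.10]),
    ([1, 0, 5, 2, 3], [0.40, 0.20, 0.15, 0.15, 0.10]),
]

# Plain list comprehension on the driver; no RDD or Python worker is involved.
topics_final = [topic_render(t, vocab_array) for t in topics]
print(topics_final)
```

Since describeTopics already returns an ordinary Python list on the driver, the rendering step could in principle be done this way without parallelize. Whether that sidesteps the crash is untested.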
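Below are a few lines of the log output (it is really long, and most of it just repeats this section). It is hard to tell what actually went wrong. I don't think I need the winutils binary in the Hadoop binary path, or anything from the native-hadoop library, because I see those errors every time I do anything in Spark and they have never caused a problem before.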
19/11/03 16:21:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/03 16:21:14 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/03 16:21:21 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:22 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:22 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
[Stage 0:> (0 + 4) / 56]19/11/03 16:21:32 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:348)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:348)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/11/03 16:21:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
[Stage 0:> (0 + 4) / 56]19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
[Stage 0:> (0 + 4) / 56]
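The `null\bin\winutils.exe` path in the log suggests that no Hadoop home directory was resolved. As a rough diagnostic, the sketch below reports whether `HADOOP_HOME` is set and whether `bin/winutils.exe` exists under it. The helper name `winutils_status` is hypothetical. It only inspects the environment variable, not the `hadoop.home.dir` Java system property, so treat it as a partial check rather than a fix for the crash.

```python
import os
from pathlib import Path


def winutils_status(env=None):
    """Describe whether winutils.exe can be found via HADOOP_HOME."""
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        # Hadoop then has no home directory to build the path from,
        # which is consistent with the "null" prefix in the log.
        return "HADOOP_HOME is not set"
    winutils = Path(hadoop_home) / "bin" / "winutils.exe"
    if winutils.is_file():
        return f"found {winutils}"
    return f"missing {winutils}"


print(winutils_status())
```

Running this in the same shell that launches Spark shows what the driver process would inherit.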