Question

When trying to write from Spark to HDFS in Zeppelin, I get a ClassNotFoundException for org.apache.hadoop.mapred.DirectFileOutputCommitter:

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.mapred.DirectFileOutputCommitter not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2106)
at org.apache.hadoop.mapred.JobConf.getOutputCommitter(JobConf.java:725)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:983)

The code I am trying to run:

val model = LinearRegressionWithSGD.train(someRDD, numIterations)
val modelPath = "hdfs:///some_path/LinearRegressionWithSGD"
model.save(sc, modelPath)

I searched for this class and could not find it anywhere. The closest I can find is org.apache.hadoop.mapred.FileOutputCommitter in Hadoop.

I am using Zeppelin at commit 18c8c9ea512a0d87699a73e2ca26192d03748661 (October 9), with Spark 1.5.0 on YARN and Hadoop 2.6.

Answer 1

I ran into the same problem. I looked for the class in "hadoop-mapreduce-client-core.X.X.X.jar", but it is not in that jar.
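The same check can be done from a Zeppelin paragraph instead of by opening the jar. The following is a minimal sketch, not part of the original answer: `ClasspathCheck` and `locate` are hypothetical names, and it only inspects the classloader of the JVM it runs in (the interpreter/driver), not the YARN executors.

```scala
import scala.util.Try

// Hypothetical helper: reports where a class would be loaded from, if at all.
object ClasspathCheck {
  def locate(className: String): Option[String] = {
    // Use the thread context classloader and do not initialize the class.
    val loader = Thread.currentThread().getContextClassLoader
    Try(Class.forName(className, false, loader)).toOption.map { cls =>
      // The code source is null for classes from the bootstrap classloader.
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(source => Option(source.getLocation))
        .map(_.toString)
        .getOrElse("(bootstrap or unknown source)")
    }
  }
}

// None means the class cannot be loaded from this classloader.
println(ClasspathCheck.locate("org.apache.hadoop.mapred.DirectFileOutputCommitter"))
// For comparison, the committer that does ship with Hadoop:
println(ClasspathCheck.locate("org.apache.hadoop.mapred.FileOutputCommitter"))
```

If the first call prints None while the second prints a jar location, the class is simply absent from the interpreter's classpath, which matches what inspecting the jar shows.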

I worked around this by adding org.apache.hadoop.mapred.DirectFileOutputCommitter to my own repository. The source for the file can be found here: https://gist.github.com/apivovarov

I am not sure yet what the root cause is. I am digging into it and will update this answer once I know more.
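One possible starting point for that investigation, offered as a sketch rather than a confirmed diagnosis: the stack trace goes through JobConf.getOutputCommitter, which reads the `mapred.output.committer.class` key and falls back to org.apache.hadoop.mapred.FileOutputCommitter when the key is unset. Printing the effective value shows whether some configuration is asking for DirectFileOutputCommitter. `CommitterCheck` and `configuredCommitter` are hypothetical names; the lookup is passed in as a function so the snippet does not depend on Hadoop classes itself.

```scala
// Hypothetical helper: resolves the committer class name the old "mapred" API would use.
object CommitterCheck {
  // Key read by JobConf.getOutputCommitter.
  val CommitterKey = "mapred.output.committer.class"
  // Default used by JobConf when the key is not set.
  val DefaultCommitter = "org.apache.hadoop.mapred.FileOutputCommitter"

  // `lookup` returns the configured value for a key, or null when it is unset
  // (the same contract as Hadoop's Configuration.get).
  def configuredCommitter(lookup: String => String): String =
    Option(lookup(CommitterKey)).getOrElse(DefaultCommitter)
}

// Usage in a Zeppelin paragraph, assuming `sc` is the SparkContext it provides:
//   println(CommitterCheck.configuredCommitter(key => sc.hadoopConfiguration.get(key)))
```

If the printed value is org.apache.hadoop.mapred.DirectFileOutputCommitter, the class name is coming from the job or cluster configuration rather than from Spark's own code path.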

Exception while trying to write a file to HDFS from Zeppelin
