Question

Simple question: for the following RDD, I want to write an output text file with a header line, in this format: (UserID, MovieID, Pred_rating)

scala> final_predictions_adjusted.sortByKey().first
res61: ((Int, Int), Double) = ((1,1172),1.8697903970770442)

Simple enough, right? So I am using this function:

  def print_outputfile(final_predictions_adjusted:RDD[((Int, Int), Double)])={
    val writer = new FileWriter(new File("output.txt" ))
    writer.write("UserID,MovieID,Pred_rating")
    final_predictions_adjusted.sortByKey().foreach(x=>{writer.write(x.toString())})
    writer.close()
  }

The function above fails with the following error:

Caused by: java.io.NotSerializableException: java.io.FileWriter

Answer 1

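With your code, Spark has to serialize the function passed to foreach, including the FileWriter it captures, so it can ship it to every node and run it in parallel. A FileWriter is a handle to a local file and is not serializable, so this cannot work, and you get the NotSerializableException.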
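To see the root cause without a cluster, here is a minimal sketch that pushes an object through plain Java serialization, which is the mechanism Spark applies to closures and whatever they capture. The object name SerializationCheck and the helper isJavaSerializable are made up for this illustration and are not part of Spark.

```scala
import java.io.{ByteArrayOutputStream, File, FileWriter, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Returns true if the object survives Java serialization, false if the
  // JVM rejects it with NotSerializableException.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val tmp = File.createTempFile("demo", ".txt")
    val writer = new FileWriter(tmp)
    println(isJavaSerializable(writer))        // false: FileWriter does not implement Serializable
    println(isJavaSerializable("1,1172,1.87")) // true: plain strings serialize fine
    writer.close()
    tmp.delete()
  }
}
```

Anything that has to travel to the executors needs to pass this kind of check, which is why file handles are normally opened on the side that actually writes.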

You would normally save an RDD to a file with saveAsTextFile:

final_predictions_adjusted.sortByKey().map { case ((user, movie), rating) => s"$user,$movie,$rating" }.saveAsTextFile("output.dir")

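This writes a directory (output.dir) containing one part file per partition (part-00000, part-00001, ...) rather than a single file. You can add the header and merge the parts manually afterwards.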
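The manual merge step could look roughly like the sketch below, which runs as ordinary Scala after the Spark job has finished. It assumes the part files sit on the local file system and are small enough to read one at a time; MergeParts and mergeWithHeader are hypothetical names chosen for this example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object MergeParts {
  // Concatenates the part-* files in partsDir, in name order, into target,
  // writing the header line first. Marker files such as _SUCCESS and hidden
  // checksum files are skipped because their names do not start with "part-".
  def mergeWithHeader(partsDir: Path, target: Path, header: String): Unit = {
    val listing = Files.list(partsDir)
    val parts =
      try {
        listing.iterator().asScala
          .filter(_.getFileName.toString.startsWith("part-"))
          .toList
          .sortBy(_.getFileName.toString)
      } finally {
        listing.close()
      }

    val writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)
    try {
      writer.write(header + "\n")
      for {
        part <- parts
        line <- Files.readAllLines(part, StandardCharsets.UTF_8).asScala
      } writer.write(line + "\n")
    } finally {
      writer.close()
    }
  }
}
```

Because sortByKey range-partitions the data, reading the zero-padded part files in name order keeps the rows globally sorted. A call might be MergeParts.mergeWithHeader(Paths.get("output.dir"), Paths.get("output.txt"), "UserID,MovieID,Pred_rating"), with Paths imported from java.nio.file.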

Answer 2

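This works like a charm, because collect() brings all the rows back to the driver, so the FileWriter is only ever used locally (the whole RDD has to fit in driver memory):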

  def print_outputfile(final_predictions_adjusted:RDD[((Int, Int), Double)])={
    val writer = new FileWriter(new File("output.txt" ))
    writer.write("UserID,MovieID,Pred_rating\n")
    final_predictions_adjusted.sortByKey().collect().foreach(x=>{writer.write(x._1._1+","+x._1._2+","+x._2+"\n")})
    writer.close()
  }
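A variation on the same idea, sketched under the assumption that you would rather not hold the entire result in one array: the writer below accepts any Iterator of rows, so it can be fed from the RDD's toLocalIterator, which fetches one partition at a time. DriverSideWriter, formatRow and writeCsv are illustrative names, and the try/finally is only there so the file is closed even if a write fails.

```scala
import java.io.{BufferedWriter, File, FileWriter}

object DriverSideWriter {
  // Turns ((userId, movieId), rating) into one comma-separated line.
  def formatRow(row: ((Int, Int), Double)): String = {
    val ((userId, movieId), rating) = row
    s"$userId,$movieId,$rating"
  }

  // Writes the header followed by one line per row. This runs entirely on
  // the driver, so the writer is never serialized.
  def writeCsv(rows: Iterator[((Int, Int), Double)], path: String): Unit = {
    val writer = new BufferedWriter(new FileWriter(new File(path)))
    try {
      writer.write("UserID,MovieID,Pred_rating\n")
      rows.foreach(row => writer.write(formatRow(row) + "\n"))
    } finally {
      writer.close()
    }
  }
}
```

With the RDD from the question, the call would be DriverSideWriter.writeCsv(final_predictions_adjusted.sortByKey().toLocalIterator, "output.txt"). Passing collect().iterator instead reproduces the behaviour of the function above.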

Print an RDD to a text file with a header
