Question

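I have a set of data that I am expanding with Spark according to my requirements. The expanded data (produced in a loop) is written to a text file in HDFS, and this is where I run into a problem: only the first two records are written, and then the job fails.

Here is the code I use to process the data and write it to the text file: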



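    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Appends one record to a fixed HDFS file; falls back to creating the file if the append fails.
    def Hdfswrite(record: String): Unit = {
      // val timestamp = new java.text.SimpleDateFormat("yyyyMMdd-HH").format(new java.util.Date())
      val file = "/user/bhkp/sparkoutput6/sparkoutput12" + ".txt"
      val line = "\n" + record
      val config = new Configuration()
      val fs = FileSystem.get(config)
      try {
        val writer = fs.append(new Path(file))
        writer.write(line.getBytes)
        writer.close()
      } catch {
        // Any failure, not only a missing file, ends up here and (re)creates the file.
        case t: Throwable =>
          val writer = fs.create(new Path(file))
          writer.write(line.getBytes)
          writer.close()
      }
    }

    def main(args: Array[String]): Unit = {
      val tokenized = sc.textFile("/user/bhkp/hv_tables/sparktestdata12.txt").map(rec => rec.split("\\^", -1))
      // transfer is not shown in the question
      tokenized.foreach(transfer)
    }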

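I get the following error: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. Request id and saved id.

I'm not sure where the problem is: the data is written when the input file has only 2 records, but it starts failing as soon as there are 3 or more.

When I print the output instead of writing it, everything works fine.

I'm confused.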
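
One possible explanation, offered as an assumption rather than a confirmed diagnosis: `foreach` runs on several tasks in parallel, so more than one task may try to append to the same HDFS file at once. HDFS allows a single writer per file, and when an append fails the `catch` branch recreates the file, which could invalidate the file another task is still writing to. A minimal sketch of an alternative is to let Spark write the output itself. Here `expand` is a hypothetical stand-in for the `transfer` logic (not shown above), and the output directory name is made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object ExpandAndSave {
  // Hypothetical stand-in for the expansion logic, which the question does not show:
  // here each field of a record simply becomes one output line.
  def expand(fields: Array[String]): Seq[String] = fields.toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ExpandAndSave").getOrCreate()
    val sc = spark.sparkContext

    val tokenized = sc.textFile("/user/bhkp/hv_tables/sparktestdata12.txt")
      .map(rec => rec.split("\\^", -1))

    // The expansion runs on the executors, but no task opens an HDFS file by hand.
    val expanded = tokenized.flatMap(expand)

    // Assumed output location; the directory must not exist before the job runs.
    // Spark writes one part-* file per partition into it.
    expanded.saveAsTextFile("/user/bhkp/sparkoutput6/sparkoutput12_dir")

    spark.stop()
  }
}
```

With this shape each partition writes its own file, so no two tasks share a writer. If a single file is really needed, the part files can be merged afterwards.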

Writing data to a text file using spark-submit is not working

0 answers: