Appending to a sequence file in Spark

Posted: 2017-05-11 09:48:05

Tags: java apache-spark

How can I use saveAsHadoopFile in Spark to append more entities to an existing sequence file?

Exception:

org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://myhost:8080/data/sequenceFile already exists

Code example:

final String path = ...;
while (itr.hasNext()) {
    new JavaSparkContext(spark.sparkContext())
        // itr.next() returns List<Tuple2<String, MyClass>>; each key is globally unique
        .parallelizePairs(itr.next())

        // EXCEPTION: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://myhost:8080/data/sequenceFile already exists
        .saveAsHadoopFile(path, String.class, MyClass.class, SequenceFileOutputFormat.class);
}

1 Answer:

Answer 0 (score: 0)

The FileAlreadyExistsException is raised because Hadoop's OutputFormat refuses to write into an output directory that already exists. As a workaround, you can write each batch of data to a temp directory and then move/rename the resulting files into the desired directory. Rough pseudocode would be:

outPath = "/abc/data/"
tempOutPath = outPath + "/" + timestamp
move(tempOutPath + "/*", outPath)
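
A minimal sketch of that workaround in Java, using Hadoop's FileSystem API. The class name SequenceFileMover, the method moveIntoOutputDir, and the timestamp-suffix naming are illustrative assumptions, not part of the original answer:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SequenceFileMover {

    // Moves the part files of one finished batch from tempDir into outDir,
    // renaming each with a unique suffix so successive batches cannot collide.
    public static void moveIntoOutputDir(Configuration conf, String tempDir, String outDir)
            throws IOException {
        FileSystem fs = FileSystem.get(conf); // assumes fs.defaultFS points at the target HDFS
        Path out = new Path(outDir);
        if (!fs.exists(out)) {
            fs.mkdirs(out); // created on the first batch; later batches reuse it
        }
        long suffix = System.currentTimeMillis();
        for (FileStatus status : fs.listStatus(new Path(tempDir))) {
            String name = status.getPath().getName();
            if (name.startsWith("part-")) { // skip _SUCCESS and other marker files
                fs.rename(status.getPath(), new Path(out, name + "-" + suffix));
            }
        }
        fs.delete(new Path(tempDir), true); // drop the now-empty temp directory
    }
}

Inside the question's loop, each batch would then be saved to a fresh temporary path and merged afterwards (jsc and pairRdd are hypothetical handles to the context and the parallelized batch):

// save the batch to a unique temp path, then merge it into the final directory
String tempPath = path + "_tmp_" + System.currentTimeMillis();
pairRdd.saveAsHadoopFile(tempPath, String.class, MyClass.class, SequenceFileOutputFormat.class);
SequenceFileMover.moveIntoOutputDir(jsc.hadoopConfiguration(), tempPath, path);

Readers that open the output directory (e.g. SparkContext.sequenceFile) treat the accumulated part files as one logical dataset, which is what makes this rename-based "append" work.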