How to use saveAsHadoopFile in Spark to append more entities to a sequence file?
Exception:
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://myhost:8080/data/sequenceFile already exists
Code example:
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.spark.api.java.JavaSparkContext;

final String path = ...;
while (itr.hasNext()) {
    new JavaSparkContext(spark.sparkContext())
        // itr.next() returns a List<Tuple2<String, MyClass>> whose keys are globally unique
        .parallelizePairs(itr.next())
        // EXCEPTION: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://myhost:8080/data/sequenceFile already exists
        .saveAsHadoopFile(path, String.class, MyClass.class, SequenceFileOutputFormat.class);
}
Answer 0 (score: 0)
As a workaround, you can write the data to a temp dir first and then move/rename the files into the desired directory. Rough pseudocode would be:
outPath = /abc/data/
tempOutPath = outPath + "/" + timestamp
move(tempOutPath/*, outPath)
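A minimal Java sketch of that workaround, reusing itr, spark, and MyClass from the question. The promote helper, the System.currentTimeMillis() batch id, the _tmp- directory prefix, and the per-file name suffix are illustrative assumptions, not part of the original answer:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical helper: moves the part files from tempOutPath up into outPath,
// suffixing each file name with the batch id so successive batches do not collide.
static void promote(Configuration conf, String tempOutPath, String outPath, long batchId)
        throws IOException {
    FileSystem fs = new Path(outPath).getFileSystem(conf);
    Path dest = new Path(outPath);
    if (!fs.exists(dest)) {
        fs.mkdirs(dest);
    }
    for (FileStatus status : fs.listStatus(new Path(tempOutPath))) {
        String name = status.getPath().getName();
        if (name.startsWith("_")) {
            continue; // skip bookkeeping files such as _SUCCESS
        }
        // e.g. part-00000 -> part-00000-1690000000000
        fs.rename(status.getPath(), new Path(dest, name + "-" + batchId));
    }
    fs.delete(new Path(tempOutPath), true); // remove the now-empty temp directory
}

// In the driver, each batch is saved to a fresh temp directory so the save
// target never pre-exists, then its files are promoted into the final path.
final String outPath = "hdfs://myhost:8080/data/sequenceFile";
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
while (itr.hasNext()) {
    long batchId = System.currentTimeMillis();
    String tempOutPath = outPath + "/_tmp-" + batchId;
    jsc.parallelizePairs(itr.next())
       .saveAsHadoopFile(tempOutPath, String.class, MyClass.class, SequenceFileOutputFormat.class);
    promote(jsc.hadoopConfiguration(), tempOutPath, outPath, batchId);
}

Prefixing the temp directory with an underscore is a deliberate choice here: Hadoop input formats conventionally ignore paths starting with _ (as with _SUCCESS and _temporary), so readers of outPath will not pick up a half-written batch before it is promoted.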