Appending to a sequence file in Spark

Posted: 2017-05-11 09:48:05

Tags: java apache-spark

How can I use saveAsHadoopFile in Spark to append more entities to an existing sequence file?

Exception:

org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://myhost:8080/data/sequenceFile already exists

Code example:

final String path = ...;
while (itr.hasNext()) {
    new JavaSparkContext(spark.sparkContext())
        // itr.next() returns List<Tuple2<String, MyClass>>; each key is globally unique
        .parallelizePairs(itr.next())

        // EXCEPTION: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://myhost:8080/data/sequenceFile already exists
        .saveAsHadoopFile(path, String.class, MyClass.class, SequenceFileOutputFormat.class);
}

1 Answer:

Answer 0 (score: 0)

The FileAlreadyExistsException is raised because Hadoop's OutputFormat refuses to write into an output directory that already exists. As a workaround, you can write each batch of data to a temp directory and then move/rename the resulting files into the desired directory. Rough pseudocode would be:

outPath = "/abc/data/"
tempOutPath = outPath + "/" + timestamp
move(tempOutPath + "/*", outPath)
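
A minimal sketch of that workaround in Java, using Hadoop's FileSystem API. The class name SequenceFileMover, the method moveIntoOutputDir, and the timestamp-suffix naming are illustrative assumptions, not part of the original answer:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SequenceFileMover {

    // Moves the part files of one finished batch from tempDir into outDir,
    // renaming each with a unique suffix so successive batches cannot collide.
    public static void moveIntoOutputDir(Configuration conf, String tempDir, String outDir)
            throws IOException {
        FileSystem fs = FileSystem.get(conf); // assumes fs.defaultFS points at the target HDFS
        Path out = new Path(outDir);
        if (!fs.exists(out)) {
            fs.mkdirs(out); // created on the first batch; later batches reuse it
        }
        long suffix = System.currentTimeMillis();
        for (FileStatus status : fs.listStatus(new Path(tempDir))) {
            String name = status.getPath().getName();
            if (name.startsWith("part-")) { // skip _SUCCESS and other marker files
                fs.rename(status.getPath(), new Path(out, name + "-" + suffix));
            }
        }
        fs.delete(new Path(tempDir), true); // drop the now-empty temp directory
    }
}

Inside the question's loop, each batch would then be saved to a fresh temporary path and merged afterwards (jsc and pairRdd are hypothetical handles to the context and the parallelized batch):

// save the batch to a unique temp path, then merge it into the final directory
String tempPath = path + "_tmp_" + System.currentTimeMillis();
pairRdd.saveAsHadoopFile(tempPath, String.class, MyClass.class, SequenceFileOutputFormat.class);
SequenceFileMover.moveIntoOutputDir(jsc.hadoopConfiguration(), tempPath, path);

Readers that open the output directory (e.g. SparkContext.sequenceFile) treat the accumulated part files as one logical dataset, which is what makes this rename-based "append" work.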