MapReduce job fails due to a delete failure

Date: 2015-04-13 20:34:18

Tags: hadoop mapreduce

I have a MapReduce job that consists of a map phase only. The mapper uses MultipleOutputs to write several output files: it takes an Avro file as input and should emit several sequence files. The job appears to run fine and the map progress reaches 100%, but it ultimately fails with the errors below.

 15/04/13 12:16:33 INFO mapred.Task: Task attempt_local1841417422_0001_m_000000_0 is allowed to commit now
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\.seq_gg0-m-00000.crc]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\.seq_gg1-m-00000.crc]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\seq_gg0-m-00000]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\seq_gg1-m-00000]: it still exists.
15/04/13 12:16:33 WARN mapred.Task: Failure committing: java.io.IOException: Could not rename file:/C:/Workspace/AvroProcessingMapReduce/out/_temporary/0/_temporary/attempt_local1841417422_0001_m_000000_0 to file:/C:/Workspace/AvroProcessingMapReduce/out/_temporary/0/task_local1841417422_0001_m_000000
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:436)
at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172)
at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:311)
at org.apache.hadoop.mapred.Task.commit(Task.java:1163)
at org.apache.hadoop.mapred.Task.done(Task.java:1025)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\.seq_gg0-m-00000.crc]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\.seq_gg1-m-00000.crc]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\seq_gg0-m-00000]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\seq_gg1-m-00000]: it still exists.
15/04/13 12:16:33 WARN output.FileOutputCommitter: Could not delete file:/C:/Workspace/AvroProcessingMapReduce/out/_temporary/0/_temporary/attempt_local1841417422_0001_m_000000_0
15/04/13 12:16:33 INFO mapred.LocalJobRunner: map task executor complete.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\.seq_gg0-m-00000.crc]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\.seq_gg1-m-00000.crc]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\seq_gg0-m-00000]: it still exists.
15/04/13 12:16:33 WARN fs.FileUtil: Failed to delete file or dir [C:\Workspace\AvroProcessingMapReduce\out\_temporary\0\_temporary\attempt_local1841417422_0001_m_000000_0\seq_gg1-m-00000]: it still exists.
15/04/13 12:16:33 WARN mapred.LocalJobRunner: job_local1841417422_0001
java.lang.Exception: java.io.IOException: Could not rename file:/C:/Workspace/AvroProcessingMapReduce/out/_temporary/0/_temporary/attempt_local1841417422_0001_m_000000_0 to file:/C:/Workspace/AvroProcessingMapReduce/out/_temporary/0/task_local1841417422_0001_m_000000
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Could not rename file:/C:/Workspace/AvroProcessingMapReduce/out/_temporary/0/_temporary/attempt_local1841417422_0001_m_000000_0 to file:/C:/Workspace/AvroProcessingMapReduce/out/_temporary/0/task_local1841417422_0001_m_000000
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:436)
at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172)
at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:311)
at org.apache.hadoop.mapred.Task.commit(Task.java:1163)
at org.apache.hadoop.mapred.Task.done(Task.java:1025)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Here is the code for my mapper:

 public class AvroReaderMapper extends MapReduceBase implements Mapper<AvroWrapper<ContentPackage>, NullWritable, Text, Text> {

    private final static int BUFFER_SIZE = 20;
    private MultipleOutputs multipleOutputs;                               // used to write multiple output files from the mapper
    private List<ContentPackage> buffer = new ArrayList<ContentPackage>(); // holds BUFFER_SIZE records and outputs them at once
    private int fileCounter = 0;                                           // keeps track of the number of output files

    @Override
    public void configure(JobConf job) {
        multipleOutputs = new MultipleOutputs(job);
    }

    @Override
    public void map(AvroWrapper<ContentPackage> record, NullWritable v, OutputCollector<Text, Text> collector, Reporter reporter) throws IOException {

        // buffer until the size reaches BUFFER_SIZE
        buffer.add(record.datum());

        if (buffer.size() == BUFFER_SIZE) {

            for (int i = 0; i < BUFFER_SIZE; i++) {
                ContentItem doc = (ContentItem) buffer.get(i).Content;

                String content = //some processing
                multipleOutputs.getCollector("seq", "test" + fileCounter, reporter).collect(new Text(content), new Text(record.toString()));
            }
            buffer.clear();
            fileCounter++;
        }
    }
}
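One detail worth flagging in the mapper above: it never overrides `close()`, so `multipleOutputs.close()` is never called. That would likely leave the sequence files open when the task tries to commit, and on Windows an open file cannot be renamed or deleted, which would match the "Failed to delete" / "Could not rename" warnings in the log. A side effect of the same omission is that any trailing partial batch left in `buffer` is silently dropped. The flush-on-close pattern can be sketched without Hadoop dependencies (`BatchWriter` and its methods are hypothetical stand-ins for the mapper plus MultipleOutputs, not Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, Hadoop-free sketch of the buffer-and-flush pattern: write() buffers
// records, flush() emits a full batch, and close() both flushes the trailing
// partial batch and releases the output -- the step the mapper above is missing.
class BatchWriter {
    private static final int BUFFER_SIZE = 3;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> batches = new ArrayList<>();
    private boolean closed = false;

    void write(String record) {
        buffer.add(record);
        if (buffer.size() == BUFFER_SIZE) {
            flush(); // emit a full batch, like the if-block in map()
        }
    }

    // Must be called once all input is consumed; in the real mapper this is
    // the close() override that should also call multipleOutputs.close().
    void close() {
        if (!buffer.isEmpty()) {
            flush(); // without this, the last partial batch is silently lost
        }
        closed = true; // stand-in for releasing the open output files
    }

    private void flush() {
        batches.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    List<List<String>> batches() { return batches; }

    boolean isClosed() { return closed; }
}
```

In the actual job, the equivalent change would be overriding `close()` in AvroReaderMapper, flushing whatever is left in `buffer`, and then calling `multipleOutputs.close()`, so the committer can rename the task's temporary output directory.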

And the driver:

public static void main(String[] args) throws Exception {

    //Job configuration
    JobConf conf = new JobConf(SeqFileGeneratorDriver.class);
    conf.setJobName("Sequence File Generator");

    //1-set the input and output path
    FileInputFormat.setInputPaths(conf, new Path("in"));
    FileOutputFormat.setOutputPath(conf, new Path("out"));

    //2-set the mapper and reducer class        
    conf.setMapperClass(AvroReaderMapper.class);
    conf.setNumReduceTasks(0);

    //3-set the input/output format
    AvroJob.setInputSchema(conf, ContentPackage.SCHEMA$);       
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    //AvroJob.setOutputSchema(conf, Pair.getPairSchema(Schema.create(Type.STRING),Schema.create(Type.STRING)));
    conf.setOutputFormat(SequenceFileOutputFormat.class);   
    MultipleOutputs.addMultiNamedOutput(conf, "seq", SequenceFileOutputFormat.class, Text.class, Text.class);
    //conf.setCompressMapOutput(true);

    //4-run the job
    JobClient.runJob(conf);

}

The temporary output files look fine, so the mapper itself seems to run correctly.

P.S. I am running this in Eclipse, which is why the paths are local.

0 Answers:

There are no answers.