Question

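I recently started learning Hadoop. I want to open a file on the local disk and write some data to it from the reduce function, but I can't find a good way to close the file.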

As far as I know, closing and reopening it is not a good idea, so I don't want to do that.

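import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class MyClass extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf());
        // all other configuration (input/output paths, output types) goes here
        job.setJarByClass(MyClass.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyClass(), args));
    }
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        // does something
    }
    // the reducer's input types must match the mapper's output types (Text, Text)
    public static class MyReducer extends Reducer<Text, Text, Text, Text> {
        private BufferedWriter bw; // stream kept open across reduce() calls
        public MyReducer() throws IOException {
            // open a file on the local disk here ("myfile" is a placeholder path)
            bw = new BufferedWriter(new FileWriter("myfile"));
        }
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            bw.write("entered the reduce task for " + key + "\n");
            for (Text value : values) {
                bw.write(value + " will be written to my file \n");
            }
        }
    }
}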

The workflow would look like this (please correct me if I'm wrong):

for(each reduce task)
    write to file "entered the reduce task for " + *key*
        for each *value* for that *key*
            write *value*

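I want to write the key/value pairs to myfile on the local disk and then close the file, but I can't find a good way to do that. Or is it even a problem if I don't close the file? In other words, will Hadoop take care of it?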

Thanks,

Answer 1

The Mapper and Reducer classes you are extending both have methods for running code before and after your data is processed.

To run code before the map/reduce runs, override the setup(Context context) method.
To run code after the map/reduce task finishes, override the cleanup(Context context) method.
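To make the call order concrete, here is a minimal plain-Java sketch. It does not use Hadoop: SimpleReducer and LifecycleDemo are hypothetical stand-ins, and the real setup/reduce/cleanup methods also take a Context argument. It only models the order of calls within one reduce task: setup once, reduce once per key, cleanup once at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for Hadoop's Reducer: it only models the
// order in which setup, reduce and cleanup are called within one task.
abstract class SimpleReducer<K, V> {
    protected void setup() {}
    protected abstract void reduce(K key, Iterable<V> values);
    protected void cleanup() {}

    // Roughly the shape of the default run() loop: setup once,
    // reduce once per key, cleanup once after the last key.
    public final void run(Map<K, List<V>> groupedInput) {
        setup();
        for (Map.Entry<K, List<V>> entry : groupedInput.entrySet()) {
            reduce(entry.getKey(), entry.getValue());
        }
        cleanup();
    }
}

class LifecycleDemo {
    // Runs a toy reducer over two keys and returns the recorded call order.
    static List<String> runDemo() {
        List<String> calls = new ArrayList<>();
        SimpleReducer<String, String> reducer = new SimpleReducer<String, String>() {
            @Override protected void setup() { calls.add("setup"); }
            @Override protected void reduce(String key, Iterable<String> values) {
                calls.add("reduce " + key);
            }
            @Override protected void cleanup() { calls.add("cleanup"); }
        };
        Map<String, List<String>> input = new TreeMap<>(); // keys iterate in sorted order
        input.put("a", List.of("1", "2"));
        input.put("b", List.of("3"));
        reducer.run(input);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [setup, reduce a, reduce b, cleanup]
    }
}
```

Note that reduce() is called once per key, not once per reduce task, so anything opened in setup() is shared by all the keys that task handles.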

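So in your case, you can override the cleanup method (the counterpart of close in the older mapred API) to close the file. (You need to keep the open stream in an instance variable of your reducer.)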
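A sketch of that pattern, assuming the caller chooses a local output path. LocalFileReducer and demo are hypothetical names for a plain-Java stand-in of MyReducer; in real Hadoop code the three methods would be overrides of setup(Context), reduce(Text, Iterable<Text>, Context) and cleanup(Context) on a Reducer subclass. The file is opened once in setup, written in every reduce call, and closed once in cleanup, so it is never reopened.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical stand-in for the asker's MyReducer: same setup/reduce/cleanup
// split as Hadoop's Reducer, but without Context so it runs on plain Java.
class LocalFileReducer {
    private final Path outputFile;
    private BufferedWriter bw; // instance variable holding the open stream

    LocalFileReducer(Path outputFile) {
        this.outputFile = outputFile;
    }

    // Called once before any reduce(): open the local file a single time.
    void setup() throws IOException {
        bw = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8);
    }

    // Called once per key: write to the already-open stream.
    void reduce(String key, Iterable<String> values) throws IOException {
        bw.write("entered the reduce task for " + key + "\n");
        for (String value : values) {
            bw.write(value + " will be written to my file \n");
        }
    }

    // Called once after the last key: close() flushes and releases the file.
    void cleanup() throws IOException {
        if (bw != null) {
            bw.close();
            bw = null;
        }
    }

    // Drives the three methods in the order the framework would.
    static List<String> demo(Path file) throws IOException {
        LocalFileReducer reducer = new LocalFileReducer(file);
        reducer.setup();
        reducer.reduce("k1", List.of("v1", "v2"));
        reducer.cleanup();
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("myfile", ".txt");
        demo(file).forEach(System.out::println);
    }
}
```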

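Note that, at least in older Hadoop releases, the cleanup method is not called if the reduce method fails or throws an exception (unless you override the reduce method itself to catch the exception, call cleanup, and then rethrow the exception).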
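The workaround in the note above can be sketched as follows. SafeReducer, doReduce and demo are hypothetical names, and a Closeable lambda stands in for the BufferedWriter so that the close is observable. If the per-key logic throws, the stream is closed before the exception is rethrown, so the task still fails.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reducer that protects its open stream:
// if the per-key logic fails, the stream is closed and the error rethrown.
class SafeReducer {
    final List<String> events = new ArrayList<>();
    private Closeable stream; // stands in for the reducer's BufferedWriter

    void setup() {
        // Fake stream whose close() just records that it happened
        stream = () -> events.add("stream closed");
    }

    void cleanup() throws IOException {
        if (stream != null) {
            stream.close();
            stream = null; // a second cleanup() call becomes a no-op
        }
    }

    // The original per-key work; fails on the key "bad" to simulate an error
    void doReduce(String key) throws IOException {
        if (key.equals("bad")) {
            throw new IllegalStateException("failed on " + key);
        }
        events.add("reduced " + key);
    }

    // Wraps doReduce: close the stream on failure, then rethrow
    void reduce(String key) throws IOException {
        try {
            doReduce(key);
        } catch (IOException | RuntimeException e) {
            try {
                cleanup();
            } catch (IOException closeError) {
                e.addSuppressed(closeError); // keep the original failure primary
            }
            throw e; // the task must still fail
        }
    }

    static List<String> demo() {
        SafeReducer reducer = new SafeReducer();
        reducer.setup();
        try {
            reducer.reduce("good");
            reducer.reduce("bad");
        } catch (IOException | RuntimeException e) {
            reducer.events.add("caught: " + e.getMessage());
        }
        return reducer.events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [reduced good, stream closed, caught: failed on bad]
    }
}
```

Setting the stream to null after closing means that if the framework later calls cleanup() anyway, the call does nothing instead of closing twice.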
