Question

I am using Hadoop 2.2.0, and when I run my map tasks I get the following error:

attempt_xxx Timed out after 1800000 secs

(1800000 because I changed the mapreduce.task.timeout configuration.)

Below is my map code:

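import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapTask extends Mapper<LongWritable, Text, NullWritable, Text>
{
 ContentOfFiles fileContent = new ContentOfFiles();

 @Override
 public void map(LongWritable key, Text value, Context context)
     throws IOException, InterruptedException
 {
   // Split the tab-separated input line into the list of sources to read.
   String line = value.toString();
   String[] splits = line.split("\\t");
   List<String> sourceList = Arrays.asList(splits);
   String finalOutput = fileContent.getContentOfFile(sourceList);
   context.write(NullWritable.get(), new Text(finalOutput));
 }
}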

This is my ContentOfFiles class:

import java.util.List;

public class ContentOfFiles
{
  public String getContentOfFile(List<String> sourceList)
   {
     String returnContentOfFile = "";
     for (String source : sourceList)
      {
        //Open the file and get the content and then append it to the String returnContentOfFile
      }
    return returnContentOfFile;
   }
}

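When I run my map task, I get an error saying:

attempt_xxx Timed out after 1800000 secs

What I want to know is how I can tell Hadoop that my task is still running.

I call the ContentOfFiles class from inside my map method. Is there a way to signal that my map task is still running? I tried changing the mapreduce.task.timeout configuration to 1800000, and it still gives me the same error.

Once again, I am using Hadoop 2.2, so it would be great if someone could tell me how to handle this in the new API.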

Answer 1

You can try adding context.progress(); after each long operation in your mapper. As far as I understand, the best place is the end of the for loop:

// Context is the Mapper.Context that map() receives; pass it in from the call site.
public String getContentOfFile(List<String> sourceList, Context context) {
    String returnContentOfFile = "";
    for (String source : sourceList) {
        //Open the file and get the content and then append it to the String returnContentOfFile
        context.progress(); // report progress so the task is not treated as hung
    }
    return returnContentOfFile;
}
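
If a single file is large enough that even one loop iteration can exceed the timeout, it may help to report progress while reading as well as after each file. Below is a minimal sketch of that idea, not a definitive implementation. To keep it runnable without Hadoop on the classpath, it uses a hypothetical ProgressReporter interface as a stand-in for the context; the class name ProgressSketch, the concatenate method, and the interval parameter are also made up for illustration. In a real mapper you would call context.progress() directly, or pass context::progress on Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

public class ProgressSketch {

    /** Hypothetical stand-in for the mapper context; only the progress() call matters here. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Appends the content of every source, calling reporter.progress()
     * every `interval` lines and once more after each source is finished.
     */
    static String concatenate(List<? extends Reader> sources,
                              ProgressReporter reporter,
                              int interval) throws IOException {
        StringBuilder content = new StringBuilder();
        long linesRead = 0;
        for (Reader source : sources) {
            try (BufferedReader in = new BufferedReader(source)) {
                String line;
                while ((line = in.readLine()) != null) {
                    content.append(line).append('\n');
                    // Heartbeat inside the long-running read, not only at the end of it.
                    if (++linesRead % interval == 0) {
                        reporter.progress();
                    }
                }
            }
            // Heartbeat at the end of each source, as in the loop above.
            reporter.progress();
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory readers stand in for the real files.
        List<Reader> sources = Arrays.<Reader>asList(
                new StringReader("a\nb\nc"), new StringReader("d"));
        final int[] calls = {0};
        String content = concatenate(sources, () -> calls[0]++, 2);
        System.out.println(content.length() + " chars read, " + calls[0] + " progress calls");
    }
}
```

Also note that mapreduce.task.timeout is specified in milliseconds, so 1800000 means 30 minutes. Raising it only postpones the kill if the task never reports progress, which is why calling progress() regularly is usually the more robust fix.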
