Question

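I want to create a mapper that processes only the first k lines of an input file. I found this post:

Hadoop-> Mapper->How can we read only Top N rows from each file from given input path? It suggests overriding the run method as follows: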

@Override
public void run(Context context) throws IOException, InterruptedException {
  setup(context);

  int rows = 0;
  while (context.nextKeyValue()) {
    if (rows++ == 10) {
      break;
    }

    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }

  cleanup(context);
}

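So I tried that solution, but the compiler cannot find "Context" or "setup()". I tried importing org.apache.hadoop.mapreduce.Mapper.*, but that did not help. Could someone also explain the parameters of the map() function?

Sample code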

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> 
{
    @Override
    public void run(Context context) throws IOException, InterruptedException //for reading the first k lines only
    {
        setup(context);

        int k = 5;

        int rows = 0;
        while (context.nextKeyValue()) 
        {
            if (rows++ == k) break;
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }

        cleanup(context);
    }

}

Answer 1

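You are not extending the Mapper class. MapReduceBase and the Mapper interface you are implementing come from the old org.apache.hadoop.mapred API, which has no Context, setup(), or run(Context), so the compiler cannot resolve them.

For example, the declaration should look something like this:

MyMapper extends Mapper<Object, Text, Text, IntWritable>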

See the examples here: http://wiki.apache.org/hadoop/WordCount
http://hadoop.apache.org/docs/stable/api/src-html/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html (this one also overrides run)
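
To make the fix concrete, here is a minimal sketch of what the mapper from the question could look like once it extends the new-API Mapper class. The class name TopKMapper, the constant K, and the body of map() are placeholders I made up for illustration, and the key/value types assume the default TextInputFormat. It also shows what the map() parameters are: the input key, the input value, and the Context used to emit output.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical class name. Note that it extends the new-API class
// org.apache.hadoop.mapreduce.Mapper instead of implementing the old
// org.apache.hadoop.mapred.Mapper interface.
public class TopKMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Assumed limit for this sketch; it could also be read from the job
    // Configuration inside setup().
    private static final int K = 5;

    // key:     with TextInputFormat, the byte offset of the line in the file
    // value:   the contents of the line
    // context: used to emit output pairs and to reach the job configuration
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Placeholder logic: emit the offset (as text) together with the line.
        context.write(new Text(key.toString()), value);
    }

    // run() drives the whole map task, so stopping the loop here stops
    // reading input after K records.
    @Override
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            int rows = 0;
            while (rows < K && context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
                rows++;
            }
        } finally {
            cleanup(context);
        }
    }
}
```

One caveat: run() is called once per map task, that is, once per input split. If a file is large enough to be divided into several splits, each split contributes its own first K records, so this only gives "the first K lines of each file" when every file fits into a single split.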
