Question

I am trying to implement a MapReduce program whose output is the diagonal of a .txt file. For example, reading the file

a*****
*b****
**c***
***d**
****e*
*****f

I want the output to be abcdef.

The mapper class I wrote is this:

public class MapperClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text>
{
//hadoop supported data types
private static final Text t = new Text("");
private Text word = new Text();
//private static int linenumber = 0;

  public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
  {
        //taking one line at a time from input file
        String line = value.toString();
        int linenumber = 0; 
        word.set(Character.toString(line.charAt(linenumber++)));
        output.collect(word, t);
   }
}

But the output I get is

a
*
*
*
*
*

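I tried moving linenumber outside the map method, but I still get the same result. Can anyone help? I just need a way to keep a counter that increments each time the next line is read from the file. P.S. I don't think a reducer is needed here, because I don't want to sort any intermediate results. Please correct me if I'm wrong. Thanks!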

Answer 1

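Use the LongWritable key parameter that is already passed to the map method. With TextInputFormat it holds the byte offset at which the current line starts in the file, not the line number itself, so for fixed-width lines the line number can be derived from it.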

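In general, you cannot keep track of linenumber inside the mapper, because the file may be processed by several mappers (especially if you use TextInputFormat, which assumes regular text files are splittable). This kind of global state usually only makes sense in counters.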
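The offset idea can be sketched in plain Java without Hadoop. This is a minimal illustration, not a drop-in mapper: OffsetDiagonalSketch, rowFromOffset and diagonalChar are hypothetical names, and the arithmetic assumes every line has the same length, uses single-byte characters, and ends with a single '\n'. Under those assumptions the row index depends only on the key, so it does not matter which mapper handles which line.

```java
class OffsetDiagonalSketch {

    // Converts the byte offset of a line's first character into a zero-based row index.
    // Assumption: fixed-width lines, single-byte characters, one '\n' per line.
    static int rowFromOffset(long byteOffset, String line) {
        int bytesPerLine = line.length() + 1; // +1 for the newline
        return (int) (byteOffset / bytesPerLine);
    }

    // Mirrors what one map() call would emit for a single (offset, line) record.
    static char diagonalChar(long byteOffset, String line) {
        return line.charAt(rowFromOffset(byteOffset, line));
    }

    public static void main(String[] args) {
        String[] lines = {"a*****", "*b****", "**c***", "***d**", "****e*", "*****f"};
        StringBuilder out = new StringBuilder();
        long offset = 0; // simulated key: byte offset of the current line
        for (String line : lines) {
            out.append(diagonalChar(offset, line));
            offset += line.length() + 1;
        }
        System.out.println(out); // prints abcdef
    }
}
```

In a real job the records may still reach the output in a different order across mappers, so emitting the row index as the output key would be one way to keep the characters sortable.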

Answer 2

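Uncomment the private static int linenumber = 0; line so that the counter lives at static scope.

And comment out the int linenumber = 0; line inside the map method.

And yes, of course, you do not need a reducer at all.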
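What a counter kept outside map does, and where it stops working, can be simulated in plain Java without Hadoop. In this sketch CountingMapperSketch and runSplit are hypothetical names, an instance field stands in for the static field, and the assumption is that each input split is handled by its own mapper whose counter starts at 0.

```java
class CountingMapperSketch {

    // Counter kept outside map(), standing in for the static linenumber field.
    private int lineNumber = 0;

    // Stand-in for map(): emits the character at the current counter position.
    String map(String line) {
        return Character.toString(line.charAt(lineNumber++));
    }

    // Runs one hypothetical mapper over lines[from..to), as if that range were one split.
    static String runSplit(String[] lines, int from, int to) {
        CountingMapperSketch mapper = new CountingMapperSketch();
        StringBuilder out = new StringBuilder();
        for (int i = from; i < to; i++) {
            out.append(mapper.map(lines[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] lines = {"a*****", "*b****", "**c***", "***d**", "****e*", "*****f"};
        // One mapper sees every line in order: prints abcdef
        System.out.println(runSplit(lines, 0, 6));
        // Two mappers, each restarting its counter at 0: prints abc***
        System.out.println(runSplit(lines, 0, 3) + runSplit(lines, 3, 6));
    }
}
```

So the counter approach gives the expected result only while a single mapper reads the whole file in order, which is likely for a six-line file but not guaranteed for larger inputs.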

Answer 3

You are not actually using any loop there, so it only ever looks at the first position. Try this:

public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
{
    //taking one line at a time from input file
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line.toLowerCase());
    int linenumber = 0; 
    while (itr.hasMoreTokens()) {
        // consume the current token; without this the loop never ends
        itr.nextToken();
        word.set(Character.toString(line.charAt(linenumber++)));
        output.collect(word, t);
    }
}

Hope it works.

Answer 4

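The lines are not all processed together in the map function; map is called once per line. On the first call linenumber++ gives you 'a', but on the next call linenumber is set to 0 again, so '*' is sent to the reducer. Use context counters for this kind of problem.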
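The reset described above can be reproduced in plain Java without Hadoop. In this small sketch LocalCounterSketch and mapWithLocalCounter are hypothetical names standing in for the question's mapper; the body keeps the same local-variable pattern.

```java
class LocalCounterSketch {

    // Stand-in for the question's map(): the counter is a local variable.
    static char mapWithLocalCounter(String line) {
        int linenumber = 0;               // created and reset to 0 on every call
        return line.charAt(linenumber++); // always reads index 0; the increment is lost on return
    }

    public static void main(String[] args) {
        String[] lines = {"a*****", "*b****", "**c***", "***d**", "****e*", "*****f"};
        for (String line : lines) {
            // One call per input line, as the framework would do; prints a, then * five times
            System.out.println(mapWithLocalCounter(line));
        }
    }
}
```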

MapReduce program that keeps a counter while reading a text file
