Question

I am modifying the standard word count program, which counts every word, so that it counts only one specific word.

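The reducer and map classes are the same as in the standard word count. The word is not being counted correctly: the specific word appears many times in my file, but I get 1 as the count.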

public class wordcountmapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>                       // mapper function implemented.
{
    private final static IntWritable one = new IntWritable(1); // intwritable
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();      // conversion in string
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            if (line.compareTo("Cold") == 0) {  //cold is the specific word to get count for
                output.collect(word, one);      // getting 1 as a count for 'cold' as if its counting only first line 'cold' and not going to next line.
            }
        }
    }
}

Answer 1

First, your if statement compares the line object with "Cold". That is wrong. It should compare the tokenized word with "Cold": if(tokenizer.nextToken().equals("Cold")).

I am not sure how the current logic arrives at a count of 1 for "Cold". Possibly your input contains a line that consists of the single word "Cold".
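
To make the difference concrete, here is a minimal plain-Java sketch of the two comparisons, written without any Hadoop dependency so it can be run on its own. The class name SpecificWordCount and both method names are hypothetical and not part of the original code; the counter stands in for the call to output.collect(word, one). It also assumes the token is read once per iteration and stored, since calling nextToken() a second time inside the if would consume and skip the following token.

```java
import java.util.StringTokenizer;

// Hypothetical helper, not from the original post: mimics the check done inside map().
final class SpecificWordCount {

    // Variant from the question: compares the whole line with the target,
    // so it only "emits" when the line consists of exactly that word.
    static int emitsComparingLine(String line, String target) {
        int emitted = 0;
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
            if (line.compareTo(target) == 0) {
                emitted++; // stands in for output.collect(word, one)
            }
        }
        return emitted;
    }

    // Corrected variant: reads each token once and compares that token.
    static int emitsComparingToken(String line, String target) {
        int emitted = 0;
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken(); // one nextToken() call per iteration
            if (token.equals(target)) {
                emitted++; // stands in for output.collect(word, one)
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Made-up sample input, one array element per input line.
        String[] lines = {"Cold", "Cold day Cold night", "a Cold morning"};
        int byLine = 0;
        int byToken = 0;
        for (String line : lines) {
            byLine += emitsComparingLine(line, "Cold");
            byToken += emitsComparingToken(line, "Cold");
        }
        System.out.println("line comparison:  " + byLine);
        System.out.println("token comparison: " + byToken);
    }
}
```

On this sample input the line comparison emits once in total (only for the line that is exactly "Cold"), while the token comparison emits four times, which is consistent with the count of 1 described in the question. Note that equals is case-sensitive, so "cold" would not match "Cold" in either variant.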

How do I count a specific word using MapReduce?
