Adding numbers in Hadoop

Time: 2015-06-16 08:23:02

Tags: java apache hadoop

Input:

a  10  20  30
b  20  45  90
z  30  10  10

Expected output:

a  60
b  155
c  50

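The input above is a tab-delimited text file. I need the sum of the numbers in each row, and the output should look like the one shown above.

I tried the following mapper and reducer code, but it fails. Can anyone correct the code?

Mapper code: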

public class WordMapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>{

@Override
public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter arg3)
        throws IOException {
    String s = value.toString();
        for(String word:s.split("\t")){
            if(word.length()>0){
                output.collect(new Text(word),new IntWritable(1));
    // TODO Auto-generated method stub

          }
      }
  }
}

Reducer code:

public class WordReducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable>{
public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
    int sum = 0;
    while(values.hasNext(){
        if values != null{
        sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));

    }
}

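2 answers:

Answer 0 (score: 3)

You are sending the wrong keys and values to the reducer: the mapper emits every field as a key with the value 1, so the reducer counts tokens instead of summing the numbers. Change the mapper code to the following, which uses the first field of each row as the key and emits the remaining fields as values: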

@Override
public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter arg3)
        throws IOException {
    String s = value.toString();
    // Split on the tab character; the first field is the row key.
    String[] splits = s.split("\t");
    String newKey = splits[0].trim();
    // Emit one (key, number) pair per remaining field so the reducer can sum them.
    for (int i = 1; i < splits.length; i++) {
        output.collect(new Text(newKey), new IntWritable(Integer.parseInt(splits[i].trim())));
    }
}
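To see what this mapper would hand to the reducer, the same split-and-emit logic can be tried without a cluster. The following is only a plain-Java sketch: the class `RowPairsSketch` and the helper `toPairs` are hypothetical names, and `Map.Entry<String, Integer>` stands in for Hadoop's `Text`/`IntWritable` pair. Skipping empty fields is an added assumption, not part of the answer's code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RowPairsSketch {
    // Hypothetical helper mirroring the mapper body, without any Hadoop types.
    static List<Map.Entry<String, Integer>> toPairs(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        String[] splits = line.split("\t");   // tab-delimited input
        String key = splits[0].trim();        // first field is the row key
        for (int i = 1; i < splits.length; i++) {
            String field = splits[i].trim();
            if (field.isEmpty()) {
                continue;                     // assumption: ignore empty fields
            }
            pairs.add(new SimpleEntry<>(key, Integer.parseInt(field)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints [a=10, a=20, a=30]
        System.out.println(toPairs("a\t10\t20\t30"));
    }
}
```

Every pair carries the row key, so all numbers of one row arrive at the same reduce call.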

Answer 1 (score: 0)

The output.collect call in the reducer needs to be outside the while loop. Your code should then produce the desired output.
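The effect of where the collect call sits can be illustrated in plain Java. This is a sketch under stated assumptions, not the asker's reducer: `CollectPlacementSketch` and both methods are hypothetical names, `Iterator<Integer>` stands in for `Iterator<IntWritable>`, and adding to a list stands in for `output.collect`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CollectPlacementSketch {
    // Emitting inside the loop produces one record per value (running totals).
    static List<Integer> emitInsideLoop(Iterator<Integer> values) {
        List<Integer> emitted = new ArrayList<>();
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
            emitted.add(sum);   // stands in for output.collect(key, ...)
        }
        return emitted;
    }

    // Emitting after the loop produces a single record holding the final sum.
    static List<Integer> emitAfterLoop(Iterator<Integer> values) {
        List<Integer> emitted = new ArrayList<>();
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        emitted.add(sum);       // stands in for output.collect(key, ...)
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> row = Arrays.asList(20, 45, 90);
        System.out.println(emitInsideLoop(row.iterator())); // prints [20, 65, 155]
        System.out.println(emitAfterLoop(row.iterator()));  // prints [155]
    }
}
```

Only the second variant yields one line per key, which is what the expected output in the question shows.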