Mapper class and Reducer class in MapReduce design patterns

Date: 2017-08-26 14:05:30

Tags: java hadoop mapreduce

I am new to MapReduce, and I have a few questions about how the Mapper and Reducer classes are designed in code.

I am familiar with Map Side Joining in MapReduce, and I learned this:

        public static class CustsMapper extends Mapper<Object, Text, Text, Text> {
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

In the above code snippet I learned that we extend our class from the Mapper class, and since Object is the key type and Text is the value type, the map method takes this key-value pair as input; the Context object is then used to write whatever output the body of the method produces, e.g. context.write(new Text(), new Text()).
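
For reference, here is a fuller version of that mapper as I understand it (just a minimal sketch against the new org.apache.hadoop.mapreduce API; the comma-separated customer record layout assumed below is purely illustrative):

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Minimal sketch of a mapper written against the new (mapreduce) API:
    // Object is the input key type, Text the input value type, and the
    // output key and value types are both Text.
    public class CustsMapper extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Illustrative assumption: each input line is a comma-separated
            // customer record such as "1,John,35"
            String record = value.toString();
            String[] fields = record.split(",");

            // The Context object is what actually emits the output pair:
            // here the customer id becomes the key and the full record the value
            context.write(new Text(fields[0]), new Text(record));
        }
    }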

My two questions are:

  1. Why do we extend our class from MapReduceBase (what does it do?), and why do we implement Mapper? I know Mapper is a class, but somewhere it is shown as an interface, so what would be the problem if I simply extended the org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> class?

  2. What are OutputCollector<Text, IntWritable> output and Reporter reporter in the map function? I don't know them. I know Context context should be here, but what are OutputCollector and Reporter doing here?

  3. I was running the program below:

    Input:

    1979   23   23   2   43   24   25   26   26   26   26   25   26  25 
    1980   26   27   28  28   28   30   31   31   31   30   30   30  29 
    1981   31   32   32  32   33   34   35   36   36   34   34   34  34 
    1984   39   38   39  39   39   41   42   43   40   39   38   38  40 
    1985   38   39   39  39   39   41   41   41   00   40   39   39  45 
    

    Code:

    package hadoop; 
    
    import java.util.*; 
    
    import java.io.IOException; 
    
    import org.apache.hadoop.fs.Path; 
    import org.apache.hadoop.conf.*; 
    import org.apache.hadoop.io.*; 
    import org.apache.hadoop.mapred.*; 
    import org.apache.hadoop.util.*; 
    
    public class ProcessUnits 
    { 
       //Mapper class 
       public static class E_EMapper extends MapReduceBase implements Mapper<LongWritable ,Text,Text,IntWritable>       
       {       
          //Map function 
          public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException 
          { 
             String line = value.toString(); 
             String lasttoken = null; 
             StringTokenizer s = new StringTokenizer(line,"\t"); 
             String year = s.nextToken(); 
    
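             // walk to the last tab-separated token on the line; it is parsed below as avgprice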
             while(s.hasMoreTokens())
                {
                   lasttoken=s.nextToken();
                } 
    
             int avgprice = Integer.parseInt(lasttoken); 
             output.collect(new Text(year), new IntWritable(avgprice)); 
          } 
       } 
    
    
       //Reducer class 
       public static class E_EReduce extends MapReduceBase implements Reducer< Text, IntWritable, Text, IntWritable> 
       {     
          //Reduce function 
          public void reduce( Text key, Iterator <IntWritable> values, 
             OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException 
             { 
                int maxavg=30; 
                int val=Integer.MIN_VALUE; 
    
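                // emit every value of this key that is greater than maxavg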
                while (values.hasNext()) 
                { 
                   if((val=values.next().get())>maxavg) 
                   { 
                      output.collect(key, new IntWritable(val)); 
                   } 
                } 
    
             } 
       }  
    
    
       //Main function 
       public static void main(String args[])throws Exception 
       { 
          JobConf conf = new JobConf(ProcessUnits.class); 
    
          conf.setJobName("max_eletricityunits"); 
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(IntWritable.class); 
          conf.setMapperClass(E_EMapper.class); 
          conf.setCombinerClass(E_EReduce.class); 
          conf.setReducerClass(E_EReduce.class); 
          conf.setInputFormat(TextInputFormat.class); 
          conf.setOutputFormat(TextOutputFormat.class); 
    
          FileInputFormat.setInputPaths(conf, new Path(args[0])); 
          FileOutputFormat.setOutputPath(conf, new Path(args[1])); 
    
          JobClient.runJob(conf); 
       } 
    } 
    

1 Answer:

Answer 0 (score: 0)


Why do we extend our class from MapReduceBase (what does it do?) and why do we implement Mapper?

Because that is how old code written against the mapred API, before Hadoop 2.x, was written.


I know Context context should be here, but what are OutputCollector and Reporter here?

Those are the earlier versions of the Context object.
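
For illustration, here is roughly how the E_EMapper from the question would look when ported to the newer org.apache.hadoop.mapreduce API (an untested sketch; the class name E_EMapperNewApi is just for this example). Note that it extends Mapper directly, with no MapReduceBase, and that the single Context parameter takes over the jobs of both OutputCollector (via context.write) and Reporter (via counters and status updates):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // New-API version: extend Mapper directly; there is no MapReduceBase,
    // and the Context argument replaces OutputCollector + Reporter.
    public class E_EMapperNewApi extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer s = new StringTokenizer(line, "\t");
            String year = s.nextToken();

            // keep only the last tab-separated token on the line
            String lasttoken = null;
            while (s.hasMoreTokens()) {
                lasttoken = s.nextToken();
            }
            int avgprice = Integer.parseInt(lasttoken);

            // context.write(...) does what output.collect(...) did in the old API
            context.write(new Text(year), new IntWritable(avgprice));

            // the progress/status reporting that used to go through Reporter is
            // also available on Context, e.g. context.setStatus(...)
        }
    }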

Hadoop: How does OutputCollector work during MapReduce?
How outputcollector works?