Connection between the number of map() calls and the number of map tasks launched in an MR job

Time: 2012-03-26 09:51:08

Tags: hadoop mapreduce

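I wrote an MR program to estimate PI (3.141592...), shown below, but I have a question:

The framework launched 11 map tasks, and the output is below (35 lines in total). I expected 11 lines of output. What am I missing?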

INCIRCLE 78534096
INCIRCLE 78539304
INCIRCLE 78540871
INCIRCLE 78537925
INCIRCLE 78537161
INCIRCLE 78544419
INCIRCLE 78537045
INCIRCLE 78534861
INCIRCLE 78545779
INCIRCLE 78528890
INCIRCLE 78540007
INCIRCLE 78542686
INCIRCLE 78534539
INCIRCLE 78538255
INCIRCLE 78543392
INCIRCLE 78543191
INCIRCLE 78540938
INCIRCLE 78534882
INCIRCLE 78536155
INCIRCLE 78545739
INCIRCLE 78541807
INCIRCLE 78540635
INCIRCLE 78547561
INCIRCLE 78540521
INCIRCLE 78541320
INCIRCLE 78537605
INCIRCLE 78541379
INCIRCLE 78540408
INCIRCLE 78536238
INCIRCLE 78539614
INCIRCLE 78539773
INCIRCLE 78537169
INCIRCLE 78541707
INCIRCLE 78537141
INCIRCLE 78538045

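// Program start: imports (old org.apache.hadoop.mapred API)
import java.io.IOException;
import java.util.Iterator;
import java.util.Random;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class PiEstimation {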

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {

        private final static Text INCIRCLE = new Text("INCIRCLE");
        // Number of random points sampled in each call to map()
        private final static LongWritable TimesInAMap = new LongWritable(100000000);
        private static Random random = new Random();

        public class MyPoint {
            private double x = 0.0;
            private double y = 0.0;

            MyPoint(double _x, double _y) {
                this.x = _x;
                this.y = _y;
            }

            // True if the point lies inside the circle of radius 0.5 centred at (0.5, 0.5)
            public boolean inCircle() {
                return ((x - 0.5) * (x - 0.5) + (y - 0.5) * (y - 0.5)) <= 0.25;
            }

            public void setPoint(double _x, double _y) {
                this.x = _x;
                this.y = _y;
            }
        }

        // Called once per input record (with TextInputFormat, once per line of input)
        public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            long i = 0;
            long N = TimesInAMap.get();
            MyPoint myPoint = new MyPoint(random.nextDouble(), random.nextDouble());
            long sum = 0;
            while (i < N) {
                if (myPoint.inCircle()) {
                    sum++;
                }
                myPoint.setPoint(random.nextDouble(), random.nextDouble());
                i++;
            }
            // Emits exactly one record per map() call
            output.collect(INCIRCLE, new LongWritable(sum));
        }
    }


    public static class Reduce extends MapReduceBase implements Reducer<Text, LongWritable, Text, LongWritable> {
        public void reduce(Text key, Iterator<LongWritable> values, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            long sum = 0;
            while (values.hasNext()) {
                //sum += values.next().get();
                // Pass-through: every value from every map() call is written out unchanged
                output.collect(key, values.next());
            }
            //output.collect(key, new LongWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PiEstimation.class);
        conf.setJobName("PiEstimation");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // Only a hint to the framework: one map task is launched per input split
        conf.setNumMapTasks(10);
        conf.setNumReduceTasks(1);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }

}
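Each output line is the in-circle count from one map() call that sampled TimesInAMap = 100,000,000 points, so the 35 lines can still be combined into one estimate, which is what the commented-out summing code in Reduce points towards. The circle of radius 0.5 centred at (0.5, 0.5) covers π/4 of the unit square, so π ≈ 4 × (points inside) / (points sampled). A minimal plain-Java sketch of that arithmetic (no Hadoop; the class PiFromCounts and its members are hypothetical names, and the counts are copied from the output above):

```java
public class PiFromCounts {
    // Points sampled per map() call; assumed equal to TimesInAMap in the job above.
    static final long SAMPLES_PER_CALL = 100_000_000L;

    // In-circle counts, one per map() call, copied from the job output above.
    static final long[] COUNTS = {
        78534096, 78539304, 78540871, 78537925, 78537161,
        78544419, 78537045, 78534861, 78545779, 78528890,
        78540007, 78542686, 78534539, 78538255, 78543392,
        78543191, 78540938, 78534882, 78536155, 78545739,
        78541807, 78540635, 78547561, 78540521, 78541320,
        78537605, 78541379, 78540408, 78536238, 78539614,
        78539773, 78537169, 78541707, 78537141, 78538045
    };

    // Sums the per-call counts (what a summing reducer would do) and
    // scales by 4 / total samples, since the circle covers pi/4 of the square.
    static double estimatePi(long[] counts, long samplesPerCall) {
        long inside = 0;
        for (long c : counts) {
            inside += c;
        }
        double total = (double) samplesPerCall * counts.length;
        return 4.0 * inside / total;
    }

    public static void main(String[] args) {
        System.out.println("map() calls: " + COUNTS.length);
        System.out.println("pi ~= " + estimatePi(COUNTS, SAMPLES_PER_CALL));
    }
}
```

Summing in the reducer rather than passing values through would collapse the output to a single line no matter how many map() calls or map tasks ran.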

1 Answer:

Answer 0 (score: 3)

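The number of map tasks launched depends on several factors, chiefly the input format, the block size used to divide the input files into splits, and whether the input files are "splittable" at all.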
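To make this concrete, here is a simplified model of how the old-API FileInputFormat (org.apache.hadoop.mapred) roughly sizes splits for a single splittable file: a goal size of the total length divided by the setNumMapTasks hint, capped at the block size, with the last chunk allowed to be up to 10% larger than the split size. It assumes one file and a minimum split size of 1, and SplitCountSketch and splitSizes are hypothetical names; treat it as an illustration of one way a hint of 10 can yield 11 map tasks, not as Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCountSketch {
    // Slop factor: the final chunk may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Simplified model: one splittable file, minimum split size of 1.
    // Returns the size of each split; the list length is the number of map tasks.
    static List<Long> splitSizes(long fileLength, long blockSize, int numMapTasksHint) {
        long goalSize = fileLength / Math.max(numMapTasksHint, 1);
        long minSize = 1;
        long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));

        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // leftover bytes become one more split
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical 105-byte input file, 64 MB block size, hint of 10 map tasks.
        List<Long> splits = splitSizes(105, 64L * 1024 * 1024, 10);
        System.out.println(splits.size() + " splits: " + splits);
    }
}
```

With the hypothetical 105-byte file, the goal size is 105 / 10 = 10 bytes (integer division), giving ten 10-byte splits plus a 5-byte remainder, i.e. 11 map tasks from a hint of 10.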

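Separately, the number of times map is called depends on the number of records in each split (the data a given mapper processes).

Suppose you have a 100-line text file as input. Most likely it will be handled by a single Mapper, but the map method will be called 100 times, once for each line in the input file.

If you count the lines in the input file, that is the number of times map is called across all Mappers. It is hard to determine exactly how many times map is called within each individual Mapper.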
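A toy model of this in plain Java, with no Hadoop classes (MapCallSketch, runMapper and map are hypothetical names): a line-oriented record reader calls map() once per line, passing the line's byte offset as the key, as TextInputFormat does for ASCII text. Because the PiEstimation mapper emits one record per call and its combiner and reducer pass values through unchanged, the number of output lines equals the number of input lines, which would be consistent with 35 output lines if the input file had 35 lines.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MapCallSketch {
    // Stand-in for the PiEstimation mapper: ignores the line's contents and
    // emits exactly one record per call (the value here is just the offset).
    static void map(long byteOffset, String line, List<String> output) {
        output.add("INCIRCLE\t" + byteOffset);
    }

    // Mimics a line-oriented record reader over one split: map() is invoked
    // once per line, keyed by the line's starting byte offset (ASCII assumed).
    static List<String> runMapper(List<String> lines) {
        List<String> output = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            map(offset, line, output);
            offset += line.length() + 1; // +1 for the newline character
        }
        return output;
    }

    public static void main(String[] args) {
        // A hypothetical 100-line input file handled by a single mapper.
        List<String> lines = Collections.nCopies(100, "x");
        List<String> emitted = runMapper(lines);
        System.out.println("map() calls / emitted records: " + emitted.size());
    }
}
```

Splitting the same file across several mappers changes how the calls are distributed, but not their total.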