Cannot find the number of reducers in the job log

Time: 2015-11-08 20:09:37

Tags: hadoop mapreduce

In MapReduce, I want to find the number of mappers and reducers from the program's log.

I pass three files as input to the program, and I explicitly set the number of reducers to 5 (for testing purposes only).

Program:

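import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Emits (token, 1) for every whitespace-separated token in a line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts emitted for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class); // lets a cluster locate the job jar

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setNumReduceTasks(5); // 5 reducers, for testing only

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}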

Log:

2015-11-08 11:40:48,749 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1384)) - Job job_local1769091332_0001 completed successfully
2015-11-08 11:40:48,829 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1391)) - Counters: 38
    File System Counters
        FILE: Number of bytes read=20931
        FILE: Number of bytes written=2179872
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1287
        HDFS: Number of bytes written=194
        HDFS: Number of read operations=119
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=38
    Map-Reduce Framework
        Map input records=14
        Map output records=35
        Map output bytes=319
        Map output materialized bytes=479
        Input split bytes=353
        Combine input records=0
        Combine output records=0
        Reduce input groups=12
        Reduce shuffle bytes=479
        Reduce input records=35
        Reduce output records=12
        Spilled Records=70
        Shuffled Maps =15
        Failed Shuffles=0
        Merged Map outputs=15
        GC time elapsed (ms)=272
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=1578663936
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=183
    File Output Format Counters 
        Bytes Written=86

2 Answers:

Answer 0 (score: 0)

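When the job starts, the output log shows the number of input splits, which is determined by the input size; this number equals the number of mappers.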
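
The split count is printed at submission time, before the completion summary, so it is not part of the log excerpt in the question. A minimal sketch of pulling it out of the client log, assuming the submitter writes a line containing "number of splits:N" and that each split becomes one map task (true for FileInputFormat-based jobs); the class and method names are hypothetical:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the map-task count from client log lines. */
public class SplitCountFromLog {
    // Assumed log text: the submitter reports "number of splits:N" when the job is submitted.
    private static final Pattern SPLITS = Pattern.compile("number of splits:\\s*(\\d+)");

    /** Returns the split count (one map task per split), or -1 if no such line is found. */
    public static int mapTasksFromSplits(List<String> logLines) {
        for (String line : logLines) {
            Matcher m = SPLITS.matcher(line);
            if (m.find()) {
                return Integer.parseInt(m.group(1));
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Illustrative sample line (hypothetical, not copied from the question's log).
        List<String> log = List.of("INFO  mapreduce.JobSubmitter - number of splits:3");
        System.out.println(mapTasksFromSplits(log)); // prints 3
    }
}
```

With three small input files, each file should become its own split, so a value of 3 would be expected here.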

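For reducers, here are ways to get the number:
1. The number of unique keys in the mapper output.
2. The number of reducer output files generated.
3. The number is also available from the web interface.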
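
Method 2 can be sketched in plain Java. This assumes the job's output directory has been copied to the local filesystem (for example with hdfs dfs -get) and that the default TextOutputFormat file naming part-r-00000, part-r-00001, and so on is in effect; with that output format each reduce task writes one such file even when it receives no keys. The class name is hypothetical:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: counts reducer output files in a local copy of a job output dir. */
public class ReducerOutputCounter {
    /** Counts files named part-r-*; a map-only job would write part-m-* files instead. */
    public static int countReducerOutputs(Path outputDir) throws IOException {
        int count = 0;
        // The glob skips _SUCCESS and any other bookkeeping files.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir, "part-r-*")) {
            for (Path ignored : entries) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a local directory, e.g. fetched with "hdfs dfs -get <output> <localDir>"
        System.out.println(countReducerOutputs(Paths.get(args[0])));
    }
}
```

For the job in the question, with job.setNumReduceTasks(5), this should report 5.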

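From the output log you can get internal details, such as the number of records in each phase, combiner activity, read and write operations, bytes processed, and so on.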
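
Those counters already encode the task counts. Each reduce task fetches one output from every map task, so when no fetch fails (Failed Shuffles=0 in the question's log), Shuffled Maps should equal maps × reduces. With 5 reducers and Shuffled Maps = 15, that gives 15 / 5 = 3 map tasks, one per input file. A small sketch of that arithmetic, assuming the counter dump is available as Name=value text lines; the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: infers the map-task count from shuffle counters. */
public class MapsFromShuffle {
    /** Parses "Name=value" counter lines; group headers without '=' are skipped. */
    public static Map<String, Long> parseCounters(List<String> lines) {
        Map<String, Long> counters = new HashMap<>();
        for (String line : lines) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue; // e.g. "Map-Reduce Framework"
            }
            String name = line.substring(0, eq).trim();
            try {
                counters.put(name, Long.parseLong(line.substring(eq + 1).trim()));
            } catch (NumberFormatException e) {
                // not a numeric counter line; ignore it
            }
        }
        return counters;
    }

    /** Assumes every reducer fetched every map output once: shuffled = maps * reduces. */
    public static long inferMapTasks(List<String> counterLines, int reduceTasks) {
        long shuffled = parseCounters(counterLines).getOrDefault("Shuffled Maps", -1L);
        if (shuffled < 0 || reduceTasks <= 0) {
            return -1;
        }
        return shuffled / reduceTasks;
    }

    public static void main(String[] args) {
        // Counter lines copied from the log in the question.
        List<String> log = List.of(
            "        Shuffled Maps =15",
            "        Failed Shuffles=0",
            "        Merged Map outputs=15");
        System.out.println(inferMapTasks(log, 5)); // prints 3
    }
}
```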

Answer 1 (score: 0)

Hadoop 2.0 has two predefined job counters:

TOTAL_LAUNCHED_MAPS : The number of map tasks that were launched.
TOTAL_LAUNCHED_REDUCES : The number of reduce tasks that were launched.

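In the Hadoop CLI, running mapred job -counter <job_id> org.apache.hadoop.mapreduce.JobCounter TOTAL_LAUNCHED_MAPS (and likewise with TOTAL_LAUNCHED_REDUCES) should print these counters, which give the number of map and reduce tasks launched for the job.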
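
In a job's completion summary these two counters appear in the Job Counters group under the display names "Launched map tasks" and "Launched reduce tasks"; the local-mode log in the question has no Job Counters group at all. A sketch that reads both values from such a summary, assuming Name=value lines like those in the question's log; the class name and sample values are hypothetical:

```java
import java.util.List;

/** Hypothetical helper: reads TOTAL_LAUNCHED_MAPS / TOTAL_LAUNCHED_REDUCES from a counter dump. */
public class LaunchedTasks {
    /** Returns the value of the first "name=value" line whose trimmed name matches, or -1. */
    public static long counter(List<String> lines, String displayName) {
        for (String line : lines) {
            int eq = line.lastIndexOf('=');
            if (eq > 0 && line.substring(0, eq).trim().equals(displayName)) {
                return Long.parseLong(line.substring(eq + 1).trim());
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Illustrative Job Counters lines with hypothetical values (not from the question's log).
        List<String> dump = List.of(
            "    Job Counters ",
            "        Launched map tasks=3",
            "        Launched reduce tasks=5");
        System.out.println("maps=" + counter(dump, "Launched map tasks")
            + " reduces=" + counter(dump, "Launched reduce tasks")); // prints maps=3 reduces=5
    }
}
```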