In MapReduce, I want to find the number of mappers and reducers from the program logs.
As input I pass three files to the program, and I explicitly set the number of reducers to 5 (for testing purposes only).
Program:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for every whitespace-separated token in the line.
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts emitted for each word.
            int sum = 0;
            for (IntWritable val : values) {
                //System.out.println(key + " " + val.get());
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setNumReduceTasks(5); // explicitly request 5 reducers (for testing)
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
Log:
2015-11-08 11:40:48,749 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1384)) - Job job_local1769091332_0001 completed successfully
2015-11-08 11:40:48,829 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1391)) - Counters: 38
    File System Counters
        FILE: Number of bytes read=20931
        FILE: Number of bytes written=2179872
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1287
        HDFS: Number of bytes written=194
        HDFS: Number of read operations=119
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=38
    Map-Reduce Framework
        Map input records=14
        Map output records=35
        Map output bytes=319
        Map output materialized bytes=479
        Input split bytes=353
        Combine input records=0
        Combine output records=0
        Reduce input groups=12
        Reduce shuffle bytes=479
        Reduce input records=35
        Reduce output records=12
        Spilled Records=70
        Shuffled Maps =15
        Failed Shuffles=0
        Merged Map outputs=15
        GC time elapsed (ms)=272
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=1578663936
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=183
    File Output Format Counters
        Bytes Written=86
Answer 0 (score: 0)
In the output log you can see, when the job starts, the number of input splits, which depends on the input size and equals the number of mappers.
For the reducers, here are ways to get the number:
1. The number of unique keys in the mapper output.
2. The number of reducer output files generated.
3. The number is also available from the web UI.
From the output log you can also get internal details such as the number of records at each stage, the combiner, read/write operations, bytes, and so on.
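Method 2 above can be checked without the web UI: each reduce task writes exactly one part-r-NNNNN file, so counting those names in the job's output directory gives the reducer count. A minimal sketch (the file listing below is a hypothetical example, not taken from the job above):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ReducerCountFromOutput {
    // Each reduce task writes exactly one part-r-NNNNN file,
    // so the number of matching names equals the number of reducers.
    static final Pattern PART_R = Pattern.compile("part-r-\\d{5}");

    static long countReducers(List<String> outputFiles) {
        return outputFiles.stream()
                .filter(f -> PART_R.matcher(f).matches())
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical listing of the job's output directory.
        List<String> files = Arrays.asList(
                "_SUCCESS",
                "part-r-00000", "part-r-00001", "part-r-00002",
                "part-r-00003", "part-r-00004");
        System.out.println(countReducers(files)); // 5, matching setNumReduceTasks(5)
    }
}
```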
Answer 1 (score: 0)
In Hadoop 2.0 there are two predefined job counters:
TOTAL_LAUNCHED_MAPS : The number of map tasks that were launched.
TOTAL_LAUNCHED_REDUCES : The number of reduce tasks that were launched.
On the Hadoop CLI, running mapred job -counter <job_id> <group-name> <counter-name> prints the value of the named counter, so the counters above give the number of map and reduce tasks launched for the job.
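If you only have the printed log, the same numbers can be scraped from the counter lines themselves. Note that the run above is local mode (job_local...), which may not print launched-task counters, so the lines below are hypothetical examples of a cluster run's output. A minimal sketch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LaunchedTasksFromLog {
    // Finds "counterName=value" (spaces around '=' tolerated) in the log lines
    // and returns the value, or -1 if the counter is not present.
    static long counterValue(List<String> logLines, String counterName) {
        Pattern p = Pattern.compile("\\b" + counterName + "\\s*=\\s*(\\d+)");
        for (String line : logLines) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                return Long.parseLong(m.group(1));
            }
        }
        return -1; // e.g. local-mode runs may omit launched-task counters
    }

    public static void main(String[] args) {
        // Hypothetical counter lines from a cluster run, not from the log above.
        List<String> log = Arrays.asList(
                "        Launched map tasks=3",
                "        Launched reduce tasks=5");
        System.out.println(counterValue(log, "Launched map tasks"));    // 3
        System.out.println(counterValue(log, "Launched reduce tasks")); // 5
    }
}
```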