I need to access the counters from my mapper in my reducer. Is this possible? If so, how is it done?
For example, my mapper is:
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CounterMapper extends Mapper<Text, Text, Text, Text> {

        static enum TestCounters { TEST }

        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.getCounter(TestCounters.TEST).increment(1);
            context.write(key, value);
        }
    }
My reducer is:
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CounterReducer extends Reducer<Text, Text, Text, LongWritable> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Counter counter = context.getCounter(CounterMapper.TestCounters.TEST);
            long counterValue = counter.getValue();
            context.write(key, new LongWritable(counterValue));
        }
    }
counterValue is always 0. Am I doing something wrong, or is this simply not possible?
Answer 0 (score: 11)
In the Reducer's configure(JobConf), you can use the JobConf object to look up the reducer's own job ID. With that, your reducer can create its own JobClient (i.e. a connection to the jobtracker) and query the counters for this job (or any job, for that matter).
    // in the Reducer class...
    private long mapperCounter;

    @Override
    public void configure(JobConf conf) {
        JobClient client = new JobClient(conf);
        RunningJob parentJob =
            client.getJob(JobID.forName(conf.get("mapred.job.id")));
        mapperCounter = parentJob.getCounters().getCounter(MAP_COUNTER_NAME);
    }
Now you can use mapperCounter inside the reduce() method itself.
You will need a try/catch around this, though. I'm using the old API here, but it shouldn't be hard to adapt it to the new API.
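As a minimal sketch, this is what the configure() above looks like with the try/catch in place (still the old mapred API; MAP_COUNTER_NAME remains a placeholder for the mapper's counter enum, and wrapping in a RuntimeException is just one way to surface the failure):

    // Same logic as above, with the jobtracker lookup failure handled.
    private long mapperCounter;

    @Override
    public void configure(JobConf conf) {
        try {
            JobClient client = new JobClient(conf);
            RunningJob parentJob =
                client.getJob(JobID.forName(conf.get("mapred.job.id")));
            mapperCounter = parentJob.getCounters().getCounter(MAP_COUNTER_NAME);
        } catch (IOException e) {
            throw new RuntimeException("Failed to read the mapper counter", e);
        }
    }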
Note that the mappers' counters should all be finalized before any reducer starts, so contrary to Justin Thomas's comment, I believe you should get accurate values (as long as the reducers aren't incrementing the same counter!).
Answer 1 (score: 8)
Jeff G's solution, implemented on the new API:
    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Cluster cluster = new Cluster(conf);
        Job currentJob = cluster.getJob(context.getJobID());
        mapperCounter = currentJob.getCounters().findCounter(COUNTER_NAME).getValue();
    }
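(Here mapperCounter is assumed to be a long field on the reducer, COUNTER_NAME is the same enum constant the mappers increment, and Cluster is org.apache.hadoop.mapreduce.Cluster. Note that cluster.getJob() can return null if the job cannot be looked up, so a null check before calling getCounters() is worth adding.)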
Answer 2 (score: 2)
The whole point of map/reduce is to parallelize the job. There will be many unique mappers/reducers, so the value wouldn't be correct anyway, except for that single run of the map/reduce pair.
There is a word count example here:
http://wiki.apache.org/hadoop/WordCount
You could change context.write(word, one) to context.write(line, one).
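As a sketch of that change (adapted to the new API, where the incoming value is the whole line; the surrounding class is otherwise the wiki's word count mapper):

    // Sketch: emit the whole line instead of each token, so the reducer's
    // sum counts occurrences of identical lines rather than of words.
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, one);  // 'one' is the usual constant of 1
    }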
Answer 3 (score: 1)
The global counter values are never broadcast back to each mapper or reducer. If you want the number of records seen by the mappers to be available to the reducers, you will need to rely on some external mechanism to provide it.
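For example (a sketch, not from the original answer): chain two jobs and pass the first job's counter value to the second through its Configuration. The property name mapper.record.count below is made up for illustration:

    // Driver: after the first job finishes, read its counter and hand
    // the value to the second job via its Configuration.
    job1.waitForCompletion(true);
    long mapped = job1.getCounters()
            .findCounter(CounterMapper.TestCounters.TEST).getValue();

    Configuration conf2 = new Configuration();
    conf2.setLong("mapper.record.count", mapped);  // hypothetical key
    Job job2 = new Job(conf2, "second pass");
    // ... configure job2 as usual ...

    // In job2's reducer, e.g. in setup():
    long mapperCount = context.getConfiguration().getLong("mapper.record.count", 0);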
Answer 4 (score: 1)
I asked this question, but I haven't solved my problem. However, another solution came to mind. In the mapper, the number of words is counted, and in the cleanup function that runs at the end of each mapper it can be written to the intermediate output with the minimum possible key, so that this value sits at the head ("!" works here because it sorts before any alphanumeric token). In the reducer, the total word count is then obtained by adding up the values at the head. Sample code and part of its output are provided below.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    import java.io.IOException;
    import java.util.StringTokenizer;

    /**
     * Created by tolga on 1/26/16.
     */
    public class WordCount {

        static enum TestCounters { TEST }

        public static class Map extends Mapper<Object, Text, Text, LongWritable> {
            private final static LongWritable one = new LongWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                    // One increment per emitted word.
                    context.getCounter(TestCounters.TEST).increment(1);
                }
            }

            @Override
            protected void cleanup(Context context) throws IOException, InterruptedException {
                // "!" sorts before any alphanumeric token, so this record
                // arrives at the reducer ahead of the real words.
                context.write(new Text("!"),
                        new LongWritable(context.getCounter(TestCounters.TEST).getValue()));
            }
        }

        public static class Reduce extends Reducer<Text, LongWritable, Text, LongWritable> {
            public void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (LongWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "WordCount");
            job.setJarByClass(WordCount.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.waitForCompletion(true);
        }
    }
Text file:
Turgut Özal University is a private university located in Ankara, Turkey. It was established in 2008 by the Turgut Özal Thought and Action Foundation and is named after former Turkish president Turgut Özal.
Intermediate output:
! 33
2008 1
Action 1
Ankara, 1
Foundation 1
It 1
Thought 1
Turgut 1
Turgut 1
Turgut 1
Final output:
! 33
2008 1
Action 1
Ankara, 1
Foundation 1
It 1
Thought 1
Turgut 3
Answer 5 (score: 0)
An improvement on itzhaki's answer:
findCounter(COUNTER_NAME) is no longer supported (see https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/mapred/Counters.html).
    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Cluster cluster = new Cluster(conf);
        Job currentJob = cluster.getJob(context.getJobID());
        mapperCounter = currentJob.getCounters().findCounter(GROUP_NAME, COUNTER_NAME).getValue();
    }
Specify the GROUP_NAME when invoking the counter, e.g.
    context.getCounter("com.example.mycode", "MY_COUNTER").increment(1);
Then:
    mapperCounter = currentJob.getCounters().findCounter("com.example.mycode", "MY_COUNTER").getValue();
Also, an important point: if the counter does not exist, this call will initialize one with a value of 0.
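One practical consequence of that: a typo in the group or counter name won't raise an error, it will just silently hand you a fresh counter whose value is 0.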