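As the title makes clear, my goal is to use a mapper's counter in the reduce phase, before the job has finished.
I have come across several closely related questions, but none of them solved all of my problems (Accessing a mapper's counter from a reducer; Hadoop, MapReduce Custom Java Counters Exception in thread "main" java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING; etc.).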
@Override
public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Cluster cluster = new Cluster(conf);
    Job currentJob = cluster.getJob(context.getJobID());
    mapperCounter = currentJob.getCounters().findCounter(COUNTER_NAME).getValue();
}
My problem is that the cluster does not contain any job history.
This is how I launch the MapReduce job:
private void firstFrequents(String outpath) throws IOException,
        InterruptedException, ClassNotFoundException {
    Configuration conf = new Configuration();
    Cluster cluster = new Cluster(conf);
    conf.setInt("minFreq", MIN_FREQUENCY);
    Job job = Job.getInstance(conf, "APR");
    // Counters counters = job.getCounters();
    job.setJobName("TotalTransactions");
    job.setJarByClass(AssociationRules.class);
    job.setMapperClass(FirstFrequentsMapper.class);
    job.setReducerClass(CandidateReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("input"));
    FileOutputFormat.setOutputPath(job, new Path(outpath));
    job.waitForCompletion(true);
}
Mapper:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FirstFrequentsMapper extends
        Mapper<Object, Text, Text, IntWritable> {
    public enum Counters {
        TotalTransactions
    }

    private final IntWritable one = new IntWritable(1);

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Items on a line are separated by runs of tabs or runs of commas.
        String[] items = value.toString().split("\\t+|,+");
        // Emit (item, 1) for every item in the transaction.
        for (String item : items) {
            context.write(new Text(item), one);
        }
        // Each input line counts as one transaction.
        context.getCounter(Counters.TotalTransactions).increment(1);
    }
}
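The mapper splits each line on runs of tabs or runs of commas. A minimal plain-Java sketch of that tokenization, with no Hadoop dependency, may help when checking what keys the mapper will emit. The class name TokenizeSketch and the sample lines are made up for illustration and are not part of the original code.

```java
import java.util.Arrays;

final class TokenizeSketch {
    // Hypothetical helper: splits a line on runs of tabs or runs of commas.
    static String[] tokenize(String line) {
        return line.split("\\t+|,+");
    }

    public static void main(String[] args) {
        // A tab and a double comma each act as a single separator.
        System.out.println(Arrays.toString(tokenize("milk\tbread,,eggs")));
        // A leading separator yields an empty first token; trailing ones are dropped.
        System.out.println(Arrays.toString(tokenize(",milk,bread,")));
    }
}
```

If input lines can start with a separator, the mapper would emit an empty key for them, so it may be worth skipping empty tokens before calling context.write.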
Reducer:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class CandidateReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private int minFrequency;
    private long totalTransactions;

    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        minFrequency = conf.getInt("minFreq", 1);
        Cluster cluster = new Cluster(conf);
        Job currentJob = cluster.getJob(context.getJobID());
        totalTransactions = currentJob.getCounters().findCounter(FirstFrequentsMapper.Counters.TotalTransactions).getValue();
        System.out.print(totalTransactions);
    }

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int counter = 0;
        for (IntWritable val : values) {
            counter += val.get();
        }
        /* Item frequency calculated */
        /* Write it to output if it is frequent */
        if (counter >= minFrequency) {
            context.write(key, new IntWritable(counter));
        }
    }
}
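Leaving the counter lookup aside, the reduce step sums the occurrences of each item and keeps only the items whose total reaches minFreq. The following is a rough plain-Java sketch of that count-and-filter logic without Hadoop. The class FrequencyFilterSketch, the method frequentItems, and the sample data are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FrequencyFilterSketch {
    // Hypothetical helper: counts each item, then drops items seen
    // fewer than minFrequency times (the same test as in reduce()).
    static Map<String, Integer> frequentItems(List<String> items, int minFrequency) {
        // TreeMap keeps keys sorted, similar to how reducer keys arrive in order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        counts.values().removeIf(count -> count < minFrequency);
        return counts;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("milk", "bread", "milk", "eggs", "milk", "bread");
        System.out.println(frequentItems(items, 2));
    }
}
```

Running this logic locally on a small sample can be a quick way to confirm the expected reducer output before submitting the job.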
Answer 0 (score: 0)
The correct setup() or reduce() implementation for getting a counter's value is exactly the one shown in the post that you mention:
Counter counter = context.getCounter(CounterMapper.TestCounters.TEST);
long counterValue = counter.getValue();
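where TEST is the name of the counter, declared in the enum TestCounters.
I don't see why you declare a Cluster variable...
Also, in the code that you mention in your comment, you should store the value returned by getValue() in a variable, like the counterValue variable above.
You may also find this post useful.
Update: Based on your edit, I believe all you need is the number of MAP_INPUT_RECORDS, which is a default counter, so you don't need to re-implement it.
To get a counter's value from the Driver class, you can use (taken from this post):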
job.getCounters().findCounter(COUNTER_NAME).getValue();