Accessing a Mapper's counters in the Reducer phase (before the job has finished)

Asked: 2016-05-19 16:47:25

Tags: java mapreduce counter mapper reducers

As the title suggests, my goal is to access a Mapper's counters in the reduce phase of a particular job, before that job has completed.

I have come across several questions that are highly related to this one, but none of them solved all of my problems. (Accessing a mapper's counter from a reducer; Hadoop, MapReduce Custom Java Counters Exception in thread "main" java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING; etc.)

@Override
public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Cluster cluster = new Cluster(conf);
    Job currentJob = cluster.getJob(context.getJobID());
    mapperCounter = currentJob.getCounters().findCounter(COUNTER_NAME).getValue();
}

My problem is that the Cluster does not contain any job history.

How I invoke the MapReduce job:

private void firstFrequents(String outpath) throws IOException,
        InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Cluster cluster = new Cluster(conf);
        conf.setInt("minFreq", MIN_FREQUENCY);
        Job job = Job.getInstance(conf, "APR");
        // Counters counters = job.getCounters();
        job.setJobName("TotalTransactions");
        job.setJarByClass(AssociationRules.class);
        job.setMapperClass(FirstFrequentsMapper.class);
        job.setReducerClass(CandidateReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path(outpath));


        job.waitForCompletion(true);
    }

Mapper:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FirstFrequentsMapper extends
        Mapper<Object, Text, Text, IntWritable> {
    public enum Counters {
        TotalTransactions
    }

    private IntWritable one = new IntWritable(1);

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] line = value.toString().split("\t+|,+");
        int iter = 0;
        for (String string : line) {
            context.write(new Text(line[iter]), one);
            iter++;
        }
        context.getCounter(Counters.TotalTransactions).increment(1);
    }
}

Reducer:

public class CandidateReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private int minFrequency;
    private long totalTransactions;

    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        minFrequency = conf.getInt("minFreq", 1);
        Cluster cluster = new Cluster(conf);
        Job currentJob = cluster.getJob(context.getJobID());
        totalTransactions = currentJob.getCounters().findCounter(FirstFrequentsMapper.Counters.TotalTransactions).getValue();
        System.out.print(totalTransactions);
    }

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int counter = 0;
        for (IntWritable val : values) {
            counter += val.get();
        }

        /* Item frequency calculated */
        /* Write it to output if it is frequent */
        if (counter >= minFrequency) {
            context.write(key, new IntWritable(counter));
        }
    }
}

1 Answer:

Answer 0 (score: 0):

The correct setup()/reduce() implementation for getting a counter's value is exactly the one shown in the post that you mention:

Counter counter = context.getCounter(CounterMapper.TestCounters.TEST);
long counterValue = counter.getValue();

where TEST is the name of the counter, declared in the enum TestCounters.

I don't see the reason why you declare a Cluster variable...
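
Adapted to your classes, a setup() following that approach might look roughly like the sketch below (it reuses the Counters enum from FirstFrequentsMapper shown above; whether the value already reflects every mapper increment at reduce time is not something this sketch guarantees):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Reducer;

public class CandidateReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private int minFrequency;
    private long totalTransactions;

    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        minFrequency = conf.getInt("minFreq", 1);
        // Read the counter straight from the task context, as in the linked post,
        // instead of looking the job up through a Cluster instance.
        Counter counter = context.getCounter(FirstFrequentsMapper.Counters.TotalTransactions);
        totalTransactions = counter.getValue();
    }
}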

Also, in the code that you mention in the comments, you should store the result returned by the getValue() method in a variable, like the counterValue variable above.

Perhaps you will also find this post useful.

UPDATE: Based on your edit, I believe all you need is the number of MAP_INPUT_RECORDS, which is a default counter, so you do not need to re-implement it.

To get the value of a counter from the Driver class, you can use (taken from this post):

job.getCounters().findCounter(COUNTER_NAME).getValue();
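
For instance, in the firstFrequents() driver shown in the question, reading the built-in counter after waitForCompletion() might look like the sketch below (TaskCounter.MAP_INPUT_RECORDS is the default counter mentioned above; the "totalTransactions" configuration key and the follow-up job wiring are only hypothetical, to show how the value could be handed to a later reducer):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

// ...inside firstFrequents(), after the job has been configured...
job.waitForCompletion(true);

// The framework maintains MAP_INPUT_RECORDS for every job, so no custom counter is needed.
long totalTransactions = job.getCounters()
        .findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();

// Hypothetical follow-up step: pass the value to a second job through its Configuration,
// so that job's reducers can read it in setup() with conf.getLong("totalTransactions", -1).
Configuration nextConf = new Configuration();
nextConf.setLong("totalTransactions", totalTransactions);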