Accessing a mapper's counter from the reducer in Hadoop MapReduce

Asked: 2015-10-01 18:23:54

Tags: java hadoop intellij-idea mapreduce word-count

I need to access the mapper's counters from the reducer. I tried to apply this solution. My WordCount code is below.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordCount {

    // Custom counter, incremented once per map output record
    static enum TestCounters { TEST }
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
                context.getCounter(TestCounters.TEST).increment(1); // count every emitted token
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private long mapperCounter;

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            // Connect back to the cluster and look up this running job
            Cluster cluster = new Cluster(conf);
            Job currentJob = cluster.getJob(context.getJobID());
            // Read the aggregated value of the mapper-side counter
            mapperCounter = currentJob.getCounters().findCounter(TestCounters.TEST).getValue();
            System.out.println(mapperCounter);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }

}
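
For reference, I know the counter can also be read on the driver side once the job completes, though in my case I need the value inside the reducer. A minimal sketch, assuming it replaces the final waitForCompletion(true) call in main():

        // Counters are aggregated across all tasks once the job completes
        if (job.waitForCompletion(true)) {
            long testCount = job.getCounters()
                    .findCounter(TestCounters.TEST)
                    .getValue();
            System.out.println("TEST = " + testCount);
        }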

I run this code in IntelliJ with the following dependencies:

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.1</version>
        </dependency>
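
I suspect the two artifacts pin conflicting Hadoop lines (hadoop-core 1.2.1 is Hadoop 1, hadoop-client 2.7.1 is Hadoop 2). Purely as an untested assumption on my part, a POM that stays on a single Hadoop 2.x line would declare only the client artifact:

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.1</version>
        </dependency>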

But I get a NoSuchFieldError: SEPERATOR and have not been able to resolve it. The error occurs on the Cluster cluster = new Cluster(conf); line.

15/10/01 19:55:29 WARN mapred.LocalJobRunner: job_local482979212_0001
java.lang.NoSuchFieldError: SEPERATOR
    at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
    at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
    at org.apache.hadoop.mapreduce.Cluster.<clinit>(Cluster.java:71)
    at WordCount$Reduce.setup(WordCount.java:51)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
15/10/01 19:55:30 INFO mapred.JobClient:  map 100% reduce 0%
15/10/01 19:55:30 INFO mapred.JobClient: Job complete: job_local482979212_0001
15/10/01 19:55:30 INFO mapred.JobClient: Counters: 20
15/10/01 19:55:30 INFO mapred.JobClient:   Map-Reduce Framework
15/10/01 19:55:30 INFO mapred.JobClient:     Spilled Records=16
15/10/01 19:55:30 INFO mapred.JobClient:     Map output materialized bytes=410
15/10/01 19:55:30 INFO mapred.JobClient:     Reduce input records=0
15/10/01 19:55:30 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
15/10/01 19:55:30 INFO mapred.JobClient:     Map input records=8
15/10/01 19:55:30 INFO mapred.JobClient:     SPLIT_RAW_BYTES=103
15/10/01 19:55:30 INFO mapred.JobClient:     Map output bytes=372
15/10/01 19:55:30 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/10/01 19:55:30 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
15/10/01 19:55:30 INFO mapred.JobClient:     Reduce input groups=0
15/10/01 19:55:30 INFO mapred.JobClient:     Combine output records=0
15/10/01 19:55:30 INFO mapred.JobClient:     Reduce output records=0
15/10/01 19:55:30 INFO mapred.JobClient:     Map output records=16
15/10/01 19:55:30 INFO mapred.JobClient:     Combine input records=0
15/10/01 19:55:30 INFO mapred.JobClient:     CPU time spent (ms)=0
15/10/01 19:55:30 INFO mapred.JobClient:     Total committed heap usage (bytes)=160432128
15/10/01 19:55:30 INFO mapred.JobClient:   WordCount$TestCounters
15/10/01 19:55:30 INFO mapred.JobClient:     TEST=16
15/10/01 19:55:30 INFO mapred.JobClient:   File Input Format Counters 
15/10/01 19:55:30 INFO mapred.JobClient:     Bytes Read=313
15/10/01 19:55:30 INFO mapred.JobClient:   FileSystemCounters
15/10/01 19:55:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=51594
15/10/01 19:55:30 INFO mapred.JobClient:     FILE_BYTES_READ=472
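
For completeness, since Cluster opens a connection to the (local or remote) cluster, I understand the setup() above is usually written with an explicit close. A sketch of the same pattern with cleanup (this alone does not fix the NoSuchFieldError):

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Cluster cluster = new Cluster(context.getConfiguration());
            try {
                // Look up this running job and read the aggregated mapper-side counter
                Job currentJob = cluster.getJob(context.getJobID());
                mapperCounter = currentJob.getCounters()
                        .findCounter(TestCounters.TEST)
                        .getValue();
            } finally {
                // Release the connection opened by new Cluster(...)
                cluster.close();
            }
        }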

After that, I built the jar file and ran it on a single-node Hadoop 2.6.0 installation. There I got the following error:

15/10/01 20:58:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/01 20:58:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/01 20:58:17 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/10/01 20:58:31 INFO input.FileInputFormat: Total input paths to process : 1
15/10/01 20:58:33 INFO mapreduce.JobSubmitter: number of splits:1
15/10/01 20:58:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443718874432_0002
15/10/01 20:58:36 INFO impl.YarnClientImpl: Submitted application application_1443718874432_0002
15/10/01 20:58:36 INFO mapreduce.Job: The url to track the job: http://tolga-Aspire-5741G:8088/proxy/application_1443718874432_0002/
15/10/01 20:58:36 INFO mapreduce.Job: Running job: job_1443718874432_0002
15/10/01 20:59:22 INFO mapreduce.Job: Job job_1443718874432_0002 running in uber mode : false
15/10/01 20:59:22 INFO mapreduce.Job:  map 0% reduce 0%
15/10/01 21:00:17 INFO mapreduce.Job:  map 100% reduce 0%
15/10/01 21:00:20 INFO mapreduce.Job:  map 0% reduce 0%
15/10/01 21:00:20 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
15/10/01 21:00:47 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_1, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

15/10/01 21:00:55 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_2, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
15/10/01 21:01:03 INFO mapreduce.Job:  map 100% reduce 100%
15/10/01 21:01:07 INFO mapreduce.Job: Job job_1443718874432_0002 failed with state FAILED due to: Task failed task_1443718874432_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/10/01 21:01:08 INFO mapreduce.Job: Counters: 12
    Job Counters 
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=78203
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=78203
        Total vcore-seconds taken by all map tasks=78203
        Total megabyte-seconds taken by all map tasks=80079872
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0

How can I solve this problem?

Thanks for your attention.

Note: the input files used in IntelliJ and on the single-node Hadoop cluster are different.

0 Answers