Map class does not work as expected

Asked: 2015-10-11 16:12:17

Tags: hadoop mapreduce

I have the map class at [1], whose only goal is to read the input and write it back out in the same format. The input data is shown at [2]; it is processed by this map class. I expect the map class to apply no transformation to the data and simply emit the input as-is. Unfortunately, I get the error at [3], and I can't see what is wrong with the map class. Any help fixing it?

[1] My map class (now corrected).

/** Identity mapper set by the user. */
public static class MyFullyIndentityMapper
        extends Mapper<LongWritable, Text, Text, IntWritable>{

    private Text word = new Text();
    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, Context context
    ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            // This second nextToken() call is MyWordCount.java:93 in the
            // stack trace at [3]; it throws when the line has no more tokens.
            context.write(word, new IntWritable(Integer.valueOf(itr.nextToken())));
        }
    }

    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                System.out.println("Key ( " + context.getCurrentKey().getClass().getName() + " ): " + context.getCurrentKey()
                        + " Value (" + context.getCurrentValue().getClass().getName() + "): " + context.getCurrentValue());
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}

[2] Input data

B   1
C   1
I   1
O   1
C   1
E   1
B   1
B   1
B   1
B   1

[3] The error I get when running the map class.

Key ( org.apache.hadoop.io.LongWritable ): 0 Value (org.apache.hadoop.io.Text): B
2015-10-11 11:59:54,680 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.util.NoSuchElementException
  at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
  at org.apache.hadoop.mapred.examples.MyWordCount$MyFullyIndentityMapper.map(MyWordCount.java:93)
  at org.apache.hadoop.mapred.examples.MyWordCount$MyFullyIndentityMapper.run(MyWordCount.java:104)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

[4] My reduce class

public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context
    ) throws IOException, InterruptedException {
        int sum = 0;
        Iterator iter = values.iterator();
        while(iter.hasNext()){
            System.out.println(iter.next());
        }
        for (IntWritable val : values) {
            System.out.println(" - key ( " + key.getClass().toString() + "): " + key.toString()
                    + " value ( " + val.getClass().toString() + " ): " + val.toString());
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }

    public void run(Context context) throws IOException, InterruptedException {
        System.out.println("Output dir: " + context.getConfiguration().get("mapred.output.dir"));
        System.out.println("Partitioner class: " + context.getConfiguration().get("mapreduce.partitioner.class"));
        try {
            while (context.nextKey()) {
                System.out.println("Key: " + context.getCurrentKey());
                reduce(context.getCurrentKey(), context.getValues(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}

[5] Main class

public static void main(String[] args) throws Exception {
    GenericOptionsParser parser = new GenericOptionsParser(new Configuration(), args);

    String[] otherArgs = parser.getRemainingArgs();
    if (otherArgs.length < 2) {
        System.err.println("Usage: wordcount [<in>...] <out>");
        System.exit(2);
    }

    // first map tasks
    JobConf conf = new JobConf(MyWordCount.class);
    conf.setJobName("wordcount");

    conf.setJarByClass(MyWordCount.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setNumReduceTasks(1);

    Path[] inputPaths = new Path[otherArgs.length-1];
    for (int i = 0; i < otherArgs.length - 1; ++i) { inputPaths[i] = new Path(otherArgs[i]); }
    Path outputPath =  new Path(otherArgs[otherArgs.length - 1]);
    FileInputFormat.setInputPaths(conf, inputPaths);
    FileOutputFormat.setOutputPath(conf, outputPath);

    // launch the job directly
    Job job = new Job(conf, conf.getJobName());
    job.setJarByClass(MyWordCount.class);
    job.setMapperClass(MyFullyIndentityMapper.class);
    job.setReducerClass(MyReducer.class);
    job.setPartitionerClass(HashPartitioner.class);

    job.waitForCompletion(true);

    System.exit(0);
}

[6] Here are the imports I am using, in case they are relevant

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.util.StringTokenizer;

2 Answers:

Answer 0 (score: 1)

Please check your input file again.

Key ( org.apache.hadoop.io.LongWritable ): 0 Value (org.apache.hadoop.io.Text): B

From the line above you can see that the context extracted your value as just B, not B 1. So when the code tries to fetch a second token from the StringTokenizer to use as the count, the tokenizer is empty and throws the error.
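As a rough sketch, the map method in [1] could guard the second nextToken() call. This assumes each input line is meant to be a token optionally followed by a count, and that a missing count should default to 1; that default is my assumption, not something stated in the question:

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        // Only consume a second token if the line actually has one;
        // otherwise fall back to a count of 1 (assumed default).
        int count = itr.hasMoreTokens() ? Integer.parseInt(itr.nextToken()) : 1;
        context.write(word, new IntWritable(count));
    }
}

With this guard the mapper no longer crashes on lines whose value holds a single token; whether defaulting to 1 is the right behavior depends on what the input file is supposed to contain.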

Answer 1 (score: 0)

Found the problem. The signature must be public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context) throws IOException, InterruptedException { ... }. If the parameter types differ (for example, Iterator instead of Iterable), the method does not override Reducer.reduce, and the framework silently runs the default implementation, which just passes each key/value pair through.
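For reference, a minimal sketch of a reducer with exactly that signature; the @Override annotation is my addition, so the compiler rejects any accidental signature mismatch:

public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // The values iterable can only be traversed once per key,
        // so do all the work in a single pass.
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}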