MapReduce job produces empty output files

Asked: 2014-11-04 03:49:57

Tags: java apache hadoop mapreduce bigdata

The program is producing empty output files. Can anyone suggest where I'm going wrong? Any help would be highly appreciated. I tried putting job.setNumReduceTasks(0) since I'm not using a reducer, but the output file is still empty.
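As an aside (this is a generic sketch, not the asker's code, which follows below), a map-only job is usually wired up roughly like this, reusing the conf and otherArgs variables from the main method shown further down:

Job job = new Job(conf, "map only");
job.setJarByClass(WordCount.class);
job.setMapperClass(PrizeDisMapper.class);
job.setNumReduceTasks(0);                 // no reduce phase: map output goes straight to part-m-* files
job.setOutputKeyClass(Text.class);        // the types the mapper actually emits
job.setOutputValueClass(Pair.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);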

public static class PrizeDisMapper extends Mapper<LongWritable, Text, Text, Pair> {
    int rating = 0;
    Text CustID;
    IntWritable r;
    Text MovieID;

    public void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        String line1 = line.toString();
        String[] fields = line1.split(":");
        if (fields.length > 1) {
            String Movieid = fields[0];
            String line2 = fields[1];
            String[] splitline = line2.split(",");
            String Custid = splitline[0];
            int rate = Integer.parseInt(splitline[1]);
            r = new IntWritable(rate);
            CustID = new Text(Custid);
            MovieID = new Text(Movieid);
            Pair P = new Pair();
            context.write(MovieID, P);
        } else {
            return;
        }
    }
}

public static class IntSumReducer extends Reducer<Text, Pair, Text, Pair> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<Pair> values, Context context)
            throws IOException, InterruptedException {
        for (Pair val : values) {
            context.write(key, val);
        }
    }
}

public class Pair implements Writable {
    String key;
    int value;

    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
        out.writeChars(key);
    }

    public void readFields(DataInput in) throws IOException {
        key = in.readUTF();
        value = in.readInt();
    }

    public void setVal(String aKey, int aValue) {
        key = aKey;
        value = aValue;
    }
}

Main class:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Pair.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Thanks to @Pathmanaban Palsamy and @Chris Gerken for the suggestions. I have modified the code as suggested, but I'm still getting an empty output file. Can anyone suggest the right input and output configuration for my main class? Do I need to specify the Pair class in the input to the mapper, and if so, how?

2 Answers:

Answer 0 (score: 3):

My guess is that the reduce method should be declared as

public void reduce(Text key, Iterable<Pair> values,
               Context context
               ) throws IOException, InterruptedException

You get passed an Iterable (an object from which you can obtain an Iterator), which you use to iterate over all of the values that were mapped to the given key.
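As a general tip (not part of the original answer), adding @Override makes the compiler complain if the method signature does not actually override Reducer.reduce; with a mismatched signature the inherited identity reduce runs instead and simply passes values through:

@Override
protected void reduce(Text key, Iterable<Pair> values, Context context)
        throws IOException, InterruptedException {
    for (Pair val : values) {
        context.write(key, val);   // re-emit each value mapped to this key
    }
}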

Answer 1 (score: 1):

Since a reducer isn't required, I suspect that in the lines below

Pair P = new Pair();
context.write(MovieID, P);

the empty Pair is the problem. Also check in your Driver class that you have given the correct key class and value class, like:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Pair.class);
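Putting the two suggestions together, a rough sketch (reusing the setVal helper and the field names from the question's code) would populate the Pair in the mapper and declare the map output types in the driver:

// In the mapper: fill the Pair before emitting it.
Pair p = new Pair();
p.setVal(Custid, rate);
context.write(new Text(Movieid), p);

// In the driver: tell the framework how to serialize the intermediate key/value pairs.
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Pair.class);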