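I ran into a very strange problem. The reducers do run, but when I check the output files I find only the mappers' output. While debugging, I reproduced the same problem with the word count sample after changing the mapper's output value type from LongWritable to Text.
package org.myorg;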
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

public class WordCount extends Configured implements Tool {

    public static class Map
            extends Mapper<LongWritable, Text, Text, Text> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text wtf, Context context)
                throws IOException, InterruptedException {
            String line = wtf.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, new Text("frommapper"));
            }
        }
    }

    public static class Reduce
            extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Text wtfs,
                Context context) throws IOException, InterruptedException {
            /*
            int sum = 0;
            for (IntWritable val : wtfs) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));*/
            context.write(key, new Text("can't output"));
        }
    }

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");
        job.setOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class);
        //job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new WordCount(), args);
        System.exit(ret);
    }
}
Here is the result:
JobClient: Combine output records=0
12/06/13 17:37:46 INFO mapred.JobClient: Map input records=7
12/06/13 17:37:46 INFO mapred.JobClient: Reduce shuffle bytes=116
12/06/13 17:37:46 INFO mapred.JobClient: Reduce output records=7
12/06/13 17:37:46 INFO mapred.JobClient: Spilled Records=14
12/06/13 17:37:46 INFO mapred.JobClient: Map output bytes=96
12/06/13 17:37:46 INFO mapred.JobClient: Combine input records=0
12/06/13 17:37:46 INFO mapred.JobClient: Map output records=7
12/06/13 17:37:46 INFO mapred.JobClient: Reduce input records=7
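Then I found this strange result in the output file. The problem appeared after I changed the map output value type and the reducer's input type to Text, regardless of whether I also changed the reduce output value type. I was also forced to change the job setup to job.setOutputValueClass(Text.class).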
a frommapper
a frommapper
a frommapper
gg frommapper
h frommapper
sss frommapper
sss frommapper
Help!
Answer 0 (score: 4)
Your reduce function's parameters should look like this:
public void reduce(Text key, Iterable<Text> wtfs,
        Context context) throws IOException, InterruptedException {
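With the parameters defined the way you have them, the reduce operation never receives a list of values. A method declared as reduce(Text, Text, Context) does not override Reducer.reduce(KEYIN, Iterable<VALUEIN>, Context); it is only an overload, and the framework never calls it. The inherited default reduce runs instead, and it simply writes out every <key, value> pair it gets from the map function, which is why each <word, frommapper> pair appears unchanged in your output. Annotating the method with @Override would have turned this into a compile-time error.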
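The overload-versus-override pitfall can be reproduced without a cluster. What follows is a minimal sketch in plain Java: BaseReducer, BrokenReducer, FixedReducer and run are hypothetical stand-ins invented for illustration, not Hadoop classes, and the pass-through default is only assumed to mirror the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hadoop classes.
class ReduceOverrideSketch {

    // Mimics the shape of a reducer base class: the default reduce is a
    // pass-through that emits every (key, value) pair unchanged.
    static class BaseReducer {
        final List<String> output = new ArrayList<>();

        void reduce(String key, Iterable<String> values) {
            for (String value : values) {
                output.add(key + "\t" + value);
            }
        }
    }

    // Wrong: the second parameter is a single String, so this method is an
    // overload, not an override. The caller in run() never invokes it.
    static class BrokenReducer extends BaseReducer {
        void reduce(String key, String value) {
            output.add(key + "\tcan't output");
        }
    }

    // Right: same parameter types as the base method, so it overrides it.
    // With @Override, a mismatched signature would fail to compile.
    static class FixedReducer extends BaseReducer {
        @Override
        void reduce(String key, Iterable<String> values) {
            output.add(key + "\tfromreducer");
        }
    }

    // Plays the role of the framework: it only knows the base signature.
    static List<String> run(BaseReducer reducer) {
        reducer.reduce("a", Arrays.asList("frommapper", "frommapper", "frommapper"));
        reducer.reduce("gg", Arrays.asList("frommapper"));
        return reducer.output;
    }

    public static void main(String[] args) {
        System.out.println(run(new BrokenReducer()));
        System.out.println(run(new FixedReducer()));
    }
}
```

In this sketch the broken variant emits the four frommapper pairs untouched, while the fixed variant emits one line per key, which matches the symptom in the question.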
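Also, mapper functions do not normally write to the output file (I have never heard of it happening, though I don't know whether it is possible). In the usual case it is always the reducer that writes to the output file. Mapper output is intermediate data that Hadoop handles transparently. So if you see something in the output file, it must be reducer output, not mapper output. If you want to verify this, go to the logs of the job you ran and look at what happened in each mapper and reducer individually.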
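To make the data flow concrete, here is a rough plain-Java simulation of the map, shuffle and reduce phases. It uses no Hadoop classes; the ShuffleSketch class, its method names and the one-line input are assumptions made for illustration, with the words taken from the output listing in the question.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation for illustration; no Hadoop classes are used.
class ShuffleSketch {

    // Map phase: emit one (word, 1) pair per token as intermediate data.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group the intermediate values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: called once per key with all of that key's values.
    static String reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return key + "\t" + sum;
    }

    // Only what reduce returns reaches the final output list.
    static List<String> runJob(String input) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(map(input)).entrySet()) {
            output.add(reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        for (String line : runJob("a sss a gg h sss a")) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is that reduce sees an Iterable of all values for a key, and that the (word, 1) pairs produced by map never appear in the output on their own.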
Hope this clears some things up for you.