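My output contains many duplicate values, so I implemented a reduce function as shown below. However, this reduce still behaves like an identity function: the output is the same whether or not I set a reducer. What is wrong with my reduce function?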
public class search
{
public static String str="And";
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text>
{
String mname="";
public void configure(JobConf job)
{
mname=job.get(str);
job.set(mname,str);
}
private Text word = new Text();
public Text Uinput =new Text("");
public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
{
String mapstr=mname;
Uinput.set(mapstr);
String line = value.toString();
Text fdata = new Text();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens())
{
word.set(tokenizer.nextToken());
fdata.set(line);
if(word.equals(Uinput))
output.collect(fdata,new Text(""));
}
}
}
public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text>
{
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
{
boolean start = true;
//System.out.println("inside reduce :"+input);
StringBuilder sb = new StringBuilder();
while (values.hasNext())
{
if(!start)
start=false;
sb.append(values.next().toString());
}
//output.collect(key, new IntWritable(sum));
output.collect(key, new Text(sb.toString()));
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(search.class);
conf.setJobName("QueryIndex");
//JobConf conf = new JobConf(getConf(), WordCount.class);
conf.set(str,args[0]);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(Map.class);
//conf.setCombinerClass(SReducer.class);
conf.setReducerClass(SReducer.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("IIndexOut"));
FileOutputFormat.setOutputPath(conf, new Path("searchOut"));
JobClient.runJob(conf);
}
}
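Answer 0 (score: 1)
I haven't looked at the code closely, but one thing I am sure of is that the boolean variable start is useless here. The statements following if(!start) should be enclosed in braces to remove the duplicate data; otherwise you end up writing out all the data the reducer receives from the mapper.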
public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text>
{
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
{
boolean start = true;
//System.out.println("inside reduce :"+input);
StringBuilder sb = new StringBuilder();
while (values.hasNext())
{
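// Always advance the iterator; otherwise hasNext() never becomes false.
Text value = values.next();
if(start)
{
// Keep only the first value for this key.
start=false;
sb.append(value.toString());
}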
}
//output.collect(key, new IntWritable(sum));
output.collect(key, new Text(sb.toString()));
}
}
Or, a simpler reduce method would be:
public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text>
{
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
{
//System.out.println("inside reduce :"+input);
StringBuilder sb = new StringBuilder();
sb.append(values.next().toString());
//output.collect(key, new IntWritable(sum));
output.collect(key, new Text(sb.toString()));
}
}
since you only care about the first value of the iterator.
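To see why emitting once per key removes duplicates, here is a minimal Hadoop-free sketch in plain Java. It is only an illustration under assumptions: `DedupReduceSketch`, `reduceFirst`, and `run` are hypothetical names, and the `TreeMap` grouping stands in for the framework's shuffle phase rather than reproducing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle + reduce phases; not Hadoop code.
class DedupReduceSketch {

    // Reduce step: keep only the first value seen for a key.
    static String reduceFirst(Iterator<String> values) {
        String first = values.hasNext() ? values.next() : "";
        // Drain the remaining values so the iterator is fully consumed.
        while (values.hasNext()) {
            values.next();
        }
        return first;
    }

    // Group (key, value) pairs by key, then call the reduce step once per key.
    static List<String> run(List<String[]> mapOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : mapOutput) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.add(e.getKey() + "\t" + reduceFirst(e.getValue().iterator()));
        }
        return out;
    }

    public static void main(String[] args) {
        // The same line emitted twice by the mapper, plus one other line.
        List<String[]> mapOutput = Arrays.asList(
            new String[] {"cats And dogs", ""},
            new String[] {"cats And dogs", ""},
            new String[] {"salt And pepper", ""});
        for (String line : run(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

Because the reduce step runs once per distinct key, the repeated line appears only once in the result, no matter how many times the mapper emitted it.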
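Answer 1 (score: 0)
Maybe you have not set this reducer as the actual reduce function to be used? This is done with
job.setReducerClass().
If you do not set the reducer class, the default (identity) reducer is used. You should do the following: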
job.setReducerClass(SReducer.class)
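Please post your main function so we can verify.
Answer 2 (score: 0)
Use the @Override annotation before the map and reduce functions. That way you can be sure that you are actually overriding the base class methods.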
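The point about @Override can be illustrated with a small plain-Java sketch. Everything here is hypothetical (`OverrideSketch`, `BaseReducer`, `runFramework`); these are not Hadoop classes, only stand-ins for a base class whose default `reduce` passes values through unchanged. A subclass method whose parameter type differs slightly becomes an overload rather than an override, so the caller silently keeps using the default.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins; these are NOT Hadoop classes.
class OverrideSketch {

    // Base class whose default reduce passes every value through (identity-like).
    static class BaseReducer {
        String reduce(String key, Iterable<String> values) {
            return key + "=" + String.join(",", values);
        }
    }

    // Iterator instead of Iterable: this is an overload, not an override.
    // Adding @Override here would be rejected by the compiler, exposing the mistake.
    static class AccidentalOverload extends BaseReducer {
        String reduce(String key, Iterator<String> values) {
            return key + "=" + (values.hasNext() ? values.next() : "");
        }
    }

    // Signature matches the base class, so @Override compiles and the method is used.
    static class RealOverride extends BaseReducer {
        @Override
        String reduce(String key, Iterable<String> values) {
            Iterator<String> it = values.iterator();
            return key + "=" + (it.hasNext() ? it.next() : "");
        }
    }

    // The "framework" only knows the base-class signature.
    static String runFramework(BaseReducer reducer, String key, List<String> values) {
        return reducer.reduce(key, values);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "a", "b");
        System.out.println(runFramework(new AccidentalOverload(), "k", values)); // k=a,a,b
        System.out.println(runFramework(new RealOverride(), "k", values));       // k=a
    }
}
```

In the sketch, the accidental overload is never called through the base-class reference, which looks exactly like an identity reducer from the outside. Whether this is the cause in the question depends on which API and signature the job actually uses.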