MapReduce program output contains duplicates?

Time: 2012-04-26 18:32:35

Tags: hadoop

My output contains many duplicate values, so I implemented a reduce function as shown below, but the reduce still behaves like an identity function: the output is no different with the reducer in place. What is wrong with my reduce function?

       public class search 
{      
    public static String str="And";
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> 
    {
        String mname="";
        public void configure(JobConf job)
        {
             mname=job.get(str);
             job.set(mname,str);
        }

        private Text word = new Text();
        public Text Uinput =new Text("");
        public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
        {

            String mapstr=mname;
            Uinput.set(mapstr);
            String line = value.toString();
            Text fdata = new Text();

            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens())
            {
                word.set(tokenizer.nextToken());
                fdata.set(line);

                if(word.equals(Uinput))
                output.collect(fdata,new Text(""));
            }

        }
    } 

    public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
    {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
        {

            boolean start = true;
            //System.out.println("inside reduce   :"+input);
            StringBuilder sb = new StringBuilder();
            while (values.hasNext()) 
            {
                if(!start)

                start=false;
                sb.append(values.next().toString());

            }
            //output.collect(key, new IntWritable(sum));
            output.collect(key, new Text(sb.toString()));
        }
    }

public static void main(String[] args) throws Exception
{

    JobConf conf = new JobConf(search.class);
    conf.setJobName("QueryIndex");
    //JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.set(str,args[0]);

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(Map.class);
    //conf.setCombinerClass(SReducer.class);
    conf.setReducerClass(SReducer.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);



    FileInputFormat.setInputPaths(conf, new Path("IIndexOut"));
    FileOutputFormat.setOutputPath(conf, new Path("searchOut"));

    JobClient.runJob(conf);
}

}

3 Answers:

Answer 0 (score: 1)

I haven't looked through the whole code, but one thing I am sure of is that the boolean variable start is not doing anything useful here: the statements after if(!start) are not wrapped in braces, so the flag never suppresses the duplicates and you end up writing out all the data the reducer receives from the mapper. The block should be braced, and the flag used so that only the first value is appended:

 public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
{
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
    {
        boolean start = true;
        StringBuilder sb = new StringBuilder();
        while (values.hasNext()) 
        {
            Text val = values.next();      // always advance the iterator
            if (start)
            {
               sb.append(val.toString());  // keep only the first value per key
               start = false;
            }
        }
        output.collect(key, new Text(sb.toString()));
    }
}

Or the optimal reduce method would be:

public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
{
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
    {
        // reduce() is only ever called with at least one value, so taking
        // the first one is enough to collapse the duplicates for this key.
        StringBuilder sb = new StringBuilder();
        sb.append(values.next().toString());

        output.collect(key, new Text(sb.toString()));
    }
}

because you only care about the first value from the iterator.
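The idea above can be sketched in plain Java, without any Hadoop dependencies (the class and method names here are hypothetical, for illustration only): per key, consume the iterator but keep only its first value, which collapses duplicates.

```java
import java.util.Arrays;
import java.util.Iterator;

// Plain-Java sketch of the reduce logic: for each key, keep only the
// first value from the iterator, which collapses duplicate values.
class FirstValueOnly {
    static String reduceToFirst(Iterator<String> values) {
        StringBuilder sb = new StringBuilder();
        if (values.hasNext()) {
            sb.append(values.next()); // take the first value, ignore the rest
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Iterator<String> dups = Arrays.asList("line1", "line1", "line1").iterator();
        System.out.println(reduceToFirst(dups)); // prints "line1"
    }
}
```

In the job above the values are all empty Text objects, so what matters is that the key is emitted exactly once, not what the appended value is.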

Answer 1 (score: 0)

Perhaps you have not set this reducer as the actual reduce function to be used. That is done with

job.setReducerClass()

If no reducer class is set, the default identity reducer is used. You should do something like:

job.setReducerClass(SReducer.class)

(with the old mapred API in your code, the equivalent call is conf.setReducerClass(SReducer.class)). Please post your main function so we can verify.

Answer 2 (score: 0)

Use the @Override annotation on your map and reduce methods. That way you can be certain you are actually overriding the base-class methods rather than accidentally declaring new ones.
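To illustrate what @Override buys you, here is a minimal plain-Java sketch (the class names are hypothetical): the compiler rejects any method marked @Override that does not actually override something, which is exactly how a signature typo in map() or reduce() would be caught at compile time instead of silently falling back to the identity implementation.

```java
// Minimal sketch of the @Override safety net; no Hadoop needed.
class BaseReducer {
    public String reduce(String key) { return key; }
}

class GoodReducer extends BaseReducer {
    @Override // compiles: the signature matches the base method
    public String reduce(String key) { return key.toUpperCase(); }
}

// class BadReducer extends BaseReducer {
//     @Override // compile error: reduce(Integer) overrides nothing
//     public String reduce(Integer key) { return key.toString(); }
// }

class OverrideDemo {
    public static void main(String[] args) {
        System.out.println(new GoodReducer().reduce("and")); // prints "AND"
    }
}
```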