Question

I am trying to write MapReduce code in Java. Here are my files.

Mapper class (bmapper):

public class bmapper extends Mapper<LongWritable,Text,Text,NullWritable>{
    private String txt=new String();
    public void mapper(LongWritable key,Text value,Context context) 
        throws IOException, InterruptedException{

        String str =value.toString(); 
        int index1 = str.indexOf("TABLE OF CONTENTS");
        int index2 = str.indexOf("</table>");
        int index3 = str.indexOf("MANAGEMENT'S DISCUSSION AND ANALYSIS");

        if(index1 == -1)    
        {   txt ="nil";
        }
        else
        {
           if(index1<index3 && index2>index3)
           {
               int index4 = index3+ 109;
              int pageno =str.charAt(index4);
              String[] pages =str.split("<page>");
             txt = pages[pageno+1];
           }
           else
           {
               txt ="nil";

           }
        }

        context.write(new Text(txt), NullWritable.get());
    } 

}

Reducer class (breducer):

public class breducer extends Reducer<Text,NullWritable,Text,NullWritable>{

    public void reducer(Text key,NullWritable value,Context context) throws IOException,InterruptedException{

        context.write(key, value);

    }

}

Driver class (bdriver):

public class bdriver {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJobName("black coffer");
        job.setJarByClass(bdriver.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setReducerClass(breducer.class);
        job.setMapperClass(bmapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path[]{new Path(args[0])});
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
  }
}

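The job fails with a type mismatch in the map output key (expected Text, received LongWritable); the full error message is quoted at the end of this post. I think it is unable to find my Mapper and Reducer classes. I have written the code in the main class, but it is picking up the default Mapper and Reducer classes.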

Answer 1

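Your input/output types appear to be compatible with the job configuration.

Adding the problem details and solution here (based on the discussion in the comments, the OP has confirmed that the issue is resolved).

According to the Javadoc, the Reducer's reduce method has the following signature: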

protected void reduce(KEYIN key,
          Iterable<VALUEIN> values,
          org.apache.hadoop.mapreduce.Reducer.Context context)
               throws IOException,
                      InterruptedException

Based on that, the reducer should be

public class breducer extends Reducer<Text,NullWritable,Text,NullWritable>{
    @Override
    public void reduce(Text key,Iterable<NullWritable> values,Context context) throws IOException,InterruptedException{
        // Your logic
    }
}

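The problem is that the methods in the question do not match the signatures of map() and reduce(): they are named mapper() and reducer(), and reducer() takes a single NullWritable instead of an Iterable<NullWritable>. As a result they never override the framework methods; they are just additional methods that Hadoop never calls. The job therefore runs the default identity map(), which passes the LongWritable input key through unchanged, and that produces the type-mismatch error.

The problem was caught after adding the @Override annotation to the map() and reduce() methods. Although it is not mandatory, as a best practice always add @Override to the map() and reduce() methods you implement.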
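To make the effect concrete without pulling in Hadoop, here is a minimal, self-contained sketch. BaseMapper is a hypothetical stand-in written for this illustration, not Hadoop's Mapper; it only assumes the same pattern, where the framework's run loop calls map() and the default map() passes the key through. All class names here (OverrideDemo, BaseMapper, MisnamedMapper, FixedMapper) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class OverrideDemo {

    // Hypothetical stand-in for a framework base class (NOT Hadoop's Mapper):
    // run() always calls map(), and the default map() emits the input key.
    static class BaseMapper {
        protected void map(long key, String value, List<String> out) {
            out.add(String.valueOf(key)); // identity-style default
        }

        final List<String> run(long key, String value) {
            List<String> out = new ArrayList<>();
            map(key, value, out);
            return out;
        }
    }

    // Mirrors the mistake in the question: the method is named mapper(),
    // so it is a brand-new method and run() never calls it.
    static class MisnamedMapper extends BaseMapper {
        public void mapper(long key, String value, List<String> out) {
            out.add(value.toUpperCase());
        }
    }

    // Corrected version: @Override makes the compiler verify that map()
    // really overrides a superclass method. Putting @Override on mapper()
    // above would be a compile-time error, which is how the bug gets caught.
    static class FixedMapper extends BaseMapper {
        @Override
        protected void map(long key, String value, List<String> out) {
            out.add(value.toUpperCase());
        }
    }

    public static void main(String[] args) {
        // The default map() runs, so the key comes out instead of the text.
        System.out.println(new MisnamedMapper().run(42L, "hello"));
        // The overriding map() runs, so the transformed value comes out.
        System.out.println(new FixedMapper().run(42L, "hello"));
    }
}
```

In the real job the same thing happens with types instead of values: the inherited map() emits the LongWritable offset as the key while the driver declares Text as the map output key class.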

Error - Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
