I am trying to write MapReduce code in Java. Here are my files.
Mapper class (bmapper):
public class bmapper extends Mapper<LongWritable,Text,Text,NullWritable>{
    private String txt=new String();
    public void mapper(LongWritable key,Text value,Context context)
            throws IOException, InterruptedException{
        String str =value.toString();
        int index1 = str.indexOf("TABLE OF CONTENTS");
        int index2 = str.indexOf("</table>");
        int index3 = str.indexOf("MANAGEMENT'S DISCUSSION AND ANALYSIS");
        if(index1 == -1)
        {
            txt ="nil";
        }
        else
        {
            if(index1<index3 && index2>index3)
            {
                int index4 = index3+ 109;
                int pageno =str.charAt(index4);
                String[] pages =str.split("<page>");
                txt = pages[pageno+1];
            }
            else
            {
                txt ="nil";
            }
        }
        context.write(new Text(txt), NullWritable.get());
    }
}
Reducer class (breducer):
public class breducer extends Reducer<Text,NullWritable,Text,NullWritable>{
    public void reducer(Text key,NullWritable value,Context context) throws IOException,InterruptedException{
        context.write(key, value);
    }
}
I get an error when I run the job with the following driver class (bdriver):
public class bdriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJobName("black coffer");
        job.setJarByClass(bdriver.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setReducerClass(breducer.class);
        job.setMapperClass(bmapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path[]{new Path(args[0])});
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
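I think it cannot find my Mapper and Reducer classes. I wrote the job setup in the main class, but it is picking up the default Mapper and Reducer implementations.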
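Answer 0 (score: 0)
Your input/output types appear to be compatible with the job configuration.
Adding the problem details and the solution here (per the discussion in the comments, the OP has confirmed that the problem is solved).
According to the Javadoc, the Reducer's reduce method has the following signature: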
protected void reduce(KEYIN key,
Iterable<VALUEIN> values,
org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException,
InterruptedException
Based on that, the reducer should be:
public class breducer extends Reducer<Text,NullWritable,Text,NullWritable>{
@Override
public void reduce(Text key,Iterable<NullWritable> values,Context context) throws IOException,InterruptedException{
// Your logic
}
}
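The problem was that the method signatures differed slightly from map() and reduce(): the methods were named mapper() and reducer(), and the reducer took a single NullWritable instead of an Iterable<NullWritable>. As a result they did not actually override anything. They were just additional methods on the class, so Hadoop kept calling the inherited default map() and reduce().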
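After adding the @Override annotation to the map() and reduce() methods, the problem was caught, because the compiler rejects an @Override method that does not override anything. Although it is not mandatory, as a best practice always add the @Override annotation to the map() and reduce() methods you implement.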
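The effect described above can be reproduced without Hadoop. The following is only a minimal sketch: SimpleReducer, BrokenReducer, and FixedReducer are hypothetical stand-ins invented for illustration (they are not Hadoop classes), and run() merely imitates a framework that only ever calls reduce().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverrideDemo {

    // Hypothetical stand-in for a framework base class (NOT Hadoop's Reducer).
    public static class SimpleReducer<K, V> {
        protected final List<String> output = new ArrayList<>();

        // Default behaviour: pass every value through unchanged.
        protected void reduce(K key, Iterable<V> values) {
            for (V value : values) {
                output.add("default:" + key + "=" + value);
            }
        }

        // Imitates the framework, which only ever calls reduce().
        public final List<String> run(K key, Iterable<V> values) {
            reduce(key, values);
            return output;
        }
    }

    // Wrong name and wrong parameter type: this is a brand-new method,
    // so run() never reaches it and the default reduce() is used.
    public static class BrokenReducer extends SimpleReducer<String, Integer> {
        public void reducer(String key, Integer value) {
            output.add("custom:" + key + "=" + value);
        }
    }

    // Correct name and parameters; @Override makes the compiler verify it.
    public static class FixedReducer extends SimpleReducer<String, Integer> {
        @Override
        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            output.add("custom:" + key + "=" + sum);
        }
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2);
        System.out.println(new BrokenReducer().run("k", values)); // [default:k=1, default:k=2]
        System.out.println(new FixedReducer().run("k", values));  // [custom:k=3]
    }
}
```

Adding @Override to BrokenReducer.reducer() would turn the silent fallback into a compile-time error, which is how the mismatch in the question's mapper() and reducer() methods would have been caught.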