I am trying to write a MapReduce program that computes an inverted index.
My mapper code is:
public class InvertdIdxMapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable ikey, Text ivalue, Context context, Reporter reporter)
            throws IOException, InterruptedException {
        Text word = new Text();
        Text location = new Text();
        FileSplit filespilt = (FileSplit) reporter.getInputSplit();
        String fileName = filespilt.getPath().getName();
        location.set(fileName);
        String line = ivalue.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            //System.out.println("Key is "+ word + "value is "+location);
            context.write(word, location);
        }
    }
}
My reducer code is:
public class InvertedIdxReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        boolean first = true;
        StringBuilder toReturn = new StringBuilder();
        // process values
        Iterator<Text> itr = values.iterator();
        while (itr.hasNext()) {
            if (!first)
                toReturn.append(", ");
            first = false;
            toReturn.append(itr.next().toString());
        }
        context.write(_key, new Text(toReturn.toString()));
    }
}
And the driver code:
public class InvertedIdxDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "JobName");
        job.setJarByClass(InvertedIdxDriver.class);
        // TODO: specify a mapper
        job.setMapperClass(InvertdIdxMapper.class);
        // TODO: specify a reducer
        job.setReducerClass(InvertedIdxReducer.class);
        // TODO: specify output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        /////
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // TODO: specify input and output DIRECTORIES (not files)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        if (!job.waitForCompletion(true))
            return;
    }
}
When I run the code above, I get the following error:
15/08/18 13:27:04 INFO mapreduce.Job: Task Id : attempt_1439870445298_0019_m_000000_2, Status : FAILED
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
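The input to this program is a simple text file with a few lines. I followed this and this post, but my problem persists. Am I missing something important about MapReduce programming?
Please advise.
Thanks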
Answer 0 (score: 0)
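I think you are not overriding the map method correctly, so the default map method is called instead, and that is why you get the error. Your map takes a fourth Reporter parameter, which makes it an overload rather than an override. The default Mapper.map writes the input key and value through unchanged, so the LongWritable input key reaches the output where Text is expected. The Mapper.map(Mapper.java:124) frame in your stack trace is consistent with this.
Check that the signature of your map method is correct. I believe it should look like this: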
protected void map(LongWritable iKey, Text iValue, Context context) throws IOException, InterruptedException
In addition, you need to replace this line:
FileSplit filespilt=(FileSplit)reporter.getInputSplit();
with:
FileSplit filespilt=(FileSplit)context.getInputSplit();
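
To illustrate why an extra parameter makes the framework fall back to the default map, here is a minimal sketch in plain Java. It does not use Hadoop. BaseMapper, OverloadingMapper, OverridingMapper and OverrideDemo are hypothetical stand-ins invented for this illustration, and the identity behaviour of BaseMapper.map only mimics what the default Mapper.map does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a framework base class; this is NOT Hadoop's Mapper.
class BaseMapper {
    // Default behaviour: identity, the input key is passed straight through.
    protected void map(Long key, String value, List<Object> out) {
        out.add(key);
    }

    // The "framework" only ever calls the three-argument map.
    final void run(Long key, String value, List<Object> out) {
        map(key, value, out);
    }
}

// The extra parameter makes this an overload, so the default map still runs.
class OverloadingMapper extends BaseMapper {
    public void map(Long key, String value, List<Object> out, Object reporter) {
        out.add(value.toLowerCase());
    }
}

// Same parameter list plus @Override: the compiler verifies the override.
class OverridingMapper extends BaseMapper {
    @Override
    protected void map(Long key, String value, List<Object> out) {
        out.add(value.toLowerCase());
    }
}

class OverrideDemo {
    // Runs one record through the given mapper and returns what it emitted.
    static List<Object> runWith(BaseMapper mapper) {
        List<Object> out = new ArrayList<>();
        mapper.run(0L, "Hello", out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runWith(new OverloadingMapper())); // prints [0]
        System.out.println(runWith(new OverridingMapper()));  // prints [hello]
    }
}
```

Putting @Override on the four-argument method in OverloadingMapper would turn the mistake into a compile error, so annotating your real map method the same way should surface a signature mismatch before the job runs.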