为什么我的hadoop map reduce程序会出现类强制转换异常? 现在这给了我一个例外。 我的地图应该以键/值的形式产生输出作为Text / IntWritable。我这样做,但仍然得到IOException
public class AverageClaimsPerPatentsByCountry {
public static class MyMap extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
@Override
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
String[] fields = value.toString().split(",");
if(fields.length >=7) {
String country = fields[4];
String claimsCount = fields[8];
System.out.println(value.toString());
int i = Integer.valueOf(claimsCount);
System.out.println(country+" --> "+i);
if(claimsCount.length() > 0) {
output.collect(new Text(country), new IntWritable(i));
}
}
}
}
public static class MyReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, DoubleWritable> {
@Override
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, DoubleWritable> output, Reporter reporter)
throws IOException {
int count = 0;
double claimsCount = 0;
while(values.hasNext()) {
claimsCount+=Double.valueOf(values.next().get());
count++;
}
double average = claimsCount/count;
output.collect(key, new DoubleWritable(average));
}
}
public static class MyJob extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
JobConf job = new JobConf(conf, MyJob.class);
FileInputFormat.addInputPaths(job, "patents/patents.csv");
FileOutputFormat.setOutputPath(job, new Path("patents/output"));
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
job.setMapperClass(MyMap.class);
job.setReducerClass(MyReducer.class);
JobClient.runJob(job);
return 0;
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
ToolRunner.run(conf, new MyJob(), args);
}
}
Exception :-->
12/09/30 18:32:34 INFO mapred.JobClient: Running job: job_local_0001
12/09/30 18:32:34 INFO mapred.FileInputFormat: Total input paths to process : 1
12/09/30 18:32:34 INFO mapred.MapTask: numReduceTasks: 1
12/09/30 18:32:34 INFO mapred.MapTask: io.sort.mb = 100
12/09/30 18:32:35 INFO mapred.MapTask: data buffer = 79691776/99614720
12/09/30 18:32:35 INFO mapred.MapTask: record buffer = 262144/327680
4000000,1976,6206,1974,"US","NV",,1,10,106,1,12,12,17,0.3333,0.7197,0.375,8.6471,26.8333,,,,
"US" --> 10
12/09/30 18:32:35 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.DoubleWritable, recieved org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:850)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
at action.eg1.AverageClaimsPerPatentsByCountry$MyMap.map(AverageClaimsPerPatentsByCountry.java:53)
at action.eg1.AverageClaimsPerPatentsByCountry$MyMap.map(AverageClaimsPerPatentsByCountry.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/09/30 18:32:35 INFO mapred.JobClient: map 0% reduce 0%
12/09/30 18:32:35 INFO mapred.JobClient: Job complete: job_local_0001
12/09/30 18:32:35 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
答案 0 :(得分:1)
如果没有为mapper指定输出类,它将默认为setOutputClass中给出的类,即MyReducer。
你需要这个:
setMapOutputClass(IntWritable.class)
答案 1 :(得分:0)
引自https://developer.yahoo.com/hadoop/tutorial/module4.html:
reducer发出的数据类型由setOutputKeyClass()和setOutputValueClass()标识。默认情况下,假设这些也是映射器的输出类型。如果不是这种情况,JobConf类的方法setMapOutputKeyClass()和setMapOutputValueClass()方法将覆盖这些方法。
因此,setOutputKeyClass()和setOutputValueClass()定义mapper和reducer的输出类型。如果映射器应具有不同的输出类型,请使用setMapOutputKeyClass()和setMapOutputValueClass()。
在当前的Haddop版本(2.5.1以及之前的某些版本)中,建议使用Job类而不是JobConf:
Job job = Job.getInstance(new Configuration());
job.setMapOutputKeyClass(YourOutputKeyClass1.class);
job.setMapOutputValueClass(YourOutputValueClass1.class);
job.setOutputKeyClass(YourOutputKeyClass2.class);
job.setOutputValueClass(YourOutputValueClass2.class);
从引用(和我的经验)得出结论如果你有一个mapper-only作业(没有reducer),setOutputKeyClass()与setMapOutputKeyClass()具有相同的效果(setOutputValueClass()和setMapOutputValueClass()相同)。