Two problems with a MapReduce program
java.io.IOException: wrong value class: class org.apache.hadoop.io.IntWritable is not class org.apache.hadoop.io.Text
java.lang.ArrayIndexOutOfBoundsException: 4
I have already set the map output key and value classes, as suggested in other posts, but that still does not fix either problem. For the second problem, I specifically tested the parsing code from map that causes it in a simple file-reading program (shown below), and there it runs correctly.
For reference, here is the full output for problem 1:
Error: java.io.IOException: wrong value class: class org.apache.hadoop.io.IntWritable is not class org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:194)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1350)
    at peoplemail.DomainGenderCount$ReduceClass.reduce(DomainGenderCount.java:52)
    at peoplemail.DomainGenderCount$ReduceClass.reduce(DomainGenderCount.java:1)
    at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1615)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1637)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:460)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Here is the full output for problem 2:
Error: java.lang.ArrayIndexOutOfBoundsException: 4
    at peoplemail.DomainGenderCount$MapClass.map(DomainGenderCount.java:34)
    at peoplemail.DomainGenderCount$MapClass.map(DomainGenderCount.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Data
Here are a few lines of the CSV file I am processing:
18,Daveen,Cupitt,dcupitth@last.fm,6288608483,Female
19,Marney,Eskell,meskelli@nifty.com,8164369834,Female
20,Teri,Yitzhak,tyitzhakj@bloglovin.com,2548784310,Female
21,Alain,Niblo,aniblok@howstuffworks.com,5195420924,Male
22,Vin,Creevy,vcreevyl@sfgate.com,8574528831,Female
23,Ermina,Pena,epenam@mediafire.com,2236545787,Female
24,Chrisy,Chue,cchuen@google.com,9455751444,Male
25,Morgen,Izakof,mizakofo@noaa.gov,8031181365,Male
MapClass
public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter r) throws IOException {
        String[] fields = value.toString().split(",");
        String gender = fields[5];
        String domain = fields[3].split("@")[1];
        output.collect(new Text(domain), new Text(gender));
    }
}
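One thing worth noting about the second error: the message reports index 4, yet this map only reads fields[3], fields[5], and index 1 of the @ split, which may indicate the jar on the cluster was built from an older version of this source. In any case, an out-of-bounds index means at least one input record splits into fewer fields than expected. A defensive variant of the same map body that skips short or malformed records (a sketch, not the code that produced the traces above):

@Override
public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter r) throws IOException {
    String[] fields = value.toString().split(",");
    // Skip records that do not have all six columns (blank or truncated lines).
    if (fields.length < 6) {
        return;
    }
    String[] emailParts = fields[3].split("@");
    // Skip records whose email column has no domain part.
    if (emailParts.length < 2) {
        return;
    }
    output.collect(new Text(emailParts[1]), new Text(fields[5]));
}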
ReduceClass
public static class ReduceClass extends MapReduceBase
        implements Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, IntWritable> output, Reporter r) throws IOException {
        int count = 0;
        while (values.hasNext()) {
            values.next();
            count++;
        }
        output.collect(key, new IntWritable(count));
    }
}
The run method
public int run(String[] paths) throws Exception {
    JobConf jobConf = new JobConf(getConf(), DomainGenderCount.class);
    jobConf.setMapOutputKeyClass(Text.class);
    jobConf.setMapOutputValueClass(Text.class);
    jobConf.setJobName("Number of Users in each domain:");
    jobConf.setOutputKeyClass(Text.class);
    jobConf.setOutputValueClass(IntWritable.class);
    jobConf.setMapperClass(MapClass.class);
    jobConf.setReducerClass(ReduceClass.class);
    jobConf.setCombinerClass(ReduceClass.class);
    FileInputFormat.setInputPaths(jobConf, new Path(paths[0]));
    FileOutputFormat.setOutputPath(jobConf, new Path(paths[1]));
    JobClient.runJob(jobConf);
    return 0;
}
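On the first error: the stack trace fails inside Task$OldCombinerRunner, i.e. in the combiner, not in the reduce phase proper. In the old mapred API a combiner must emit exactly the declared map output types, because its output re-enters the shuffle as if the map had produced it. Here the map output value class is Text, while ReduceClass, registered as the combiner, collects IntWritable values; that is the reported mismatch. If the combiner is not essential, the smallest fix is simply to drop that one line (a sketch of the relevant part of the configuration):

jobConf.setMapOutputKeyClass(Text.class);
jobConf.setMapOutputValueClass(Text.class);
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(IntWritable.class);
jobConf.setMapperClass(MapClass.class);
jobConf.setReducerClass(ReduceClass.class);
// No setCombinerClass here: ReduceClass collects IntWritable values, which
// does not match the map output value class (Text), so it cannot run as a combiner.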
This is the command I use to invoke hadoop:
hadoop jar C:\Users\suman\Desktop\domaingendercount.jar /Data/people.csv /Data/Output/
I tested the input file with this small program:
package peoplemail;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class Test {
    public static void main(String[] args) throws IOException {
        File file = new File("C:\\Users\\suman\\Desktop\\people.csv");
        BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
        String line;
        while (null != (line = bufferedReader.readLine())) {
            String[] fields = line.split(",");
            String gender = fields[5];
            String domain = fields[3].split("@")[1];
            System.out.println(domain + " " + gender);
        }
        bufferedReader.close();
    }
}
This code runs correctly.
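Note, though, that this only exercises the local copy on the Desktop, not the file actually uploaded to HDFS at /Data/people.csv, and the two can differ (an extra header, a blank line, a truncated record). A variant of the same loop that flags any line with fewer than six fields, rather than printing them all, can narrow this down (a sketch; FindShortRecords is a hypothetical name, and the local path is reused from the test above):

package peoplemail;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FindShortRecords {
    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(
                new FileReader("C:\\Users\\suman\\Desktop\\people.csv"));
        String line;
        int lineNumber = 0;
        while (null != (line = reader.readLine())) {
            lineNumber++;
            String[] fields = line.split(",");
            // Any line with fewer than six fields would break fields[5] in the mapper.
            if (fields.length < 6) {
                System.out.println("line " + lineNumber + " has only "
                        + fields.length + " fields: \"" + line + "\"");
            }
        }
        reader.close();
    }
}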
These files contain all of the hadoop code, the data, and the output.
Answer 0 (score: 3)
Your fields[] array will have 5 elements, and since indexing starts at 0 and the length of fields is 5, fields[5] throws the ArrayIndexOutOfBoundsException.
Here is the corrected mapper:
public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter r) throws IOException {
        String[] fields = value.toString().split(",");
        String domain = fields[3].split("@")[1];
        String gender = fields[5];
        output.collect(new Text(domain), new Text(gender));
    }
}
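The corrected mapper above targets only the second error. For the first one, an alternative to removing setCombinerClass is to restructure the job word-count style, so the reducer legitimately doubles as a combiner: the map emits an IntWritable 1 per record and the reducer sums. A sketch under that assumption (run would then also need jobConf.setMapOutputValueClass(IntWritable.class)):

public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter r) throws IOException {
        String[] fields = value.toString().split(",");
        // Emit (domain, 1) per record; counting becomes a sum in the reducer.
        output.collect(new Text(fields[3].split("@")[1]), ONE);
    }
}

public static class ReduceClass extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter r) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Input and output value types now match, so this class is also safe
        // to register with setCombinerClass.
        output.collect(key, new IntWritable(sum));
    }
}

Because the reducer's input and output value types now match the map output types, registering it as the combiner is valid, and the partial sums it produces on the map side are merged correctly by the final reduce.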
}