I have this map class [1] whose only goal is to read content and write it back out in the same format. The input data is [2], which the map class will process. I want the map class not to apply any transformation to the data, just to emit the input as it is. Unfortunately, I am getting the error shown in [3], and I cannot see what is wrong with the map class. Can anyone help me fix it?
[1] My map class (now corrected):
/** Identity mapper set by the user. */
public static class MyFullyIndentityMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text word = new Text();
    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, new IntWritable(Integer.valueOf(itr.nextToken())));
        }
    }

    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                System.out.println("Key ( " + context.getCurrentKey().getClass().getName() + " ): " + context.getCurrentKey()
                        + " Value (" + context.getCurrentValue().getClass().getName() + "): " + context.getCurrentValue());
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}
[2] Input data:
B 1
C 1
I 1
O 1
C 1
E 1
B 1
B 1
B 1
B 1
[3] The error I get when the map task runs:
Key ( org.apache.hadoop.io.LongWritable ): 0 Value (org.apache.hadoop.io.Text): B
2015-10-11 11:59:54,680 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at org.apache.hadoop.mapred.examples.MyWordCount$MyFullyIndentityMapper.map(MyWordCount.java:93)
at org.apache.hadoop.mapred.examples.MyWordCount$MyFullyIndentityMapper.run(MyWordCount.java:104)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
[4] My reduce class:
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        Iterator iter = values.iterator();
        while (iter.hasNext()) {
            System.out.println(iter.next());
        }
        for (IntWritable val : values) {
            System.out.println(" - key ( " + key.getClass().toString() + "): " + key.toString()
                    + " value ( " + val.getClass().toString() + " ): " + val.toString());
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }

    public void run(Context context) throws IOException, InterruptedException {
        System.out.println("Output dir: " + context.getConfiguration().get("mapred.output.dir"));
        System.out.println("Partitioner class: " + context.getConfiguration().get("mapreduce.partitioner.class"));
        try {
            while (context.nextKey()) {
                System.out.println("Key: " + context.getCurrentKey());
                reduce(context.getCurrentKey(), context.getValues(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}
[5] My main class:
public static void main(String[] args) throws Exception {
    GenericOptionsParser parser = new GenericOptionsParser(new Configuration(), args);
    String[] otherArgs = parser.getRemainingArgs();
    if (otherArgs.length < 2) {
        System.err.println("Usage: wordcount [<in>...] <out>");
        System.exit(2);
    }

    // first map tasks
    JobConf conf = new JobConf(MyWordCount.class);
    conf.setJobName("wordcount");
    conf.setJarByClass(MyWordCount.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setNumReduceTasks(1);

    Path[] inputPaths = new Path[otherArgs.length - 1];
    for (int i = 0; i < otherArgs.length - 1; ++i) {
        inputPaths[i] = new Path(otherArgs[i]);
    }
    Path outputPath = new Path(otherArgs[otherArgs.length - 1]);
    FileInputFormat.setInputPaths(conf, inputPaths);
    FileOutputFormat.setOutputPath(conf, outputPath);

    // launch the job directly
    Job job = new Job(conf, conf.getJobName());
    job.setJarByClass(MyWordCount.class);
    job.setMapperClass(MyFullyIndentityMapper.class);
    job.setReducerClass(MyReducer.class);
    job.setPartitionerClass(HashPartitioner.class);
    job.waitForCompletion(true);
    System.exit(0);
}
[6] Here are the imports I use, in case they are relevant:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.util.StringTokenizer;
Answer 0 (score: 1)
Please check your input file again.

Key ( org.apache.hadoop.io.LongWritable ): 0 Value (org.apache.hadoop.io.Text): B

From the line above you can see that the context is extracting your value as just B, not B 1. So when the mapper asks the StringTokenizer for a second token to use as the count, there is no token left, and nextToken() throws the NoSuchElementException.
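If the input file cannot be guaranteed to carry a "<word> <count>" pair on every line, one hedge is to guard the second nextToken() call. Below is a minimal illustrative sketch, not the poster's actual fix; the SafeMapper name is mine, and the types match the question's mapper:

// Sketch: a defensive variant of the question's mapper that skips a line
// lacking a count token instead of letting nextToken() throw.
public static class SafeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            if (!itr.hasMoreTokens()) {
                break; // malformed line such as "B" with no count: skip it
            }
            context.write(word, new IntWritable(Integer.parseInt(itr.nextToken())));
        }
    }
}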
Answer 1 (score: 0)
Found the problem. The signature must be public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { ... }
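For reference, a minimal sketch of a reducer with that signature, assuming the same Text/IntWritable types used elsewhere in the question (it reuses the MyReducer name but is an illustration, not the poster's full class):

// Sketch: a reduce() whose signature actually overrides Reducer.reduce().
// A mismatched signature (e.g. Iterator instead of Iterable) would not
// override, and Hadoop would silently run the default identity reduce.
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

The @Override annotation is worth keeping: the compiler then rejects any method that does not actually override the base class, which catches this kind of signature bug at compile time.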