I am new to Map-Reduce and tried a mapper-only example, but the output is strange and not what I expected. Please help me find what I am missing here:
Code:
Imports:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
Driver:
Job job = new Job(conf,"SampleProgram");
job.setJarByClass(SampleMR.class); // class that contains mapper and reducer
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class); // reducer class
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setNumReduceTasks(0);
FileInputFormat.setInputPaths(job, new Path("/tmp/"));
FileOutputFormat.setOutputPath(job, new Path("/tmp/out")); // adjust directories as required
job.submit();
boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
Mapper:
public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
@Override
public void map(LongWritable idx , Text value, Context context) throws IOException, InterruptedException {
String[] tokens = value.toString().split("|");
String keyPrefix = tokens[0] + tokens[1];
context.write(new Text(keyPrefix), value);
}
}
There is also a reducer phase, but I have set the number of reducers to 0 to debug this issue. The mapper is misbehaving here.
Input:
379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
Expected map output:
379782759851005ABCDEFG [space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
Output from my mapper:
3 [space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
It looks like the key only contains the first character of the expected output. If I try writing tokens[4] as the value to the context, the same thing happens to the value. Something seems to go wrong when the string is split.
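For what it's worth, the behavior can be reproduced outside Hadoop with a plain Java snippet (the class name SplitDemo is mine; the line is copied from the input above):

```java
public class SplitDemo {
    public static void main(String[] args) {
        String line = "379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2";
        // Same call as in the mapper
        String[] tokens = line.split("|");
        // Far more than the 5 fields I expected
        System.out.println("tokens.length = " + tokens.length);
        System.out.println("tokens[0] + tokens[1] = " + tokens[0] + tokens[1]);
    }
}
```

Running this shows tokens.length is far larger than 5; the string appears to be split into individual characters rather than pipe-separated fields.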
Any insight into what might be going wrong?