Map-Reduce program: Mapper does not behave as expected

Time: 2015-08-07 06:12:41

Tags: java hadoop mapreduce

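I am new to Map-Reduce and tried an example that runs only the Mapper, but the output is strange and not what I expected. If I am missing something here, please help me find it: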

Code:

Imports:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

Driver:

Job job = new Job(conf,"SampleProgram");
job.setJarByClass(SampleMR.class);     // class that contains mapper and reducer
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);    // reducer class

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setNumReduceTasks(0);
FileInputFormat.setInputPaths(job, new Path("/tmp/"));
FileOutputFormat.setOutputPath(job, new Path("/tmp/out"));  // adjust directories as required

job.submit();

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}

Mapper:

public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable idx, Text value, Context context) throws IOException, InterruptedException {
        String[] tokens = value.toString().split("|");
        String keyPrefix = tokens[0] + tokens[1];
        context.write(new Text(keyPrefix), value);
    }
}

There is also a reducer stage, but I have set the number of reducers to 0 to debug the problem. It is the mapper that misbehaves here.

Input:

379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

Expected map output:

379782759851005ABCDEFG [whitespace] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

Output of my Mapper:

3 [whitespace] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

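It looks like the key contains only the first character of the expected output. If I write tokens[4] to the context as the value, the same thing happens to the value. Something seems to go wrong when the string is split. Any insight into what might be wrong?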

1 answer:

Answer 0: (score: 1)

You need to escape the pipe character. See the following link:

Splitting string with pipe character ("|")
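
To spell out why the escape matters: String.split takes a regular expression, and an unescaped "|" is the alternation operator between two empty patterns, so it matches the empty string between every pair of characters and the line is split into single characters. That is consistent with the key 3 seen in the question (on Java 8 and later the leading empty string is no longer produced, so the same code would likely give 37 instead). Below is a minimal sketch of the fix outside Hadoop; the class name PipeSplitDemo and the helper methods are hypothetical names used only for illustration, and the input line is assumed to have no spaces around the pipes.

```java
import java.util.regex.Pattern;

public class PipeSplitDemo {
    // Builds the key prefix from the first two pipe-delimited fields.
    // "\\|" is a regex that matches a literal pipe character.
    static String keyPrefix(String line) {
        String[] tokens = line.split("\\|");
        return tokens[0] + tokens[1];
    }

    // Same result, but lets Pattern.quote do the escaping.
    static String keyPrefixQuoted(String line) {
        String[] tokens = line.split(Pattern.quote("|"));
        return tokens[0] + tokens[1];
    }

    public static void main(String[] args) {
        // Sample record from the question (assumed to contain no spaces).
        String line = "379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2";
        System.out.println(keyPrefix(line));       // prints 379782759851005ABCDEFG
        System.out.println(keyPrefixQuoted(line)); // prints 379782759851005ABCDEFG
    }
}
```

In the mapper, the same change would be value.toString().split("\\|"). A real job would probably also want to check tokens.length before indexing, since a line with fewer than two fields would otherwise throw ArrayIndexOutOfBoundsException.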