Question

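I am new to Hadoop. I am trying to modify the WordCount example to do the following: use the second field of each line as the key and the fourth and fifth fields as that key's values, then group the values by key and write the final result to a text file.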

Input.txt :
a:b:c:d:e:f
g:h:i:j:k:l
m:b:n:o:p:q

Output.txt :
b:d:o:e:p
h:j:k

Here is my code:

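import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class Test {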

    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, Text> {

        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String[] temp = value.toString().split(":");
            String remainder = temp[3] + ":" + temp[4];
            output.collect(new Text(temp[1]), new Text(remainder));
        }
    }


    public static class Reduce extends MapReduceBase implements
            Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {

            String temp = "";
            while (values.hasNext()) {
                temp = temp + values.next().toString();
            }

            //String remainder = ":" + temp;
            output.collect(key, new Text(temp));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Test.class);
        conf.setJobName("pivotpoints");

        System.out.println(conf.getNumMapTasks() + "map runs");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf); 
    }   
}

This is the output I get from the code above:

part-00000 :
b d:eo:p
h j:k

So my question is: how can I make Hadoop write the final output to a text file in the given format, using a custom separator?

Answer 1

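Set the property mapreduce.output.textoutputformat.separator in the configuration of your main class. (On older Hadoop 1.x releases the equivalent property is named mapred.textoutputformat.separator.)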

conf.set("mapreduce.output.textoutputformat.separator",":");

Have the mapper emit the following sequence, with the fourth and fifth fields as separate values:

key value 
b d
b e
h j
h k
b o
b p

The framework automatically groups these by key before they reach the reducer:

b [d, e, o, p]
h [j, k]
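
The grouping itself is done by Hadoop's shuffle phase, so you do not write it yourself. Purely to make the idea concrete, the following plain-Java sketch imitates it with a sorted map. ShuffleGroupSketch and groupByKey are hypothetical names, and the sketch keeps values in arrival order, which a real cluster does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Imitates, in memory, the grouping that Hadoop performs between map and reduce.
final class ShuffleGroupSketch {

    // Collects every value under its key; TreeMap keeps the keys sorted,
    // similar to how reducer input arrives sorted by key.
    static Map<String, List<String>> groupByKey(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"b", "d"}, new String[] {"b", "e"},
                new String[] {"h", "j"}, new String[] {"h", "k"},
                new String[] {"b", "o"}, new String[] {"b", "p"});
        for (Map.Entry<String, List<String>> entry : groupByKey(pairs).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```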

In the reducer, iterate over the list of values for each key and concatenate them into a single string, putting : between the values.
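
A minimal sketch of that joining loop, written in plain Java so it runs without Hadoop. ReduceJoinSketch and joinValues are hypothetical names. In the real Reducer the iterator yields Text objects, so you would call values.next().toString() as in the question's code.

```java
import java.util.Arrays;
import java.util.Iterator;

// Plain-Java stand-in for the body of the reduce method.
final class ReduceJoinSketch {

    // Concatenates all values, inserting the separator between neighbours
    // but not before the first or after the last value.
    static String joinValues(Iterator<String> values, String separator) {
        StringBuilder joined = new StringBuilder();
        boolean first = true;
        while (values.hasNext()) {
            if (!first) {
                joined.append(separator);
            }
            joined.append(values.next());
            first = false;
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinValues(Arrays.asList("d", "e", "o", "p").iterator(), ":"));
        System.out.println(joinValues(Arrays.asList("j", "k").iterator(), ":"));
    }
}
```

The missing separator is what produced d:eo:p in the question's output: the reducer there appended the values directly to one another.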

The reducer can then emit:

Key Value 
b d:e:o:p (your concatenated string)
h j:k (your concatenated string)

Because you have set the separator to : instead of the default tab, the output file will have the expected format.
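
To show why the separator setting matters, the sketch below mimics how each output line is assembled as key, separator, value. It is a simplified model rather than Hadoop's actual TextOutputFormat code, and OutputLineSketch and formatLine are hypothetical names.

```java
// Simplified model of how a text output line is built from a key/value pair.
final class OutputLineSketch {

    // The separator used when no property is set is a tab.
    static final String DEFAULT_SEPARATOR = "\t";

    // Builds one output line: key, then separator, then value.
    static String formatLine(String key, String value, String separator) {
        return key + separator + value;
    }

    public static void main(String[] args) {
        // With the default separator the key and value are split by a tab.
        System.out.println(formatLine("b", "d:e:o:p", DEFAULT_SEPARATOR));
        // With ":" configured, the whole line uses one delimiter.
        System.out.println(formatLine("b", "d:e:o:p", ":"));
        System.out.println(formatLine("h", "j:k", ":"));
    }
}
```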
