I'm trying to output {key, list(values)} in MapReduce, but I only get sorted {key, value} pairs

Time: 2015-12-27 05:34:43

Tags: hadoop mapreduce

My requirement is as follows.

Input file:
key    value
eid    ename
1      a
2      b
3      c

Output file:

key   values
eid   1,2,3
ename a,b,c

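I wrote the logic in my mapper using a header array and a data array, and tried two cases:

Case 1: no Reducer (i.e. setNumReduceTasks(0))

Case 2: the default Reducer

In both cases the output I get is just: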

eid   1
eid   2
eid   3
ename a
ename b
ename c

1 answer:

Answer 0 (score: 1)

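To achieve this you have to use a reducer. All records with the key eid must go to the same reduce call, and likewise all records with the key ename. That grouping is what lets you aggregate the values for eid and for ename.

If you use only mappers (no reducer), there is no shuffle phase: each mapper writes its pairs straight to the output, and different parts of the input may be handled by different mappers, so the values for a key are never brought together. The default Reducer does receive the grouped values, but it is an identity reducer that writes each value back out as its own {key, value} pair, which is why both of your cases produce one line per value.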
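To make the difference concrete, here is a minimal plain-Java sketch that imitates the map, shuffle and reduce steps on the sample data. It does not use Hadoop at all; the class name ShuffleSketch and its method names are made up for illustration, and the TreeMap only stands in for the sorting and grouping that the framework does between the two phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: plain Java, no Hadoop classes. All names are hypothetical.
class ShuffleSketch {

    // "Map" step: one tab-separated record becomes two (key, value) pairs.
    static List<String[]> map(String line) {
        List<String[]> pairs = new ArrayList<>();
        String[] fields = line.split("\t");
        if (fields.length == 2) {
            pairs.add(new String[] {"eid", fields[0]});
            pairs.add(new String[] {"ename", fields[1]});
        }
        return pairs;
    }

    // "Shuffle" step: gather all values per key; TreeMap keeps the keys sorted.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Identity-style reduce: writes one line per value, so nothing is combined.
    static List<String> identityReduce(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            for (String v : e.getValue()) {
                out.add(e.getKey() + "\t" + v);
            }
        }
        return out;
    }

    // Custom reduce: writes one line per key with the values joined by commas.
    static List<String> joiningReduce(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.add(e.getKey() + "\t" + String.join(",", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : new String[] {"1\ta", "2\tb", "3\tc"}) {
            pairs.addAll(map(line));
        }
        Map<String, List<String>> grouped = shuffle(pairs);
        identityReduce(grouped).forEach(System.out::println); // six lines, one per value
        joiningReduce(grouped).forEach(System.out::println);  // two lines, one per key
    }
}
```

The sketch keeps values in insertion order; a real job makes no such promise about the order in which values reach the reducer.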

The following code achieves this:

package com.myorg.hadooptests;    

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class EidTest {

    public static class EidTestMapper
            extends Mapper<LongWritable, Text , Text, Text > {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            String line = value.toString();
            String[] words = line.split("\t");

            if(words.length == 2) {
                context.write(new Text("eid"), new Text(words[0]));
                context.write(new Text("ename"), new Text(words[1]));
            }
        }
    }

    public static class EidTestReducer
            extends Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {

            String finalVal = "";

            for (Text val : values) {
                finalVal = finalVal.concat(val.toString()).concat(",");
            }

            finalVal = finalVal.substring(0, finalVal.length() - 1); // Remove trailing comma
            context.write(key, new Text(finalVal));
        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "EidTest");
        job.setJarByClass(EidTest.class);
        job.setMapperClass(EidTestMapper.class);
        job.setReducerClass(EidTestReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/in/in9.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/out/"));

        // Exit with a non-zero status if the job fails
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

For your input, I got the following output (the mapper assumes the key and value are tab-separated):

eid     3,2,1
ename   c,b,a
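
Note that the values come out as 3,2,1 rather than 1,2,3: the order in which values reach a reducer is not guaranteed. If the order matters, one option is to copy the values into a list and sort them inside the reducer before joining. The sketch below shows only that step in plain Java; SortedJoinSketch and joinSorted are hypothetical names, not part of Hadoop, and buffering is only reasonable when the values for one key fit in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: the sort-then-join step a reducer could perform. Names are hypothetical.
class SortedJoinSketch {

    // Copies the values, sorts them as strings, and joins them with commas.
    // In a real reducer, store val.toString() for each Text value, because
    // Hadoop reuses the same Text instance while iterating.
    static String joinSorted(Iterable<String> values) {
        List<String> copy = new ArrayList<>();
        for (String v : values) {
            copy.add(v);
        }
        Collections.sort(copy); // string order: "10" sorts before "2"
        return String.join(",", copy); // empty input yields "", with no trailing comma to strip
    }

    public static void main(String[] args) {
        System.out.println("eid\t" + joinSorted(Arrays.asList("3", "2", "1")));
        System.out.println("ename\t" + joinSorted(Arrays.asList("c", "b", "a")));
    }
}
```

Because the sort is on strings, ids with more than one digit would need to be parsed and compared numerically instead.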