My requirement is as follows:
input file
key value
eid ename
1 a
2 b
3 c
output file
key values
eid 1,2,3
ename a,b,c
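I wrote the logic in my mapper using a header array and a data array, and tried two cases:
case 1: no reducer (i.e. setNumReduceTasks(0))
case 2: the default reducer
In both cases I only get the output as: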
eid 1
eid 2
eid 3
ename a
ename b
ename c
Answer 0 (score: 1)
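To achieve this, you must use a reducer. The reason is that you want all the records for eid to go to the same reducer, and all the records for ename to go to the same reducer. That is what lets you aggregate the eid values and the ename values.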
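If you use only a mapper (no reducer), the records for a given key, such as eid, may be processed by different mappers, and with no shuffle step nothing brings them back together.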
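To make the difference concrete, here is a minimal plain-Java sketch of the idea. It does not use Hadoop at all: the class ShuffleSketch and its methods map, shuffle and reduce are hypothetical names that only mimic what the framework does between the map and reduce phases. Note that this sketch keeps values in insertion order, which a real job does not promise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Hadoop classes are involved.
class ShuffleSketch {

    // Mimics the mapper: one tab-separated line becomes two (key, value) pairs.
    static List<String[]> map(String line) {
        List<String[]> pairs = new ArrayList<>();
        String[] words = line.split("\t");
        if (words.length == 2) {
            pairs.add(new String[] {"eid", words[0]});
            pairs.add(new String[] {"ename", words[1]});
        }
        return pairs;
    }

    // Mimics the shuffle: collects every value under its key, with keys sorted.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Mimics the reducer: joins all values of one key with commas.
    static String reduce(List<String> values) {
        return String.join(",", values);
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = new ArrayList<>();
        for (String line : new String[] {"1\ta", "2\tb", "3\tc"}) {
            mapOutput.addAll(map(line));
        }
        // Map-only view: one output line per emitted pair, nothing is grouped.
        for (String[] pair : mapOutput) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
        // With a shuffle and a reducer: one output line per key.
        for (Map.Entry<String, List<String>> e : shuffle(mapOutput).entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```

The first loop in main corresponds to what a job without a grouping step can produce (one line per pair); only the grouped view yields one line per key.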
The following MapReduce job does the aggregation with a mapper and a reducer:
package com.myorg.hadooptests;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class EidTest {

    // Emits ("eid", <first field>) and ("ename", <second field>) for each input line.
    public static class EidTestMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split("\t");
            // Lines without exactly two tab-separated fields are skipped.
            // A header line such as "eid<TAB>ename" has two fields and would be emitted too.
            if (words.length == 2) {
                context.write(new Text("eid"), new Text(words[0]));
                context.write(new Text("ename"), new Text(words[1]));
            }
        }
    }

    // Joins all values that share a key into one comma-separated string.
    public static class EidTestReducer
            extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder finalVal = new StringBuilder();
            for (Text val : values) {
                if (finalVal.length() > 0) {
                    finalVal.append(","); // separator only between values, so no trailing comma
                }
                finalVal.append(val.toString());
            }
            context.write(key, new Text(finalVal.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "EidTest");
        job.setJarByClass(EidTest.class); // was WordCount.class, which is not defined here
        job.setMapperClass(EidTestMapper.class);
        job.setReducerClass(EidTestReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/in/in9.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/out/"));
        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
For your input, I got the following output (the mapper assumes the key and value are tab-separated):
eid 3,2,1
ename c,b,a
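The values come out as 3,2,1 rather than 1,2,3 because the order in which values reach a reducer is not guaranteed. If the input order matters, one possible approach (an assumption on my part, not something the job above does) is to have the mapper prefix each value with the line's byte offset, which is the LongWritable key it already receives with the default text input format, and have the reducer sort on that prefix before joining. The plain-Java sketch below shows only the sorting and joining step; OrderedJoinSketch, tag, joinInInputOrder and the "offset:value" encoding are hypothetical names chosen for illustration. It only makes sense for a single input file, since offsets restart in every file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; no Hadoop classes are involved.
class OrderedJoinSketch {

    // Hypothetical encoding a mapper could emit: "<byteOffset>:<value>".
    static String tag(long offset, String value) {
        return offset + ":" + value;
    }

    // Sorts tagged values by numeric offset, strips the tag, joins with commas.
    static String joinInInputOrder(Iterable<String> taggedValues) {
        List<String> sorted = new ArrayList<>();
        for (String tagged : taggedValues) {
            sorted.add(tagged);
        }
        // Compare offsets as numbers so that 9 sorts before 10.
        sorted.sort(Comparator.comparingLong(
                (String t) -> Long.parseLong(t.substring(0, t.indexOf(':')))));
        List<String> plain = new ArrayList<>();
        for (String tagged : sorted) {
            // Everything after the first ':' is the original value.
            plain.add(tagged.substring(tagged.indexOf(':') + 1));
        }
        return String.join(",", plain);
    }

    public static void main(String[] args) {
        // Values as they might arrive at the reducer, in arbitrary order.
        List<String> arrived = Arrays.asList(tag(8, "3"), tag(4, "2"), tag(0, "1"));
        System.out.println("eid\t" + joinInInputOrder(arrived));
    }
}
```

Because eid and ename values from the same line carry the same offset, sorting both keys this way also keeps the two output lists aligned with each other.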