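I am trying to select distinct records from 2 GB of JSON data (one record is one JSON object) using MapReduce on a cluster of 4 machines. The map phase works fine: 17 map tasks are launched (my block size is 128 MB), but only 1 reduce task is launched. I did not set the number of reducers anywhere in my code.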
Mapper code
    public static class DistinctMapper extends
            Mapper<Object, Text, Text, Text> {

        private Text outFieldKey = new Text();
        private String field;

        @Override
        protected void setup(Context context) throws IOException,
                InterruptedException {
            // Name of the JSON field to deduplicate on, set by the driver.
            this.field = context.getConfiguration().get("field");
        }

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            JSONObject parsed = new JSONObject(value.toString());
            String selectedFieldVal = parsed.get(this.field).toString();
            // Emit the field value as the key and the whole record as the value.
            outFieldKey.set(selectedFieldVal);
            context.write(outFieldKey, value);
        }
    }
Combiner code
    public static class DistinctCombiner extends
            Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            // Keep only the first record seen for each key.
            context.write(key, values.iterator().next());
        }
    }
Reducer code
    public static class DistinctReducer extends
            Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            context.write(new Text(), values.iterator().next());
        }
    }
Main
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: DistinctDataBySomeField <field> <in> <out>");
            System.exit(2);
        }
        conf.set("field", otherArgs[0]);
        Job job = new Job(conf, "Distinct data by field x");
        job.setJarByClass(DistinctPatternDriver.class);
        job.setMapperClass(DistinctMapper.class);
        job.setCombinerClass(DistinctCombiner.class);
        job.setReducerClass(DistinctReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
I need many reducers so that the job runs faster. Why is only one reduce task being launched?
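For context, here is a minimal plain-Java sketch of how I understand map output keys get assigned to reduce tasks, assuming the job uses Hadoop's default `HashPartitioner` (I have not set a custom partitioner). The class name `PartitionSketch` and the sample keys are hypothetical, and `String.hashCode()` only stands in for `Text.hashCode()`, so the actual partition numbers in a real job would differ. As far as I know, the reducer count itself comes from `job.setNumReduceTasks(int)` or the `mapreduce.job.reduces` property, neither of which appears in my driver.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; it does not depend on Hadoop.
final class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder by the reducer count.
    // Assumption: String.hashCode() stands in for Text.hashCode().
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many of the given keys land in each partition.
    static int[] distribution(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up field values standing in for the map output keys.
        List<String> keys = Arrays.asList(
                "alice", "bob", "carol", "dave", "erin", "frank");
        System.out.println("1 reducer:  "
                + Arrays.toString(distribution(keys, 1)));
        System.out.println("4 reducers: "
                + Arrays.toString(distribution(keys, 4)));
    }
}
```

With a reducer count of 1, every key maps to partition 0, which matches the single reduce task I am seeing; with a larger count the keys spread across partitions according to their hash.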