Why does Hadoop MapReduce launch only 1 reducer?

Date: 2019-03-16 20:02:01

Tags: java hadoop mapreduce hadoop2

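I am trying to use MapReduce on a 4-machine cluster to select distinct records from 2 GB of data (JSON, where 1 record is 1 JSON object). The map phase works fine: 17 map tasks are launched (because my block size is 128 MB), but only 1 reduce task is launched. I did not set the number of reducers anywhere in my code.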

Mapper code

public static class DistinctMapper extends
        Mapper<Object, Text, Text, Text> {

    private Text outFieldKey = new Text();
    private String field;

    @Override
    protected void setup(Context context) throws IOException,
            InterruptedException {
        this.field = context.getConfiguration().get("field");
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {

        // Parse the input line as a single JSON record.
        JSONObject parsed = new JSONObject(value.toString());

        // Use the configured field's value as the key, so records with
        // the same value are grouped into one reduce call.
        String selectedFieldVal = parsed.get(this.field).toString();

        outFieldKey.set(selectedFieldVal);
        context.write(outFieldKey, value);
    }
}

Combiner code

public static class DistinctCombiner extends
        Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {
        // Keep one record per key on the map side; the rest are duplicates.
        context.write(key, values.iterator().next());
    }
}

Reducer code

public static class DistinctReducer extends
        Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {
        // Emit only the first record per key, with an empty Text as the output key.
        context.write(new Text(), values.iterator().next());
    }
}

Main

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
    if (otherArgs.length != 3) {
        System.err.println("Usage: DistinctDataBySomeField <field> <in> <out>");
        System.exit(2);
    }

    conf.set("field", otherArgs[0]);
    // Job.getInstance replaces the deprecated Job constructor in Hadoop 2.
    Job job = Job.getInstance(conf, "Distinct data by field x");
    job.setJarByClass(DistinctPatternDriver.class);
    job.setMapperClass(DistinctMapper.class);
    job.setCombinerClass(DistinctCombiner.class);
    job.setReducerClass(DistinctReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

I need more reducers so that the job runs faster. Why is only one reducer launched?
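
For context: Hadoop's `mapreduce.job.reduces` property defaults to 1, so a job that never sets it runs a single reduce task no matter how many map tasks it has. The count can be set in the driver with `job.setNumReduceTasks(n)`, or, since this driver goes through `GenericOptionsParser`, on the command line with `-D mapreduce.job.reduces=n` placed before the other arguments. The sketch below is plain Java with no Hadoop dependency and only illustrates how keys would be spread over reduce tasks. It imitates the formula of the default `HashPartitioner`, but `PartitionSketch`, `partitionFor`, `distribute`, and the sample keys are hypothetical, and `String.hashCode()` stands in for `Text.hashCode()`, so the actual partition numbers in a real job will differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner. String.hashCode()
    // is an assumption standing in for Text.hashCode().
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many keys land on each partition (one partition per reduce task).
    static Map<Integer, Integer> distribute(List<String> keys, int numReduceTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(partitionFor(key, numReduceTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical field values used as map output keys.
        List<String> keys = Arrays.asList("alice", "bob", "carol", "dave", "erin", "frank");
        System.out.println("1 reduce task:  " + distribute(keys, 1));
        System.out.println("4 reduce tasks: " + distribute(keys, 4));
    }
}
```

With one reduce task every key maps to partition 0, which matches the single reducer seen here. With four, the keys are spread over up to four partitions, and because all records sharing a key still reach the same reducer, the distinct logic should stay correct with more reducers.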

0 answers:

No answers