Question

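I need to implement a feature using MapReduce.

The requirements are as follows.

The mapper's input has two columns: productId, salesCount.
The reducer outputs the sum of salesCount.

What I need to compute is salescount / sum(salescount).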
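Written out, with s_i denoting the salesCount on input row i and N the number of rows (notation introduced here for illustration, and assuming the sum runs over the whole input), each output value is:

```latex
r_i = \frac{s_i}{\sum_{j=1}^{N} s_j}
```

The denominator depends on every row, so it is only known after a full pass over the data; each r_i can only be produced in a later step that sees both s_i and the total.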

For this, I plan to use a nested MapReduce. But for the second mapper, I need to use both the first reducer's output and the first mapper's input.
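The two-stage plan can be simulated in plain Java (this is not Hadoop code): stage 1 reduces all rows to a single total, and stage 2 re-reads the original rows and divides each count by that total. In a real job, one common way to hand a single stage-1 value to the stage-2 mappers is to store it in the second job's Configuration. The class and method names and the tab-separated row format are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the two-stage plan (not Hadoop code).
// Rows are hypothetical "productId\tsalesCount" strings.
class TwoStageShare {

    // Stage 1 ("first map + reduce"): parse every row and sum all counts.
    static long stage1Total(List<String> rows) {
        long total = 0;
        for (String row : rows) {
            total += Long.parseLong(row.split("\t")[1].trim());
        }
        return total;
    }

    // Stage 2 ("second mapper"): re-read the ORIGINAL rows and divide each
    // count by the stage-1 total, which arrives as a side value
    // (in Hadoop, e.g. through the second job's Configuration).
    static List<String> stage2Shares(List<String> rows, long total) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] cols = row.split("\t");
            long count = Long.parseLong(cols[1].trim());
            // Floating-point division; guard against an all-zero input.
            double share = total == 0 ? 0.0 : (double) count / total;
            out.add(cols[0] + "\t" + share);
        }
        return out;
    }
}
```

Stage 2 does not need the first reducer's output as a file input at all: the only thing it takes from stage 1 is one number, so the original input can be its sole input.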

How can I implement this? Or is there another approach?

Regards, Vinu

Answer 1

You can use ChainMapper and ChainReducer to pipe Mappers and Reducers together. See here.

The following snippet is similar to what you need to implement:

// A chain follows the pattern [MAP+ / REDUCE MAP*]: one or more mappers,
// exactly ONE reducer, then zero or more mappers, all inside a single job.
JobConf mapBConf = new JobConf(false);

JobConf reduceConf = new JobConf(false);

ChainMapper.addMapper(conf, FirstMapper.class, FirstMapperInputKey.class, FirstMapperInputValue.class,
   FirstMapperOutputKey.class, FirstMapperOutputValue.class, false, mapBConf);

ChainReducer.setReducer(conf, FirstReducer.class, FirstMapperOutputKey.class, FirstMapperOutputValue.class,
   FirstReducerOutputKey.class, FirstReducerOutputValue.class, true, reduceConf);

ChainReducer.addMapper(conf, SecondMapper.class, FirstReducerOutputKey.class, FirstReducerOutputValue.class,
   SecondMapperOutputKey.class, SecondMapperOutputValue.class, false, null);

// A second reducer cannot be added to the same chain: it needs a second job
// (conf2 here is a separate JobConf) that reads the first job's output.
ChainReducer.setReducer(conf2, SecondReducer.class, SecondMapperOutputKey.class, SecondMapperOutputValue.class,
   SecondReducerOutputKey.class, SecondReducerOutputValue.class, true, reduceConf);

Or, if you don't want to use multiple Mappers and Reducers, you can do the following:

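import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ProductIndexer {

    // Emits (productId, salesCount) for each tab-separated input line.
    public static class ProductIndexerMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {

        private final Text productId = new Text();
        private final LongWritable salesCount = new LongWritable();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, LongWritable> output, Reporter reporter)
                throws IOException {
            String[] values = value.toString().split("\t");
            productId.set(values[0]);
            salesCount.set(Long.parseLong(values[1].trim()));
            output.collect(productId, salesCount);
        }
    }

    // Emits each salesCount divided by the total for its productId.
    // reduce() runs once per key, so `total` is a per-product sum.
    // The job must set DoubleWritable as the output value class and
    // LongWritable as the map output value class.
    public static class ProductIndexerReducer extends MapReduceBase implements Reducer<Text, LongWritable, Text, DoubleWritable> {

        private final DoubleWritable ratio = new DoubleWritable();

        @Override
        public void reduce(Text key, Iterator<LongWritable> values,
                OutputCollector<Text, DoubleWritable> output, Reporter reporter)
                throws IOException {
            // Hadoop reuses one LongWritable across values.next() calls,
            // so store the primitive value, not the object.
            List<Long> items = new ArrayList<Long>();
            long total = 0;
            while (values.hasNext()) {
                long count = values.next().get();
                total += count;
                items.add(count);
            }
            for (long count : items) {
                // Cast first: long / long would use integer division and truncate.
                ratio.set(total == 0 ? 0.0 : (double) count / total);
                output.collect(key, ratio);
            }
        }
    }
}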
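Copying the primitive inside the reducer matters because Hadoop hands back the same value object on every `values.next()` call. The plain-Java sketch below imitates that with a hypothetical mutable `Box` class standing in for `LongWritable` (not Hadoop code): keeping the object references leaves every list entry aliasing the last value, while copying the primitive keeps each value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for LongWritable: a mutable holder of a long.
class Box {
    long v;
}

class ReuseDemo {

    // Like Hadoop's reduce-side value iterator, returns the SAME Box
    // instance each time, with its contents overwritten.
    static Iterator<Box> reusingIterator(long[] data) {
        Box shared = new Box();
        return new Iterator<Box>() {
            int i = 0;
            public boolean hasNext() { return i < data.length; }
            public Box next() { shared.v = data[i++]; return shared; }
        };
    }

    // Wrong: stores references, so all entries point at one object.
    static List<Long> keepReferences(long[] data) {
        List<Box> kept = new ArrayList<>();
        Iterator<Box> it = reusingIterator(data);
        while (it.hasNext()) kept.add(it.next());
        List<Long> out = new ArrayList<>();
        for (Box b : kept) out.add(b.v);
        return out;
    }

    // Right: copies the primitive out of the reused object.
    static List<Long> copyValues(long[] data) {
        List<Long> out = new ArrayList<>();
        Iterator<Box> it = reusingIterator(data);
        while (it.hasNext()) out.add(it.next().v);
        return out;
    }
}
```

With input {3, 5, 7}, `keepReferences` yields [7, 7, 7] and `copyValues` yields [3, 5, 7].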


Answer 2

For the use case at hand, I believe we don't need two different mappers / MapReduce jobs to achieve this. (This extends the answer given in the comments above.)

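Suppose you have a very large input file in HDFS that is split into multiple blocks. When you trigger a MapReduce job with this file as input, multiple mappers (one per input split; by default a split corresponds to an HDFS block) start executing in parallel.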
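How many mappers that means can be estimated from the split size. The sketch below follows the split-size rule of Hadoop's `FileInputFormat` as I understand it (split size = max(minSize, min(maxSize, blockSize)), with a 10% slop allowance so the last split can be slightly larger instead of creating a tiny extra one). Treat it as an approximation for a single splittable file, not the exact implementation; the class name is hypothetical.

```java
// Approximate number of input splits (and hence map tasks) for one
// splittable file, following FileInputFormat's split-size rule.
class SplitEstimate {

    // The last split may be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static int numSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long size = splitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        // Cut full-size splits while more than 1.1 splits' worth remains.
        while ((double) remaining / size > SPLIT_SLOP) {
            splits++;
            remaining -= size;
        }
        // Whatever is left becomes the final split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }
}
```

For example, a 1000 MB file with 128 MB blocks gives 8 splits, while a 140 MB file gives just 1, because 140/128 is below the 1.1 slop.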

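In the mapper implementation, read each line of the input and write productId as the key and saleCount as the value to the context. This data is passed on to the reducer.

As we know, in an MR job all records with the same key go to the same reducer. So in the reducer implementation, you can compute the sum of all saleCounts for a particular productId.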
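The map, shuffle, and reduce steps just described can be simulated in plain Java (not Hadoop code): the map step emits (productId, saleCount) pairs, a `TreeMap` plays the role of the shuffle that groups values by key, and the reduce step sums each group. The class name and the tab-separated input format are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PerProductSum {

    static Map<String, Long> run(List<String> lines) {
        // Map + shuffle: emit (productId, saleCount) and group the values
        // by key, as the framework does before calling reduce().
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            grouped.computeIfAbsent(cols[0], k -> new ArrayList<>())
                   .add(Long.parseLong(cols[1].trim()));
        }
        // Reduce: one call per productId, summing all of its saleCounts.
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            long sum = 0;
            for (long c : e.getValue()) {
                sum += c;
            }
            sums.put(e.getKey(), sum);
        }
        return sums;
    }
}
```

For the lines "a\t2", "b\t5", "a\t3" this returns {a=5, b=5}: both values for product a reach the same reduce call.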

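Note: I'm not sure what the 'salescount' value in the numerator refers to.

Assuming it is the number of occurrences of a particular product, use a counter in the same for loop that computes SUM(saleCount) to obtain the total number of sales. So we have:

totalCount -> number of occurrences of the product
sumSaleCount -> sum of the saleCount values for the product

Now you can divide these values directly: totalCount / sumSaleCount.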
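Under that assumption, the reducer's loop only needs two accumulators. The plain-Java sketch below (not the Hadoop `Reducer` API; the class name is hypothetical) counts and sums the values for one productId in a single pass, then divides as floating point so the result is not lost to integer division.

```java
import java.util.Iterator;

class CountOverSum {

    // One pass over the values for a single productId:
    // totalCount = number of values, sumSaleCount = sum of the values.
    static double ratio(Iterator<Long> values) {
        long totalCount = 0;
        long sumSaleCount = 0;
        while (values.hasNext()) {
            sumSaleCount += values.next();
            totalCount++;
        }
        // Cast before dividing: long / long would truncate toward zero.
        return sumSaleCount == 0 ? 0.0 : (double) totalCount / sumSaleCount;
    }
}
```

For the values 2, 3, 5 this gives 3 / 10 = 0.3.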

Hope this helps! Let me know if your use case is different.

Hadoop MapReduce: how to combine the first reducer's output and the first mapper's input as input to the second mapper?

2 answers: