Question

My input file contains:

id   value
1e   1
2e   1
...
2e   1
3e   1
4e   1

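I want to find the total number of ids in the input file. So in my main driver I declared a list, and as I read the input file I insert each line's id into it.

MainDriver.java:

    public static Set list = new HashSet();

In my map: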

// Apply regex to find the id
...

// Insert id to the list
MainDriver.list.add(regex.group(1));    // add 1e, 2e, 3e ...

And in my reduce, I try to use the list like this:

 public void reduce(WritableComparable key, Iterator values,
            OutputCollector output, Reporter reporter) throws IOException 
    {
        ...
        output.collect(key, new IntWritable(MainDriver.list.size()));
    }

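So I expect the value written to the output file to be 4 in this case. But it actually prints 0.

I have verified that regex.group(1) extracts a valid id, so I don't understand why the size of my list is 0 during reduce.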

Answer 1

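Mappers and reducers run in separate JVMs (and usually on entirely separate machines) from each other and from the driver, so there is no common instance of the list Set variable that all of these methods can read and write concurrently.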

One way to count the number of keys with MapReduce is:

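Emit (id, 1) from your mapper
(Optionally) use a combiner to sum the 1s for each mapper, to minimise network and reducer I/O
In the reducer:
- In setup(), initialise a class-scoped numeric variable (int or long, presumably) to 0
- In reduce(), increment the counter and ignore the values
- In cleanup(), emit the counter value, now that all keys have been processed
Run the job with a single reducer, so that all keys go to the same JVM, where a single count can be made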
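
The steps above can be sketched without a Hadoop dependency as a plain-Java simulation of the same data flow. This is only an illustration under stated assumptions: the class and method names (KeyCountSketch, CountingReducer, mapAndShuffle, countDistinctIds) are hypothetical, the id is taken as the first whitespace-separated field rather than via the asker's regex, and a real job would implement the same lifecycle in a Hadoop Reducer subclass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop) of the counting steps described above.
// All names here are hypothetical and exist only for illustration.
final class KeyCountSketch {

    // Stand-in for a reducer with the setup() / reduce() / cleanup() lifecycle.
    static final class CountingReducer {
        private long distinctKeys;

        // Runs once before any key is processed.
        void setup() {
            distinctKeys = 0;
        }

        // Runs once per distinct key; the grouped values are ignored.
        void reduce(String key, List<Integer> values) {
            distinctKeys++;
        }

        // Runs once after all keys have been processed; returns the total.
        long cleanup() {
            return distinctKeys;
        }
    }

    // Map phase emits (id, 1) per data line; the shuffle groups values by id.
    // Assumption: the id is the first whitespace-separated field of the line.
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2 || fields[0].equals("id")) {
                continue; // skip the header row and malformed lines
            }
            grouped.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Single "reducer" sees every key, so one counter covers the whole input.
    static long countDistinctIds(List<String> lines) {
        CountingReducer reducer = new CountingReducer();
        reducer.setup();
        for (Map.Entry<String, List<Integer>> entry : mapAndShuffle(lines).entrySet()) {
            reducer.reduce(entry.getKey(), entry.getValue());
        }
        return reducer.cleanup();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "id   value", "1e   1", "2e   1", "2e   1", "3e   1", "4e   1");
        System.out.println(countDistinctIds(lines)); // prints 4
    }
}
```

The counter is only meaningful because one reducer instance receives every key; with several reducers, each would emit its own partial count.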

Answer 2

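This basically ignores the advantage of using MapReduce in the first place.

Correct me if I'm wrong, but it looks like you could key your Mapper's output by "id", and then receive parameters like Text key, Iterator values in your Reducer.

You can then sum up the values and emit output.collect(key, <total value>);

Example (apologies for using Context rather than OutputCollector, but the logic is the same):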

 public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Constant value emitted with every id; the reducer only counts occurrences.
    private final Text countOne = new Text("1");
    private final Text id = new Text();

    @Override
    public void map(LongWritable key, Text value,
                    Context context) throws IOException, InterruptedException {
         // regex is the asker's own matcher, applied to the input line
         id.set(regex.group(1)); // do whatever you do
         context.write(id, countOne);
    }

}

public static class MyReducer extends Reducer<Text, Text, Text, IntWritable> {

    private final IntWritable totalCount = new IntWritable();

    public void reduce(Text key, Iterable<Text> values,
                       Context context) throws IOException, InterruptedException {

        int cnt = 0;
        for (Text value : values) {
            cnt ++;
        }

        totalCount.set(cnt);
        context.write(key, totalCount);
    }

}
