Question

I have two files of the form

File 1:

key1 value1

key2 value2

...

File 2:

key1 value3

key2 value4

...

I want to produce reduce output of the form

key1 (value1 - value3) / value1

key2 (value2 - value4) / value2

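I have the map phase write out the key, with the value prefixed by a character that tells whether it came from file1 or file2, but I'm not sure how to write the reduce phase.

My map method is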

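public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException
    {
        Text outputKey = new Text();
        Text outputValue = new Text();
        // Note: with text input, key is the line's byte offset, not key1/key2 from the file.
        outputKey.set(key.toString());
        if ("A") // placeholder as posted: should test whether the record came from file1
        {
            outputValue.set("A," + val);
        }
        else
        {
            outputValue.set("B," + val);
        }
        context.write(outputKey, outputValue);
    }
}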

Answer 1

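It should be fairly simple since you have already tagged the values, although it was a bit confusing at first. I'm assuming the emitted values look like A23 (for file1) and B139 (for file2). Snippet: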

public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {

    // Use doubles: integer division would truncate (value1 - value3) / value1.
    // The reducer's output value class must then be DoubleWritable.
    double diff = 0;
    double denominator = 1;
    for (Text val : values) {
        String s = val.toString();
        if (s.startsWith("A")) {
            // Value from file1: both the minuend and the denominator
            denominator = Double.parseDouble(s.substring(1));
            diff += denominator;
        } else if (s.startsWith("B")) {
            // Value from file2: subtract it
            diff -= Double.parseDouble(s.substring(1));
        } else {
            // This block shouldn't be reached unless malformed values are emitted
            // Throw an exception or log it
        }
    }
    diff /= denominator;
    context.write(key, new DoubleWritable(diff));
}

Hope this helps. However, I think your approach will fail when key1 and key2 are equal.
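The reducer's arithmetic can be checked outside Hadoop. Below is a minimal sketch, assuming the values for one key arrive tagged as "A<number>" (file1) or "B<number>" (file2) as described above. The class and method names are hypothetical, and floating-point division is used so that a fractional result is not truncated.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch (no Hadoop dependency) of the per-key computation.
// Assumption: values are tagged "A<number>" (file1) or "B<number>" (file2).
class RelativeDiffSketch {

    // Returns (a - b) / a, where a comes from file1 and b from file2.
    static double relativeDiff(List<String> taggedValues) {
        double a = Double.NaN; // value from file1
        double b = Double.NaN; // value from file2
        for (String v : taggedValues) {
            if (v.startsWith("A")) {
                a = Double.parseDouble(v.substring(1));
            } else if (v.startsWith("B")) {
                b = Double.parseDouble(v.substring(1));
            } else {
                throw new IllegalArgumentException("Malformed value: " + v);
            }
        }
        if (Double.isNaN(a) || Double.isNaN(b)) {
            throw new IllegalArgumentException("Key is missing a value from one file");
        }
        if (a == 0.0) {
            throw new ArithmeticException("file1 value is zero; ratio undefined");
        }
        return (a - b) / a;
    }

    public static void main(String[] args) {
        // The values for a key are not delivered in a guaranteed order,
        // so the tag, not the position, decides each value's role.
        System.out.println(relativeDiff(Arrays.asList("B5", "A20"))); // prints 0.75
    }
}
```

Rejecting a key that has only one of the two values is a design choice made for this sketch; a real job might prefer to skip or log such keys instead.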

Update:
The map should look like the following to work with the reducer above:

public void map(LongWritable key, Text val, Context context)
            throws IOException, InterruptedException {
        // Name of the input file this record was read from
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
        // Each line is "<key> <value>", separated by whitespace
        String[] keyVal = val.toString().split("\\s+");
        // Emit the key read from the line, not the input byte offset
        Text outputKey = new Text(keyVal[0]);
        Text outputValue = new Text();
        if ("fileA".equals(fileName)) {
            outputValue.set("A" + keyVal[1]);
        } else {
            outputValue.set("B" + keyVal[1]);
        }
        context.write(outputKey, outputValue);
    }
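To see what the mapper emits for a given line, the tagging step can be isolated from Hadoop. This is only a sketch: the class and method names are hypothetical, "fileA" is the file name used in the answer above, and the input is assumed to be a whitespace-separated "key value" line.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Standalone sketch of the mapper's tagging logic (no Hadoop dependency).
class TagLineSketch {

    // Splits "key value" and prefixes the value with "A" or "B"
    // depending on which file the line came from.
    static Map.Entry<String, String> tag(String fileName, String line) {
        String[] keyVal = line.trim().split("\\s+");
        if (keyVal.length < 2) {
            throw new IllegalArgumentException("Expected 'key value', got: " + line);
        }
        // Assumption: the first input file is literally named "fileA".
        String prefix = "fileA".equals(fileName) ? "A" : "B";
        return new SimpleEntry<>(keyVal[0], prefix + keyVal[1]);
    }

    public static void main(String[] args) {
        System.out.println(tag("fileA", "key1 23"));   // prints key1=A23
        System.out.println(tag("fileB", "key1\t139")); // prints key1=B139
    }
}
```

Both lines produce the same output key, so their tagged values end up in the same reduce call.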

Answer 2

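I have found NamedVector very helpful in this kind of situation. It gives each value an identity, so you can perform the required operation on the values based on their "name".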
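The idea of naming each value can be illustrated without any library. The sketch below uses a hypothetical NamedValue class as a stand-in; it is not the NamedVector API (which, as far as I know, belongs to Apache Mahout), and the names "file1" and "file2" are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of selecting values by name instead of by string prefix.
class NamedValueSketch {

    // Hypothetical stand-in for a value that carries its source as a name.
    static final class NamedValue {
        final String name;
        final double value;

        NamedValue(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    // Computes (file1 - file2) / file1 by looking up values by name.
    static double relativeDiff(List<NamedValue> values) {
        Double fromFile1 = null;
        Double fromFile2 = null;
        for (NamedValue nv : values) {
            if ("file1".equals(nv.name)) {
                fromFile1 = nv.value;
            } else if ("file2".equals(nv.name)) {
                fromFile2 = nv.value;
            }
        }
        if (fromFile1 == null || fromFile2 == null) {
            throw new IllegalArgumentException("Need one value from each file");
        }
        return (fromFile1 - fromFile2) / fromFile1;
    }

    public static void main(String[] args) {
        List<NamedValue> values = Arrays.asList(
                new NamedValue("file2", 2.0),
                new NamedValue("file1", 8.0));
        System.out.println(relativeDiff(values)); // prints 0.75
    }
}
```

Compared with prefixing strings, the name travels as a separate field, so the reducer does not have to parse the value to learn where it came from.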

Subtracting two numbers with the same key in Hadoop
