Subtracting two numbers with the same key in Hadoop

Time: 2014-11-03 02:54:08

Tags: java hadoop mapreduce

I have two files of the following form:

File 1

key1 value1

key2 value2

...

File 2

key1 value3

key2 value4

...

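I want to produce reduce output of the form

key1 (value1-value3)/value1

key2 (value2-value4)/value2

I have the map emit the key, with the value prefixed by a character that says whether it came from file1 or file2, but I am not sure how to write the reduce phase.

My map method is: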

public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {
    Text outputKey = new Text();
    Text outputValue = new Text();
    outputKey.set(key.toString());
    // Placeholder as posted (not valid Java): the condition is meant to
    // test whether the record came from file1
    if ("A") {
        outputValue.set("A," + val);
    } else {
        outputValue.set("B," + val);
    }
    context.write(outputKey, outputValue);
}

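2 answers:

Answer 0: (score: 1)

It should be fairly simple since you have already tagged the values, although it can be a little confusing at first. I am assuming the emitted values look like A23 (for file1) and B139 (for file2). Snippet: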

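// Uses org.apache.hadoop.io.Text and org.apache.hadoop.io.DoubleWritable;
// the job's output value class must be DoubleWritable.
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {

    double diff = 0;
    double denominator = 1;
    for (Text val : values) {
        String tagged = val.toString();
        if (tagged.startsWith("A")) {
            // Value from file1: it is both the minuend and the denominator
            denominator = Double.parseDouble(tagged.substring(1));
            diff += denominator;
        } else if (tagged.startsWith("B")) {
            // Value from file2: it is the subtrahend
            diff -= Double.parseDouble(tagged.substring(1));
        } else {
            // This block shouldn't be reached unless malformed values are emitted
            // Throw an exception or log it
        }
    }
    // Floating-point division, so the ratio is not truncated to an integer
    diff /= denominator;
    context.write(key, new DoubleWritable(diff));
}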

Hope this helps. However, I think your approach will fail when key1 and key2 are equal.
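The arithmetic in the reduce step can be checked without a Hadoop cluster. The following is only a sketch in plain Java: the class name `RatioSketch` and the method `relativeDifference` are hypothetical, and it assumes the tagging convention described in this answer (values such as A23 from file1 and B139 from file2) with exactly one value of each kind per key. It uses `double` so that the ratio is not truncated.

```java
import java.util.Arrays;
import java.util.List;

public class RatioSketch {

    // Computes (a - b) / a from values tagged "A<number>" and "B<number>".
    // Hypothetical helper; it mirrors the reduce logic without Hadoop types.
    static double relativeDifference(List<String> taggedValues) {
        Double a = null; // value from file1
        Double b = null; // value from file2
        for (String v : taggedValues) {
            if (v.startsWith("A")) {
                a = Double.parseDouble(v.substring(1));
            } else if (v.startsWith("B")) {
                b = Double.parseDouble(v.substring(1));
            } else {
                throw new IllegalArgumentException("Untagged value: " + v);
            }
        }
        if (a == null || b == null) {
            throw new IllegalArgumentException("Expected one A value and one B value");
        }
        if (a == 0.0) {
            throw new ArithmeticException("Value from file1 is zero, ratio undefined");
        }
        return (a - b) / a;
    }

    public static void main(String[] args) {
        // The order of values reaching a reducer is not guaranteed,
        // so the B value is deliberately listed first here.
        System.out.println(relativeDifference(Arrays.asList("B5", "A20"))); // prints 0.75
    }
}
```

Rejecting a key that is missing either value is one possible choice; depending on the data, skipping such keys or logging them may be preferable.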

Update:
The map should look like the following in order to work with the reducer above:

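// FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit
public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {
    // Name of the input file this record was read from
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    // Each line is "<key> <value>", separated by whitespace
    String[] keyVal = val.toString().split("\\s+");
    // Emit the record's own key, not the byte offset passed in as "key"
    Text outputKey = new Text(keyVal[0]);
    Text outputValue = new Text();
    // Tag the value with its source so the reducer can tell the two apart
    if ("fileA".equals(fileName)) {
        outputValue.set("A" + keyVal[1]);
    } else {
        outputValue.set("B" + keyVal[1]);
    }
    context.write(outputKey, outputValue);
}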
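The tagging step can also be tried in isolation. This is a minimal sketch in plain Java, not the Hadoop mapper itself: `TagSketch` and `tagLine` are hypothetical names, and the file name "fileA" is taken from the snippet above as the name of the first input file. It assumes each line holds a key and a value separated by whitespace.

```java
public class TagSketch {

    // Splits one input line into key and value and prefixes the value
    // with "A" or "B" depending on which file the line came from.
    // Returns a two-element array: { key, taggedValue }.
    static String[] tagLine(String fileName, String line) {
        String[] keyVal = line.trim().split("\\s+");
        if (keyVal.length < 2) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        String tag = "fileA".equals(fileName) ? "A" : "B";
        return new String[] { keyVal[0], tag + keyVal[1] };
    }

    public static void main(String[] args) {
        String[] fromA = tagLine("fileA", "key1 23");
        String[] fromB = tagLine("fileB", "key1 139");
        System.out.println(fromA[0] + " " + fromA[1]); // prints key1 A23
        System.out.println(fromB[0] + " " + fromB[1]); // prints key1 B139
    }
}
```

Because both lines produce the same key, the tagged values A23 and B139 would arrive at the same reduce call.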

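Answer 1: (score: 0)

I have found NamedVector very helpful in this kind of situation. It gives each value an identity, so you can perform the required operation on a value based on its "name".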